Open gemarcano opened 7 years ago
Thank you for the report. It is in my plans to start working on AArch64 in the following days.
Thanks for the fast reply.
Just to follow up, I forgot to mention that I had to change some cudnn functions in order to get the project to compile with cudnn v5, but it wasn't that difficult. Most of the functions with issues have a backwards-compatible version, most of the time with a name simply ending in _v3.
I did manage to get the project to compile just now. I practically butchered the Makefile, effectively telling it to treat the aarch* case as one that also uses the system OpenBLAS libraries, and setting up the flags for AArch64 compilation. I wouldn't suggest that anyone apply the kind of hack I applied to the Makefile; there has to be a saner way to structure those changes (perhaps by having different cases for arm/aarch32 and aarch64).
For reference, even though I am not a fan of what I did, here is the diff.
I have not yet checked to see if the resulting program/library works properly. If I find any other issues and/or pitfalls while testing on AArch64, I'll report them.
Thank you very much for your help, it's much appreciated.
I've pushed a new commit that integrates OpenBLAS for aarch64. I've tested it on the Tegra TX1 with CUDNNv5. Your diff saved me precious time, thanks.
There are a couple of problems that prevent it from compiling for AArch64, but they pretty much all revolve around the ARM assembly found in `thvector.h` and in `OpenBLAS-stripped/arm`. 32-bit ARM assembly isn't compatible with AArch64 assembly:

- When the Makefile matches `aarch*`, it defines `__NEON__`, which enables assembly optimizations in `thvector.h`. As I mentioned before, this assembly is not compatible with aarch64.
- The `OpenBLAS-stripped/arm/*.S` assembly files are not compatible with AArch64.
- GCC does not accept the `-mfpu` and `-mfp16-format` flags when targeting AArch64. See https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html for the current list of options the latest GCC supports. Ubuntu Xenial GCC 5.4 does not even support all of the options mentioned in that list (namely, `+fp16` is not listed in the GCC 5.4 documentation for the same page).

I ran into these issues while trying to compile this project for the Nvidia TX1, which now bundles a GCC compiling for AArch64, running Ubuntu Xenial LTS.
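For the `__NEON__` problem specifically, one possible approach is to keep the 32-bit assembly behind a guard that also checks the target architecture. The following is only a rough sketch, not the project's actual code: `USE_ARM32_NEON_ASM`, `vector_add` and `vector_add_scalar` are hypothetical names made up for illustration, and the real assembly from `thvector.h` is replaced by a placeholder. The only things assumed to exist are `__NEON__` (defined by the Makefile, as described above) and `__aarch64__`, which GCC predefines when targeting AArch64.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guard: take the 32-bit ARM assembly path only when
 * __NEON__ is defined AND the target is not AArch64.
 * __aarch64__ is predefined by GCC when compiling for AArch64. */
#if defined(__NEON__) && !defined(__aarch64__)
#define USE_ARM32_NEON_ASM 1
#else
#define USE_ARM32_NEON_ASM 0
#endif

/* Portable fallback: y[i] += x[i] for i in [0, n). */
static void vector_add_scalar(float *y, const float *x, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        y[i] += x[i];
    }
}

/* Hypothetical wrapper that picks the implementation at compile time. */
static void vector_add(float *y, const float *x, size_t n)
{
    assert(y != NULL && x != NULL);
#if USE_ARM32_NEON_ASM
    /* The existing 32-bit NEON assembly would stay here unchanged.
     * Placeholder: call the scalar version so this sketch builds anywhere. */
    vector_add_scalar(y, x, n);
#else
    /* AArch64 and every other target use the portable code. */
    vector_add_scalar(y, x, n);
#endif
}
```

With a guard like this, an AArch64 build would fall through to the portable path even if the Makefile still defines `__NEON__`, so the 32-bit assembly never reaches the AArch64 assembler. It does nothing for the `OpenBLAS-stripped/arm/*.S` files or the compiler flags, which still need to be handled in the Makefile.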
If I can get this project to compile, I'll try to explain what I had to do to achieve it. Currently, with modifications to the Makefile, I can compile most of the project, but it is getting hung up on the OpenBLAS-stripped part. I'm trying to see if I can get it to compile with the system-provided OpenBLAS library.