Open CodeSammich opened 3 years ago
I found that compiling libpytorch_jni.so with the -fuse-ld=lld linking option makes it work.
Tested with pytorch-1.10.2, android-ndk-r22b, cross compile arm64.
Edit the file android/pytorch_android/CMakeLists.txt and add one line: target_link_libraries(${PYTORCH_JNI_TARGET} -fuse-ld=lld)
Then follow https://github.com/pytorch/pytorch/tree/master/android and run scripts/build_pytorch_android.sh
Extract the shared libraries from the aar file, and it works.
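For the extraction step, an AAR is a zip archive with its native libraries stored under jni/<ABI>/. A minimal Python sketch of pulling them out could look like this. The function name extract_native_libs is hypothetical and not part of the PyTorch build scripts.

```python
import zipfile


def extract_native_libs(aar_path, out_dir):
    """Extract every jni/<abi>/<lib>.so entry from an AAR into out_dir.

    Hypothetical helper for illustration; returns the extracted entry names, sorted.
    """
    extracted = []
    with zipfile.ZipFile(aar_path) as aar:
        for name in aar.namelist():
            parts = name.split("/")
            # Native libraries in an AAR live at jni/<abi>/<lib>.so
            if len(parts) == 3 and parts[0] == "jni" and parts[2].endswith(".so"):
                aar.extract(name, out_dir)
                extracted.append(name)
    return sorted(extracted)
```

The returned names also show which library an artifact actually ships, for example libpytorch_jni.so versus libpytorch_jni_lite.so.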
I've built myself a PyTorch 1.13 AAR (from the 1.13 git tag) with scripts/build_pytorch_android.sh (no arguments provided) using NDK 21e, as recommended in the building instructions for Android, and got this error.
I see there are fixes suggested, but some should probably be merged or added to the building instructions.
I encounter the same error when using the official builds of 1.12.2 for Android (the latest available at the moment); it occurs in both pytorch_android and pytorch_android_lite.
Manually building a PyTorch AAR with NDK 23+, which uses LLD by default, instead of NDK 21.x as recommended solves the issue. For this one has to do the following:
Copy the common directory from KhronosGroup/Vulkan-Tools, git tag v1.2.161 (the version is important to build PyTorch successfully, this one is the latest that worked for me) into $ANDROID_NDK/sources/third_party/vulkan/src.
The above should be enough to get a working build, but I personally upgraded PyTorch's Android Gradle Plugin to 7.3.1 (since 7.3.x bundles NDK 23.1.7779620 by default) instead of specifying the NDK version directly, just in case there is some incompatibility between the newer NDK and the older AGP. This requires some additional tweaks though:
Replace 6.8.3 with 7.4 in distributionUrl in android/gradle/wrapper/gradle-wrapper.properties.
Add classpath 'com.android.library:com.android.library.gradle.plugin:7.3.1' into allprojects.buildscript.dependencies in android/build.gradle.
Replace the maven plugin with maven-publish (this probably requires some additional changes to make the publishing work properly, but the usual build succeeds anyway): replace apply plugin: 'maven' with apply plugin: 'maven-publish' in android/pytorch_android/build.gradle and android/pytorch_android_torchvision/build.gradle.
@TimPushkin Thanks for your effort, you saved me a lot of pain, apparently nobody cares that official builds (and snapshots) are broken.
The issue persists in the 1.13 release published on Maven
@malfet I am sorry for the direct tag, but is there any chance this will be fixed in the upcoming releases? LibTorch cannot be linked against on arm64-v8a ABI with LLD which is the default linker on Android for two years now. Both official releases and instructions for building from source provide unlinkable results.
The fact that the issue remains means no one bothers to do the simplest integration testing. How hard is it to compile properly just once? One must build manually or do some voodoo to get this working. This is ridiculous.
@milo1000 if you have a PR to fix the problem, please do not hesitate to propose one. @agunapal you've worked on 1.13 Android release. Have you encountered this issue? @TimPushkin no worries about the ping, let me try to find some time to look into the problem. Just to clarify: does this happen to maven packages, or during the source build?
@malfet Thanks! This happens both to the maven packages and to the AAR I build from source with the build_pytorch_android script.
@malfet I went through the procedure described by @TimPushkin in comment https://github.com/pytorch/pytorch/issues/51020#issuecomment-1336405310 Don't get me wrong, I cherish your effort, but such bugs make your whole great job pointless.
@milo1000 I have built and published the libraries for 1.13. I did not see this error. Let me try running this on 1.13.1 and get back to you
@agunapal To be clear, it seems like you need to use a sufficiently recent NDK where LLD is the default linker in the test app. According to its release notes, you need NDK r22b or later, I personally get these errors on NDK r23 (which is the default on current Android Gradle Plugin versions), and I believe I also tested r25 (which is the latest):
ld: error: found local symbol '__bss_end__' in global part of symbol table in file _deps/torch-src/jni/arm64-v8a/libpytorch_jni_lite.so
ld: error: found local symbol '__bss_start' in global part of symbol table in file _deps/torch-src/jni/arm64-v8a/libpytorch_jni_lite.so
ld: error: found local symbol '_end' in global part of symbol table in file _deps/torch-src/jni/arm64-v8a/libpytorch_jni_lite.so
ld: error: found local symbol '_edata' in global part of symbol table in file _deps/torch-src/jni/arm64-v8a/libpytorch_jni_lite.so
ld: error: found local symbol '__bss_start__' in global part of symbol table in file _deps/torch-src/jni/arm64-v8a/libpytorch_jni_lite.so
ld: error: found local symbol '_bss_end__' in global part of symbol table in file _deps/torch-src/jni/arm64-v8a/libpytorch_jni_lite.so
ld: error: found local symbol '__end__' in global part of symbol table in file _deps/torch-src/jni/arm64-v8a/libpytorch_jni_lite.so
Also, there seems to be no 1.13.1 release published on the official Maven, so I've tested this on 1.12.2 and 1.13.0 official builds from there.
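When comparing builds across NDK versions, it can help to condense linker output like the above into one entry per library. The following is a minimal Python sketch, assuming the log lines have exactly the shape quoted in this comment. The name group_local_symbol_errors is hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as:
# ld: error: found local symbol '_end' in global part of symbol table in file path/to/lib.so
_LOCAL_SYMBOL_ERROR = re.compile(
    r"ld: error: found local symbol '(?P<symbol>[^']+)' "
    r"in global part of symbol table in file (?P<file>\S+)"
)


def group_local_symbol_errors(linker_output):
    """Map each offending library path to the symbols reported for it, in log order."""
    by_file = defaultdict(list)
    for line in linker_output.splitlines():
        match = _LOCAL_SYMBOL_ERROR.search(line)
        if match:
            by_file[match.group("file")].append(match.group("symbol"))
    return dict(by_file)
```

Lines that do not match the pattern are ignored, so the full build log can be passed in unchanged.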
I am writing to request your assistance in resolving an error I am encountering while using D2Go on Android. My project submission is next week, and this is my final-year project, so I badly need your assistance to run my custom model on Android.
I have used PyTorch version 1.13.0 for training, and I am now attempting to use D2Go on Android. However, I have encountered an error, and I am not sure how to resolve it. I came across your GitHub profile and noticed that you have experience working with D2Go, so I am hoping that you can provide me with guidance on how to fix this error.
The error message I am receiving is
FATAL EXCEPTION: main Process: org.pytorch.demo.objectdetection, PID: 31163 java.lang.UnsatisfiedLinkError: dalvik.system.PathClassLoader[DexPathList[[zip file "/data/app/org.pytorch.demo.objectdetection-5BnM-_V8v6oKj-1tFHt8xQ==/base.apk"],nativeLibraryDirectories=[/data/app/org.pytorch.demo.objectdetection-5BnM-_V8v6oKj-1tFHt8xQ==/lib/arm64, /data/app/org.pytorch.demo.objectdetection-5BnM-_V8v6oKj-1tFHt8xQ==/base.apk!/lib/arm64-v8a, /system/lib64]]] couldn't find "libpytorch_jni.so
In build.gradle I have updated the versions as well:
implementation 'org.pytorch:pytorch_android_lite:1.13.0'
implementation 'org.pytorch:pytorch_android_torchvision_lite:1.13.0'
implementation 'org.pytorch:torchvision_ops:0.14.0'
@dipu0 I'm not sure, but I think the error in your message is raised because libpytorch_jni.so is being loaded, and that library does not exist in the org.pytorch:pytorch_android_lite package; there it is called libpytorch_jni_lite.so.
@TimPushkin Can you please try with android ndk r19c
@dipu0 Seems like the .aar file didn't get uploaded for some reason. Can you please use the previous version of PyTorch.
@agunapal I tried to train with an older torch version, but it gives an error. I also tried to use an older version in the Android build.gradle, and that did not work either.
@agunapal Yes, my app compiles with NDK r19c, and just for the record, it also does with r21e. I figure, this is because these NDKs don't use LLD by default. But I would prefer not to be stuck with 2+ year old NDKs.
@agunapal Is there any news regarding this?
@TimPushkin I published the 1.13.1 Android binaries a few days ago.
@agunapal Just tried them and I get the same linker errors
@TimPushkin do you mean you are not able to build from source or do the binaries not work?
I tried the 1.13.1 AAR published on Maven
@TimPushkin Could you please paste the error you are seeing?
@agunapal Sure, it is the same as I posted above: comment
Same error for me. The binaries should be recompiled with the -fvisibility=hidden compiler switch, but then you have to manually choose which functions you want to be visible :(
The error is ld: error: found local symbol '__end__' in global part of symbol table in file, but only with NDKs higher than 21.4; everything I tested from 22.x on failed.
RG
@agunapal Any updates on the issue?
Any news on this issue?
For me, when both 21.4.7075529/ and 22.1.7171670/ are present in the folder android-sdk/ndk/, "com.android.tools.build:gradle:3.5.3" uses 22.1.7171670 to compile and hits this error:
ld: error: found local symbol '__end__' in global part of symbol table in file ../../../../build/pytorch_android_lite-1.12.2.aar/jni/arm64-v8a/libpytorch_jni_lite.so
After deleting android-sdk/ndk/22.1.7171670/, "com.android.tools.build:gradle:3.5.3" uses 21.4.7075529/ and everything is fine.
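To see at a glance which side-by-side NDKs are installed and which of them would hit the error, a small Python sketch like the one below may help. It assumes, as reported in this thread, that NDK r22 and later use LLD by default. The helper names and the android-sdk/ndk/ layout passed in are illustrative only, and the sketch does not model how the Android Gradle Plugin chooses among them.

```python
from pathlib import Path

# Assumption taken from this thread: NDK r22 and later link with LLD by default.
FIRST_LLD_DEFAULT_MAJOR = 22


def parse_ndk_version(name):
    """Turn a directory name like '21.4.7075529' into (21, 4, 7075529), or None."""
    parts = name.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return None
    return tuple(int(p) for p in parts)


def installed_ndks(ndk_root):
    """Return [(version_string, uses_lld_by_default)] for installed NDKs, oldest first."""
    found = []
    for entry in Path(ndk_root).iterdir():
        version = parse_ndk_version(entry.name)
        if entry.is_dir() and version is not None:
            found.append((version, entry.name))
    return [(name, version[0] >= FIRST_LLD_DEFAULT_MAJOR) for version, name in sorted(found)]
```

Calling installed_ndks("android-sdk/ndk") on the setup described above would list both directories, with only 22.1.7171670 flagged.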
Nice, bro. I hit the same error, and setting ndkVersion "21.4.7075529" makes it work.
Any updates?
❓ Questions and Help
I'm currently adapting the Native Android C++ with Custom Ops guide to run my own model on React Native for Android on the nightly PyTorch 1.8.0 build.
When I run something like npx react-native run-android to compile, it compiles successfully for the first kind of CPU architecture (e.g. armeabi-v7a), but then it fails for the second architecture (e.g. arm64-v8a).
It seems like certain compiled variables inside libpytorch_jni.so are being added to the runtime multiple times, causing the conflict. Does that mean I have to compile only for one architecture at a time, or am I missing something entirely? Since the native build extraction extracts all 4 at the same time, I figure they must be able to compile separately just fine.
Thank you!
EDIT: It seems it's just arm64-v8a inherently that has this problem, not necessarily due to conflicts. I've cleaned my Android build files and filtered the NDK to just compile for arm64-v8a, and it still fails on the same error. All other architectures build properly.
cc @malfet @seemethere @walterddr