softcite / software-mentions

Softcite software mention recognizer, finding mentions and citations to software from within the academic literature
Apache License 2.0

server crashes with "qemu: uncaught target signal 6" on Mac M1 silicon #29

Open jameshowison opened 1 year ago

jameshowison commented 1 year ago

I'm using the 0.8.0-SNAPSHOT image with Docker on a MacBook. Because the image is built for x86, I think Docker runs it under x86 emulation via QEMU, and there is a bug (perhaps still unfixed) in how QEMU handles this emulation.

That resulted in the server dumping core with the message "qemu: uncaught target signal 6".

I built the image for arm64 (i.e., Mac M1 or M2 silicon) using the Dockerfile edit from #28 together with --platform=linux/arm64, and that seems to fix the issue (after also passing --no-cache to docker build), e.g.:

docker build --platform=linux/arm64 -t grobid/software-mentions:0.8.0-SNAPSHOT-arm64 --build-arg GROBID_VERSION=0.8.0-SNAPSHOT-arm64 --file Dockerfile.software .

Perhaps the images could be built as multi-platform images with --platform=linux/amd64,linux/arm64, as described here:

https://www.docker.com/blog/faster-multi-platform-builds-dockerfile-cross-compilation-guide/

(But see below: this seemed to work briefly and now fails, which is very odd.)

jameshowison commented 1 year ago

Well, I'm completely stumped. Ten minutes ago this image (built with --platform=linux/arm64) was avoiding the "qemu: uncaught target signal 6" error. I could see calls to the annotate_tei method, and the client's counter of processed files was going up (slowly); that said, no software.json files were written, so it wasn't working fully.

Now the same image, with the same data, immediately throws that error.

I've tried everything I can think of and can't get this to run on an M1 Mac, either via Docker or by running the server directly using the compile instructions. I guess I'll try on AWS?

jameshowison commented 1 year ago

Working on the Mac M1 Docker build is really difficult when you don't have the hardware. I don't know exactly how to go about it; perhaps some other Grobid users have it working? It's definitely something to do with QEMU, but I'm not sure whether the versions are pinned or whether they could be updated.

jameshowison commented 1 year ago

I played with this a little more. A few notes for others who might come this way:

  1. Following the hint at the end of https://github.com/tensorflow/tensorflow/issues/52845#issuecomment-1272337911, I switched the stage_1 base image back to FROM python:3.8-slim and then added RUN pip install tensorflow tensorflow-io. Building with docker build --platform=linux/arm64/v8 -t grobid/software-mentions:0.8.0-SNAPSHOT-aarch64 --build-arg GROBID_VERSION=0.8.0-SNAPSHOT-aarch64 --file Dockerfile.software . completes fine, and the builder stage seems to build as arm64 without problems (by the way, I have no idea what the difference, if any, is between aarch64 and arm64).
  2. Then you run into a problem with DeLFT 0.3.3 that comes from a dependency trying to build tensorflow-gpu (which is deprecated; apparently the two packages are the same, but a package with that name is no longer built?). It also showed up as a problem installing tensorflow==2.9.3.
  3. I forked DeLFT at https://github.com/jameshowison/delft and added git to the install so I could experiment with the dependencies, and eventually found that setup.py was the place to change some of the pinned version numbers. I changed tensorflow to >=2.9.3 but had to remove tensorflow-addons entirely (it is now deprecated, so I don't think there are prebuilt versions available for aarch64?).
  4. With those dependency changes, the pip install line works.
  5. Unfortunately, I then run into trouble with the "# install jep (and temporarily the matching JDK)" step. That step seems to download a JDK directly, and I'm guessing an x86 one, because the error is:
    > [stage-1 14/34] RUN /tmp/jdk-17/bin/javac -version:
    0.456 qemu-x86_64: Could not open '/lib64/ld-linux-x86-64.so.2': No such file or directory

    Ah, I see that it's right there in the file name: openjdk-17.0.2_linux-x64_bin.tar.gz

Does the jep install need to be done this way, or could more standard package installs help?
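Since the failing step fetches the x64 JDK archive by name, one way to make it architecture-aware is to derive the archive suffix from the machine name. This is a minimal sketch, not the project's Dockerfile: the function name `jdk_tarball` is hypothetical, and it assumes an aarch64 build of the same JDK is published under the analogous name (openjdk-17.0.2_linux-aarch64_bin.tar.gz); the download URL in Dockerfile.software would need the same substitution. It accepts both `aarch64` (what Linux's `uname -m` reports) and `arm64` (what macOS and Docker call the same 64-bit ARM architecture).

```shell
# Hypothetical helper: choose the JDK archive name for a given machine type
# instead of always using the x64 one.
jdk_tarball() {
    # Default to the architecture of the machine running this step.
    machine=${1:-$(uname -m)}
    case $machine in
        x86_64|amd64)
            suffix=x64 ;;
        aarch64|arm64)  # Linux says aarch64; macOS and Docker say arm64
            suffix=aarch64 ;;
        *)
            echo "unsupported machine: $machine" >&2
            return 1 ;;
    esac
    # Assumption: the aarch64 archive follows the same naming pattern as the x64 one.
    echo "openjdk-17.0.2_linux-${suffix}_bin.tar.gz"
}
```

Inside a Dockerfile, BuildKit's automatic `TARGETARCH` build argument (values such as `amd64` or `arm64`, available after declaring `ARG TARGETARCH`) could be passed to the function instead of relying on `uname -m`, which is why `amd64` is accepted as well.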

jameshowison commented 1 year ago

It is possible that the jep install will work with:

RUN JAVA_HOME="$(dirname $(dirname $(readlink -f $(which java))))" pip3 install jep==4.0.2   

based on reading: https://stackoverflow.com/questions/43655291/dynamically-set-java-home-of-docker-container

That does seem to work for me in Dockerfile.software.
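The RUN line above works by following whichever `java` is on the PATH through its symlinks and then stripping the trailing `bin/java`, so it picks up the JDK that matches the image's own architecture. A minimal sketch of the same idea as a reusable function follows; the name `resolve_java_home` is hypothetical, and it assumes a `readlink` that supports `-f` (GNU coreutils or BusyBox, as in Debian-based images).

```shell
# Hypothetical helper: derive JAVA_HOME from a java binary, as the RUN line does.
resolve_java_home() {
    # Use the java binary given as $1, or the first one found on PATH.
    if [ -n "${1:-}" ]; then
        java_bin=$1
    else
        java_bin=$(command -v java) || return 1
    fi
    # Follow the whole symlink chain, e.g. /usr/bin/java -> /etc/alternatives/java
    # -> /usr/lib/jvm/<jdk>/bin/java (exact paths vary by distribution).
    real_bin=$(readlink -f "$java_bin") || return 1
    # Drop the trailing "bin/java" to get the JDK root directory.
    dirname "$(dirname "$real_bin")"
}
```

In Dockerfile.software this could stand in for the inline expression, e.g. `RUN JAVA_HOME="$(resolve_java_home)" pip3 install jep==4.0.2`, provided the function is defined (or sourced) in the same RUN shell.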

jameshowison commented 1 year ago

So, maybe I'm making progress: I can get the docker build to finish, and it does create an aarch64 image.

Unfortunately, the server then hits this error at startup:

screenit-softcite-server_software_mentions-1  | INFO  [2023-08-01 19:34:43,591] com.hubspot.dropwizard.guicier.DropwizardModule: Added guice injected health check: org.grobid.service.controller.HealthCheck
screenit-softcite-server_software_mentions-1  | com.google.inject.CreationException: Unable to create injector, see the following errors:
screenit-softcite-server_software_mentions-1  | 
screenit-softcite-server_software_mentions-1  | 1) Error injecting constructor, java.lang.UnsatisfiedLinkError: /opt/grobid/grobid-home/lib/lin-64/libwapiti.so: /opt/grobid/grobid-home/lib/lin-64/libwapiti.so: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64 .so on a AARCH64 platform)
screenit-softcite-server_software_mentions-1  |   at org.grobid.service.GrobidEngineInitialiser.<init>(GrobidEngineInitialiser.java:29)

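So lin-64/libwapiti.so is something built earlier in the chain, and the error suggests it is an AMD64 (x86-64) library that cannot be loaded on the aarch64 platform.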
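To check which architecture a shared library such as libwapiti.so was actually built for, one can read the `e_machine` field of its ELF header (62 for x86-64, 183 for AArch64). The `file` command reports the same information when it is installed; the sketch below is a hypothetical helper, `elf_machine`, that needs only POSIX `od` and assumes a little-endian ELF file, which covers both x86-64 and aarch64 Linux builds.

```shell
# Hypothetical helper: print the CPU architecture recorded in an ELF file's header.
elf_machine() {
    f=$1
    [ -r "$f" ] || { echo "cannot read: $f" >&2; return 1; }
    # Bytes 0-3 must be the ELF magic number: 0x7f 'E' 'L' 'F'.
    magic=$(od -An -t x1 -N 4 "$f" | tr -d ' \n')
    [ "$magic" = 7f454c46 ] || { echo "not an ELF file: $f" >&2; return 1; }
    # Byte 5 (EI_DATA) is 1 for little-endian files.
    data=$(od -An -t u1 -j 5 -N 1 "$f" | tr -d ' \n')
    [ "$data" = 1 ] || { echo "big-endian ELF not handled: $f" >&2; return 1; }
    # e_machine is a 2-byte little-endian field at offset 18.
    lo=$(od -An -t u1 -j 18 -N 1 "$f" | tr -d ' \n')
    hi=$(od -An -t u1 -j 19 -N 1 "$f" | tr -d ' \n')
    [ -n "$lo" ] && [ -n "$hi" ] || { echo "truncated ELF header: $f" >&2; return 1; }
    machine=$((hi * 256 + lo))
    case $machine in
        62)  echo x86_64 ;;   # EM_X86_64 (AMD64)
        183) echo aarch64 ;;  # EM_AARCH64 (arm64)
        *)   echo "other (e_machine=$machine)" ;;
    esac
}
```

Inside the running container, comparing `elf_machine /opt/grobid/grobid-home/lib/lin-64/libwapiti.so` with the output of `uname -m` should show whether the library matches the platform the JVM runs on.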

jameshowison commented 6 months ago

@kermitt2 Hi Patrice, I'm trying to return to this. Any idea whether there has been progress here? I'll check over on grobid/grobid as well.

kermitt2 commented 6 months ago

Hi James, I'm calling on @lfoppiano for help with this issue, because he has all the experience with running Grobid on Mac and with the challenge of building an arm64 Docker image.

Afaik aarch64 and arm64 are the same.

lfoppiano commented 6 months ago

Hi @jameshowison, this subject is still open, since we don't have a CI that builds images for ARM yet. But this might be an opportunity to have one soon 😄

First, could you try to run this image on your Mac and let me know if it works (try to process a few files)?

https://hub.docker.com/layers/lfoppiano/grobid/0.8.0-arm/images/sha256-79b85da73bae5c2a483e381c1e1231bc73dc0d6b987f16b867a3eb6e8154d7b8?context=explore

Given that software-mentions is based on Grobid, I think the simplest approach is to build its Docker image from the Grobid one, unless I'm overlooking something 😅