pybamm-team / PyBaMM

Fast and flexible physics-based battery models in Python
https://www.pybamm.org/
BSD 3-Clause "New" or "Revised" License

[Bug]: IDAKLU solver does not work when running in an aarch64 Docker container (unresolved symbols from CasADi) #3879

Open agriyakhetarpal opened 6 months ago

agriyakhetarpal commented 6 months ago

PyBaMM Version

develop

Python Version

3.11

Describe the bug

See https://github.com/pybamm-team/PyBaMM/pull/3874#issuecomment-1986939495 for more

I came across this when testing the most recent Docker image for PyBaMM on Docker Hub.

Steps to Reproduce

On an arm64 (M-series) macOS machine with Docker:

  1. Pull the Docker image with docker pull pybamm/pybamm:idaklu
  2. docker run -it pybamm/pybamm:idaklu
  3. python -c "import pybamm; pybamm.IDAKLUSolver()"

displays the following:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/pybamm/PyBaMM/pybamm/solvers/idaklu_solver.py", line 118, in __init__
    raise ImportError("KLU is not installed")
ImportError: KLU is not installed

which upon further debugging reveals (see logs below)

Relevant log output

ImportError                               Traceback (most recent call last)
Cell In[4], line 1
----> 1 idaklu = importlib.util.module_from_spec(idaklu_spec)

File <frozen importlib._bootstrap>:573, in module_from_spec(spec)

File <frozen importlib._bootstrap_external>:1233, in create_module(self, spec)

File <frozen importlib._bootstrap>:241, in _call_with_frames_removed(f, *args, **kwds)

ImportError: /home/pybamm/PyBaMM/pybamm/solvers/idaklu.cpython-311-aarch64-linux-gnu.so: undefined symbol: _ZN6casadi8Function11deserializeERKSs
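
For anyone trying to reproduce the log above: the "KLU is not installed" message hides the real loader error, so it can help to load the compiled extension directly. The snippet below is a minimal sketch of that, not PyBaMM code. The helper name try_load is hypothetical, and the module name pybamm.solvers.idaklu is inferred from the .so path in the traceback.

```python
import importlib.util


def try_load(module_name):
    """Return (module, error); error is the underlying exception if loading fails."""
    try:
        # For dotted names this imports the parent package first.
        spec = importlib.util.find_spec(module_name)
    except ImportError as exc:
        return None, exc
    if spec is None or spec.loader is None:
        return None, ModuleNotFoundError(f"no loadable spec for {module_name!r}")
    try:
        # For extension modules this is where the shared object is opened,
        # so unresolved symbols surface here as an ImportError.
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    except ImportError as exc:
        return None, exc
    return module, None


if __name__ == "__main__":
    module, error = try_load("pybamm.solvers.idaklu")
    print("loaded" if error is None else f"failed: {error!r}")
```

Running this inside the container should print the same "undefined symbol" message instead of the generic one.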

agriyakhetarpal commented 6 months ago

Labelled this as a medium-priority issue because we don't have wheels for aarch64 Linux right now anyway (#3462), so users who are on such architectures are probably building PyBaMM from source already (if they wish to use the IDAKLU solver, that is).

agriyakhetarpal commented 6 months ago

@arjxn-py, is there a reason why we are using continuumio/miniconda3:latest, i.e., the latest-tagged image? Ideally, we should have pinned this to a particular tag to make the build reproducible – I believe that the aarch64 image was working earlier when we were merging things (I had tested it), so I am not sure exactly when or how this bug appeared...
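
As a rough illustration of the pinning concern, a small check like the one below could flag base images that float on latest. It is only a sketch: the function name unpinned_base_images is hypothetical, it is not part of the PyBaMM repository, and it will also flag stage aliases or scratch, which would need filtering in real use.

```python
def unpinned_base_images(dockerfile_text):
    """List FROM images that have no tag, or the 'latest' tag, and no digest."""
    unpinned = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Skip options such as --platform=linux/arm64 that precede the image.
        image = next((p for p in parts[1:] if not p.startswith("--")), None)
        if image is None or "@" in image:  # a digest counts as pinned
            continue
        _, sep, tag = image.rpartition(":")
        # No colon, or a colon that belongs to a registry port, means no tag.
        if not sep or "/" in tag or tag == "latest":
            unpinned.append(image)
    return unpinned


example = "FROM continuumio/miniconda3:latest\nRUN echo hello\n"
print(unpinned_base_images(example))
```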

agriyakhetarpal commented 6 months ago

#3874 will have to remain stalled until this is resolved (and we should look into this before #3666 and #3692).

agriyakhetarpal commented 6 months ago

We did have CMake pinned (cmake==3.22) in #3223 when we pushed the images initially, but this was unpinned later – I am looking at this locally to see if reverting to that works

Edit: no luck so far with that

arjxn-py commented 6 months ago

is there a reason why we are using continuumio/miniconda3:latest, i.e., the latest-tagged image?

No particular reason to use the latest tag; I guess I hadn't anticipated that this might cause an issue later on. What we can do now is try pinning to tags that are 4~10 months old (there are 3-4 tags I can see). I am not sure I can reproduce this aarch64 error locally (I may lack the architecture), so I'll let you know when the updated branches on my fork are ready for you to test.

agriyakhetarpal commented 6 months ago

I am not sure I can reproduce this aarch64 error locally (I may lack the architecture), so I'll let you know when the updated branches on my fork are ready for you to test.

Thanks, actually I did test the last three tags for the miniconda image by building it and running the container locally, and also pinned cmake==3.22 – it did not work and returned the same error. Maybe we'll need to just use an older gcc or something by grabbing it off conda, rather than through apt, and hopefully that can fix it – we can't use a version that is too old, however.
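
One way to narrow down whether the compiler or its ABI settings are involved is to look at how std::string is encoded in the missing symbol. Under the Itanium C++ mangling scheme, Ss is the abbreviation for the plain std::basic_string<char>, which is what libstdc++ emits under its pre-C++11 string ABI, whereas the newer ABI places the type in the std::__cxx11 namespace. The sketch below is a heuristic classifier with hypothetical function names. It does not fully parse mangled names, and it does not by itself establish the cause of this bug.

```python
import re


def strip_source_names(mangled):
    """Drop <length><identifier> chunks so identifier text is not misread as ABI codes."""
    out, i = [], 0
    while i < len(mangled):
        match = re.match(r"\d+", mangled[i:])
        if match:
            # Skip the digits and the identifier of that length.
            i += len(match.group()) + int(match.group())
        else:
            out.append(mangled[i])
            i += 1
    return "".join(out)


def guess_string_abi(mangled):
    """Heuristically report which libstdc++ std::string ABI a symbol refers to."""
    if "__cxx11" in mangled:
        return "cxx11"
    if "Ss" in strip_source_names(mangled):
        return "pre-cxx11"
    return "unknown"


# Symbol copied from the log in this issue.
print(guess_string_abi("_ZN6casadi8Function11deserializeERKSs"))
```

If the CasADi library in the image only exports the __cxx11 variant of this function, that would point to a mismatch in the _GLIBCXX_USE_CXX11_ABI setting between the two builds rather than to the gcc version as such.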

agriyakhetarpal commented 5 months ago

Possible method of resolution, only an idea for now: build CasADi from source in the images (for both architectures). We don't need to compile interfaces to the many solvers and frameworks available, just the Python/SWIG bindings, so the build should take ~2 minutes – a fine compromise.

The Linux source installation instructions at https://github.com/casadi/casadi/wiki/InstallationLinux are actively maintained and updated.

santacodes commented 5 months ago

Possible method of resolution, only an idea for now: build CasADi from source in the images (for both architectures). We don't need to compile interfaces to the many solvers and frameworks available, just the Python/SWIG bindings, so the build should take ~2 minutes – a fine compromise.

The Linux source installation instructions at https://github.com/casadi/casadi/wiki/InstallationLinux are actively maintained and updated.

Might as well try that. If no one is already working on it, I'd love to try this PoC.

agriyakhetarpal commented 5 months ago

Sure, we would love the help, @santacodes!