maccallumlab / meld

Modeling with limited data
http://meldmd.org

Meld 0.5.0: ModuleNotFoundError: No module named 'meld.system.meld_system' #113

Closed ca-taylor closed 2 years ago

ca-taylor commented 2 years ago

After hitting a problem with the "master" branch of MELD (Issue #112), I decided to move forward with the 0.5.0 release/tag. However, I'm now encountering a Python "module not found" error referencing "meld.system.meld_system". I do not see an explicit reference to any such module, nor can I find the module in the source code.

What am I missing?

        Traceback (most recent call last):
          File "/apps/cuda/11.0.207/gcc/9.3.0/openmpi/4.0.4/meld/0.5.0/bin/launch_remd", line 29, in <module>
            main()
          File "/apps/cuda/11.0.207/gcc/9.3.0/openmpi/4.0.4/meld/0.5.0/bin/launch_remd", line 25, in main
            launch.launch(console, args.debug, args.console_log)
          File "/apps/cuda/11.0.207/gcc/9.3.0/openmpi/4.0.4/meld/0.5.0/lib64/python3.8/site-packages/meld/remd/launch.py", line 100, in launch
            system = store.load_system()
          File "/apps/cuda/11.0.207/gcc/9.3.0/openmpi/4.0.4/meld/0.5.0/lib64/python3.8/site-packages/meld/vault.py", line 612, in load_system
            return _load_pickle(system_file)
          File "/apps/cuda/11.0.207/gcc/9.3.0/openmpi/4.0.4/meld/0.5.0/lib64/python3.8/site-packages/meld/vault.py", line 23, in _load_pickle
            return pickle.load(data)
        ModuleNotFoundError: No module named 'meld.system.meld_system'

jlmaccal commented 2 years ago

If you are trying to help with Liwei, I wouldn't bother with 0.5.0, as it is missing features that he will need.

I think you probably set up the system with a newer version of MELD, but are trying to run it with an older version. A large number of modules changed and are not backwards compatible.
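The traceback above ends in `pickle.load`, which helps explain this: a pickle stores each class by its module path and name, not its code, so loading needs that exact module to be importable. Here is a minimal sketch of the failure mode. The module name `demo_pkg_new_layout` and the `System` class are hypothetical stand-ins, not MELD's real modules.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a module that exists only in the "newer" install.
new_mod = types.ModuleType("demo_pkg_new_layout")


class System:
    def __init__(self, n_atoms):
        self.n_atoms = n_atoms


# Make the class look as if it were defined in the hypothetical module.
System.__module__ = "demo_pkg_new_layout"
new_mod.System = System
sys.modules["demo_pkg_new_layout"] = new_mod

# "Newer version": pickle the object. Only the module path, the class name
# and the instance state are written, not the class definition itself.
payload = pickle.dumps(System(10))
assert b"demo_pkg_new_layout" in payload

# "Older version": the module does not exist in this environment.
del sys.modules["demo_pkg_new_layout"]

try:
    pickle.loads(payload)
except ModuleNotFoundError as exc:
    error_message = str(exc)
    print(error_message)
```

Under these assumptions the load fails while importing the recorded module path, which matches the shape of the error in the traceback. Regenerating the pickled system with the same MELD version that will run it avoids the mismatch.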

ca-taylor commented 2 years ago

Ah, that might be the case, since I was using a system borrowed from Arup for testing and he is definitely using the later code.

Thank you.

ca-taylor commented 2 years ago

Sorry to keep pestering, but working with the most recent code in the master branch (which I've dubbed 0.5.9 for want of something better), I'm seeing:

        ...
          File "/apps/python-core/3.8/lib/python3.8/imp.py", line 342, in load_dynamic
            return _load(spec)
        ImportError: /apps/cuda/11.0.207/gcc/9.3.0/openmpi/4.0.4/meld/0.5.9/lib64/python3.8/site-packages/_meldplugin.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN10MeldPlugin9MeldForce20modifyGroupNumActiveEii

However, that symbol is defined in libMeldPlugin.so which is in my LD_LIBRARY_PATH and should be found. I'm not too familiar with the C/Python interface/linkage so I'm not sure what is going wrong here. I've rebuilt twice and reproduced the error both times. I can't see anything wrong with the build or the setup but clearly something is missing.

        [chasman@login4 lib]$ nm libMeldPlugin.so | grep modifyGroupNumActive
        0000000000037ba6 T _ZN10MeldPlugin9MeldForce20modifyGroupNumActiveEii
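The `nm` check can be made systematic by comparing the symbols one file needs (type `U`, undefined) against those another file defines (type `T`, in the text section). The sketch below parses `nm` output held in strings. The library line is the one from this thread; the extension line is a hypothetical illustration of how an undefined reference would appear, and the helper names are invented for this example.

```python
def parse_nm_line(line):
    """Return (symbol_type, name) for one line of mangled `nm` output, or None."""
    parts = line.split()
    if len(parts) == 3:      # address, type, name (defined symbols)
        _, sym_type, name = parts
    elif len(parts) == 2:    # type, name (undefined symbols have no address)
        sym_type, name = parts
    else:
        return None
    return sym_type, name


def symbols(nm_output, wanted_types):
    """Collect the names of all symbols whose type letter is in wanted_types."""
    found = set()
    for line in nm_output.splitlines():
        parsed = parse_nm_line(line)
        if parsed and parsed[0] in wanted_types:
            found.add(parsed[1])
    return found


SYMBOL = "_ZN10MeldPlugin9MeldForce20modifyGroupNumActiveEii"

# Line reported in the thread for libMeldPlugin.so.
library_nm = "0000000000037ba6 T " + SYMBOL
# Hypothetical line for the extension module (assumed, not taken from the thread).
extension_nm = "                 U " + SYMBOL

needed = symbols(extension_nm, {"U"})
provided = symbols(library_nm, {"T"})
missing = needed - provided
print(sorted(missing))  # prints []
```

An empty result only shows that this particular copy of the library on disk exports the symbol. It does not show which copy the dynamic loader actually picks at run time.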

jlmaccal commented 2 years ago

I haven't seen that one before...

Maybe try running ldd on _meldplugin.cpython-38-x86_64-linux-gnu.so?

ca-taylor commented 2 years ago

Good thought but "ldd" output looks as expected. I'll have to think about it some more.

ca-taylor commented 2 years ago

FWIW, I was able to get around the undefined symbol by using LD_PRELOAD with libMeldPlugin.so. I don't think that should be necessary but it gets us running so I'll take it.

        export LD_PRELOAD=$HPC_MELD_LIB/libMeldPlugin.so
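`LD_PRELOAD` works by putting the library's symbols into the process before anything else is loaded. A roughly similar effect can sometimes be had from inside Python by loading the library with `RTLD_GLOBAL` before importing the extension module. This is a sketch of that general technique only: it has not been tested against MELD, the helper name `preload_global` is hypothetical, and the path in the usage comment is a placeholder.

```python
import ctypes


def preload_global(library_path):
    """Load a shared library so its symbols are visible to libraries loaded later.

    ctypes.CDLL raises OSError if the file cannot be loaded.
    """
    return ctypes.CDLL(library_path, mode=ctypes.RTLD_GLOBAL)


# Usage sketch (placeholder path; adjust to the actual install location):
#   preload_global("/path/to/libMeldPlugin.so")
#   import _meldplugin  # module name inferred from the .so file name in the thread
```

Unlike `LD_PRELOAD`, this affects only the Python process that calls it, so it has to run before the first import of the extension.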