snurr-group / mofid

A system for rapid identification and analysis of metal-organic frameworks
https://snurr-group.github.io/mofid/
GNU General Public License v2.0

Issue running `pip install .` for mofid in Google Colab #30

Open ngkayjay opened 1 year ago

ngkayjay commented 1 year ago

Hey, I got an error that I can't debug when running the pip install. Make init and path setup went without issues.

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Processing /content/gdrive/MyDrive/Project_MTF-C/mofid
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

I was running this on Colab. All required setup packages are updated as follows:

Requirement already satisfied: pip in /usr/local/lib/python3.10/dist-packages (23.1.2)
Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (68.0.0)
Requirement already satisfied: ez_setup in /usr/local/lib/python3.10/dist-packages (0.9)

bbucior commented 1 year ago

Thanks for reporting and trying it out on another platform!

I'm not too familiar with Google Colab but may have found a workaround. The source of the crash appears to be parsing the `install_requires` option in `setup.py`, which sets up a dependency in older Python 2.x configurations. Everything seemed to work for me after commenting out that line (or adding a step like `sed -i -e 's/install_requires/#install_requires/' setup.py` to the install process).
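For anyone who prefers to apply the same patch from a notebook cell, here is a minimal Python sketch of that `sed` substitution. The function names are made up for illustration, and it assumes the `install_requires` option sits on a single line of `setup.py`; if it spans several lines, commenting out only the first one would leave the file broken, so check the file before relying on this.

```python
import re
from pathlib import Path


def comment_out_install_requires(setup_text: str) -> str:
    """Prefix the first 'install_requires' on each line with '#'.

    Mirrors: sed 's/install_requires/#install_requires/'
    Occurrences already preceded by '#' are skipped, so running it
    twice does not add a second '#'.
    """
    pattern = re.compile(r"(?<!#)install_requires")
    lines = setup_text.split("\n")
    return "\n".join(pattern.sub("#install_requires", line, count=1) for line in lines)


def patch_setup_py(path: str = "setup.py") -> None:
    """Rewrite the given setup.py in place (hypothetical helper)."""
    setup_file = Path(path)
    setup_file.write_text(comment_out_install_requires(setup_file.read_text()))
```

Calling `patch_setup_py("setup.py")` from the repository root before `pip install .` should have the same effect as the `sed` step.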

Does it fix the error for you, too?

ngkayjay commented 1 year ago

Yes it does, the fix works! Thanks! Right now I'm on Google Colab as my HPC resource allocation has been approved, but not yet implemented. I suspect other users who would want to play around with ML on MOFs without institutional resources would appreciate your advice as well.

For other users on Colab: run `!chmod -R 755 <YOUR_DIR>` after `pip install .` to set the proper permissions; otherwise you'll get an `Errno 13` permission error.
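If you would rather do this from Python than from a shell escape, the following is a rough equivalent of `chmod -R 755`. The helper name is hypothetical, and it skips symbolic links rather than following them.

```python
import os


def chmod_recursive(root: str, mode: int = 0o755) -> None:
    """Apply `mode` to `root` and to everything beneath it."""
    os.chmod(root, mode)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # leave symlinks alone instead of following them
            os.chmod(path, mode)
```

Usage would be `chmod_recursive("<YOUR_DIR>")`, with the same directory you would pass to `chmod`.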

One more question: how long does it take to construct a mofid for a given .cif file on your end? The authors whose work I'm reproducing had constructed the mofids for a dataset of 400k+ .cifs, but it takes me ~6s to construct a single mofid. I'm wondering where I should start my optimization.
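To put those numbers in perspective, here is a back-of-the-envelope estimate. It uses 400,000 CIFs as a round figure for "400k+" and assumes perfect scaling across workers, which real jobs will not quite reach.

```python
def wall_time_days(n_cifs: int, seconds_per_cif: float, n_workers: int = 1) -> float:
    """Estimated wall time in days, assuming ideal parallel scaling."""
    return n_cifs * seconds_per_cif / n_workers / 86_400  # 86,400 seconds per day


print(f"{wall_time_days(400_000, 6):.1f} days on a single worker")
print(f"{wall_time_days(400_000, 6, n_workers=100) * 24:.1f} hours on 100 workers")
```

At ~6 s per structure, a single process would need roughly four weeks, so parallelism looks like the first lever to try.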

bbucior commented 1 year ago

Awesome, glad everything's working now!

For the ML training set, unfortunately calculating the MOFids is going to take a while for a large folder of CIFs. Your calculation times are consistent with what I'm seeing on my laptop (`make test` runs through 28 CIFs in 1-2 minutes). If memory serves correctly, I processed the MOF databases by splitting the CIFs into a few folders and running them as parallel jobs on HPC resources (see `Scripts/HPC/`).
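The folder-splitting step could look something like the sketch below. This is not the code in `Scripts/HPC/`; the function names and the round-robin scheme are assumptions made for illustration.

```python
from pathlib import Path


def list_cifs(folder: str) -> list:
    """Collect the .cif files in a folder, in a stable sorted order."""
    return sorted(Path(folder).glob("*.cif"))


def split_into_batches(items: list, n_batches: int) -> list:
    """Deal items round-robin into n_batches lists of near-equal size."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    return [items[i::n_batches] for i in range(n_batches)]
```

Each batch can then be copied into its own folder and submitted as a separate job, so that every job works on roughly the same number of structures.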

TBH, while you're waiting on HPC resources, your best bet to get started would probably be a precomputed MOFid `.smi` file or similar structural information, if it's available in the SI of that paper or another compatible one. For example, our SmVAE paper includes a training set with RFcodes, which are slightly different from MOFid but similar in intent. Maybe something like that could help get things off the ground until you get the compute resources for reproducing the original 400k+ dataset?