hjkgrp / molSimplify

molSimplify code
GNU General Public License v3.0

List of dependencies when not using conda #121

Closed samfux84 closed 1 year ago

samfux84 commented 1 year ago

Hi,

On university HPC clusters, it is rather uncommon to use Conda, as it creates a large number of small files (several hundred thousand files with an average size in the KB range), which is close to a worst-case scenario for many high-performance file systems such as Lustre.
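To put numbers on the small-file concern for a given installation, the file count and mean file size of a directory tree can be measured directly. The following is a minimal sketch using only the Python standard library; the function name `summarize_tree` is made up for illustration, and the directory to scan (for example a Conda environment prefix) is whatever path you pass in.

```python
import os


def summarize_tree(root):
    """Return (entry_count, total_bytes, mean_bytes) for non-directory entries under root.

    Symbolic links are counted as links (lstat), not as the files they point to.
    """
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            count += 1
            total += st.st_size
    mean = total / count if count else 0.0
    return count, total, mean
```

Calling `summarize_tree("/path/to/env")` on an environment directory gives a quick idea of how many inodes it consumes and how small the files are on average.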

Therefore I am trying to install molSimplify without using conda on the Euler cluster of ETH Zurich, but already getting a list of dependencies is not trivial. When looking at

https://github.com/hjkgrp/molSimplify/tree/master/devtools/conda-envs

there are two .yml files. The one for m1 lists more than 160 dependencies, but not XTB or PSI4, whereas the other one only lists 18 dependencies including XTB and psi4. And the README.md again lists a different set of dependencies, not containing XTB and psi4.

Would it be possible for you to provide a minimal list of dependencies (not RPM packages, not Conda packages) for molSimplify that allows building the software from source?

Thank you very much in advance and best regards

Sam

ralf-meyer commented 1 year ago

Hi Sam,

The "minimal" set of dependencies crucially depends on your intended use case for molSimplify. I usually run a Python 3.8 environment with the following packages:

All of these rely heavily on interfaces to C and Fortran code, which is why a conda installation is the simplest way to handle possible library conflicts. You should, however, be able to install all of these (maybe with the exception of openbabel) from PyPI.

Coming back to the question of your intended use case: Do you actually need to install molSimplify on an HPC cluster? The most common use cases probably only involve pre- and post-processing of quantum chemistry calculations, which you should be able to do on your local machine.

Ralf

samfux84 commented 1 year ago

@ralf-meyer : Thank you for your reply and for providing the list with the minimal set of dependencies. This is exactly what I was looking for.

I have installed all those dependencies from source and could install molSimplify (I did it already before Christmas, just guessing which dependencies would be required). I can start the program without getting any error message:

[sfux@eu-login-26 ~]$ module load gcc/6.3.0 molsimplify/1.7.1

The following have been reloaded with a version change:
  1) gcc/4.8.5 => gcc/6.3.0

[sfux@eu-login-26 ~]$ module list

Currently Loaded Modules:
  1) StdEnv   2) gcc/6.3.0   3) openblas/0.2.20   4) python/3.8.5   5) openbabel/2.4.1   6) libffi/3.2.1   7) zlib/1.2.9   8) molsimplify/1.7.1

[sfux@eu-login-26 ~]$ molsimplify --help
TensorFlow connection successful
usage: molsimplify [-h] [-core CORE] [-oxstate OXSTATE] [-coord] [-geometry] [-geo] [-lig LIG] [-ligocc] [-spin SPIN] [-spinmultiplicity SPINMULTIPLICITY] [-multiplicity MULTIPLICITY] [-keepHs KEEPHS]
                   [-skipANN SKIPANN] [-rundir] [-smicat] [-ligloc LIGLOC] [-ff FF] [-ffoption FFOPTION] [-ff_final_opt FF_FINAL_OPT]

Welcome to molSimplify. Only basic usage is described here.
For help on advanced modules, please refer to our documentation at molsimplify.mit.edu or provide additional commands to -h, as below:
-h advanced: advanced structure generation help
-h slabgen: slab builder help
-h autocorr: automated correlation analysis help
-h db: database search help
-h inputgen: quantum chemistry code input file generation help
-h postproc: post-processing help
-h random: random generation help
-h binding: binding species (second molecule) generation help
-h customcore: custom core functionalization help
-h tsgen: transition state generation help
-h naming: custom filename help
-h liganddict: ligands.dict help

optional arguments:
  -h, --help            show this help message and exit
  -core CORE            core structure with currently available: ag au cadmium cd chromium chromiumporphyrin co cobalt cobaltporphyrin copper copperporphyrin cr cr4nh3o cu fe fe2o2 fen4py fen4pyo fenh3o
                        ferrcore ferrocene gold hafnium hf hg ir iridium iron ironporphyrin la lanthanum manganese manganeseporphyrin mercury mn mn3nh3o mn5nh3o mo molybdenum n2ots nb ni nickel nickelporphyrin
                        niobium os osmium palladium pd platinum pt re rh rhenium rhodium ru ruthenium sc scandium scandiumporphyrin silver ta tantalum tc technetium ti titanium titaniumrphyrin tungsten v
                        vanadium vanadiumporphyrin w y yttrium zinc zincporphyrin zirconium zn zncat zr
  -oxstate OXSTATE      oxidation state of the metal
  -coord                coordination such as 4,5,6
  -geometry             geometry
  -geo                  geometry
  -lig LIG              ligands to be included in complex; ligands.dict options display with command `molsimplify -h liganddict`
  -ligocc               number of corresponding ligands
  -spin SPIN            Spin multiplicity (e.g., 1 for singlet)
  -spinmultiplicity SPINMULTIPLICITY
                        Spin multiplicity (e.g., 1 for singlet)
  -multiplicity MULTIPLICITY
                        Spin multiplicity (e.g., 1 for singlet)
  -keepHs KEEPHS        force keep hydrogens, default auto for each ligand
  -skipANN SKIPANN      skip attempting ANN predictions
  -rundir               directory for jobs, default ~/Runs
  -smicat               connecting atoms corresponding to smiles. Indexing starts at 1 which is the default value as well. Use [] for multiple SMILES ligands, e.g. [1],[2]
  -ligloc LIGLOC        force location of ligands in the structure generation (default False)
  -ff FF                select force field for FF optimization. Available: (default) MMFF94, UFF, GAFF, Ghemical, XTB, GFNFF
  -ffoption FFOPTION    select when to perform FF optimization. Options: B(Before),A(After), (default) BA, N(No)
  -ff_final_opt FF_FINAL_OPT
                        optionally select different force field for final FF optimization after structure generation (defaults to option used in -ff). Available: MMFF94, UFF, GAFF, Ghemical, XTB, GFNFF
[sfux@eu-login-26 ~]$

The user that requested the software is now testing the installation.

> Coming back to the question of your intended use case: Do you actually need to install molSimplify on an HPC cluster? The most common use cases probably only involve pre- and post-processing of quantum chemistry calculations, which you should be able to do on your local machine.

I am doing the installation at the request of one of our cluster users from the chemistry department of our university. I agree with you that users could also install the software locally, but if all required dependencies are available on the cluster too, then why not install the software there as well?

In recent years, we have seen a shift from using an HPC cluster for pure HPC computing (large MPI jobs using hundreds or even thousands of cores) to all kinds of scientific computing, including single-core jobs, interactive jobs and visualization that users could also run on their local computer.

Again thank you for providing the list of dependencies.

Best regards

Sam