mosdef-hub / foyer

A package for atom-typing as well as applying and disseminating forcefields
https://foyer.mosdef.org
MIT License
119 stars 78 forks

Foyer installation details. Docker file is up to date? #578

Closed iGulitch closed 2 months ago

iGulitch commented 2 months ago

Hello!

Is the Dockerfile in the repo up to date, or should this one be used instead?

Do I understand correctly that foyer is installed via the following two commands:

mamba env create --file environment-dev.yml &&
conda activate foyer-dev

and not via a git clone of this GitHub repository followed by a pip command? So the installation works simply because foyer is in the dependency lists of mbuild and gmso, which are in turn in the dependency list of foyer, and all three packages are installed from the conda-forge channel, correct?

P.S. Check foyer's dependencies file environment-dev.yml: gmso is listed twice.

chrisjonesBSU commented 2 months ago

I'll have to take a closer look at the Docker file to answer that question.

But, there are multiple ways to install foyer.

  1. Install from anaconda
mamba create -n foyer -c conda-forge foyer 

This will make a new conda environment and install the latest release of foyer along with all the dependencies needed to use it. For most use cases, this will be all you need to do. But also, as you mention, if you install gmso from anaconda you'll get foyer as well.

Or you can install foyer into an existing conda environment you're already using

mamba activate your-environment
mamba install -c conda-forge foyer
  2. Install from source

Installing from source is helpful if you need to use some functionality that has been merged in, but not yet released, or if you want to modify foyer's code yourself and run it. In this case you'll use the .yml file in this repo, then use pip to install from either this cloned repo, or your fork.

mamba env create -f environment-dev.yml
mamba activate foyer-dev
pip install -e .

Technically, creating the environment from the .yml will also install foyer, since gmso is listed as a dependency, but typically, the idea of using these .yml files is to then install from source. Otherwise, you might as well just install directly from anaconda.
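If it is unclear which copy of foyer an environment ends up using, a small check along the following lines can help. This is a minimal sketch, not part of foyer: `describe_install` is a hypothetical helper name, and it relies only on the standard-library `importlib` modules. It asks the import system where the package would be loaded from without actually importing it.

```python
import importlib.util
from importlib import metadata


def describe_install(module_name, dist_name=None):
    """Report the installed version and file location of a package.

    Hypothetical helper, not part of foyer. The package itself is not
    imported; the import system is only asked where it would load from.
    """
    dist_name = dist_name or module_name
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # no installed distribution with that name

    spec = importlib.util.find_spec(module_name)
    origin = spec.origin if spec is not None else None
    return {"version": version, "origin": origin}


if __name__ == "__main__":
    # After `pip install -e .` the origin would typically point into the
    # cloned repo rather than into the environment's site-packages directory.
    print(describe_install("foyer"))
```

Running this once after creating the environment and again after `pip install -e .` should show whether the source checkout has replaced the conda-installed copy.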

iGulitch commented 2 months ago

Dear @chrisjonesBSU, thanks for your reply and for recapping the installation instructions.

The Dockerfile that I got from here (otherwise, look for the latest tag here) looks approximately like the following:

ADD file:f278386b0cef68136129f5f58c52445590a417b624d62bca158d4dc926c340df in /

CMD ["/bin/sh"]

LABEL maintainer=Vlad Frolov

LABEL src=https://github.com/frol/docker-alpine-glibc

ENV LANG=C.UTF-8 LC_ALL=C.UTF-8

SHELL [/bin/ash -eo pipefail -c]

RUN /bin/ash -eo pipefail -c ALPINE_GLIBC_BASE_URL="https://github.com/sgerrand/alpine-pkg-glibc/releases/download" &&
  ALPINE_GLIBC_PACKAGE_VERSION="2.33-r0" &&
  ALPINE_GLIBC_BASE_PACKAGE_FILENAME="glibc-$ALPINE_GLIBC_PACKAGE_VERSION.apk" &&
  ALPINE_GLIBC_BIN_PACKAGE_FILENAME="glibc-bin-$ALPINE_GLIBC_PACKAGE_VERSION.apk" &&
  ALPINE_GLIBC_I18N_PACKAGE_FILENAME="glibc-i18n-$ALPINE_GLIBC_PACKAGE_VERSION.apk" &&
  apk add -q --no-cache --virtual=.build-dependencies wget ca-certificates &&
  echo "-----BEGIN PUBLIC KEY-----        MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEApZ2u1KJKUu/fW4A25y9m        y70AGEa/J3Wi5ibNVGNn1gT1r0VfgeWd0pUybS4UmcHdiNzxJPgoWQhV2SSW1JYu        tOqKZF5QSN6X937PTUpNBjUvLtTQ1ve1fp39uf/lEXPpFpOPL88LKnDBgbh7wkCp        m2KzLVGChf83MS0ShL6G9EQIAUxLm99VpgRjwqTQ/KfzGtpke1wqws4au0Ab4qPY        KXvMLSPLUp7cfulWvhmZSegr5AdhNw5KNizPqCJT8ZrGvgHypXyiFvvAH5YRtSsc        Zvo9GI2e2MaZyo9/lvb+LbLEJZKEQckqRj4P26gmASrZEPStwc+yqy1ShHLA0j6m        1QIDAQAB        -----END PUBLIC KEY-----" | sed 's/   */\n/g' > "/etc/apk/keys/sgerrand.rsa.pub" &&
  wget -q "$ALPINE_GLIBC_BASE_URL/$ALPINE_GLIBC_PACKAGE_VERSION/$ALPINE_GLIBC_BASE_PACKAGE_FILENAME" "$ALPINE_GLIBC_BASE_URL/$ALPINE_GLIBC_PACKAGE_VERSION/$ALPINE_GLIBC_BIN_PACKAGE_FILENAME" "$ALPINE_GLIBC_BASE_URL/$ALPINE_GLIBC_PACKAGE_VERSION/$ALPINE_GLIBC_I18N_PACKAGE_FILENAME" &&
  apk add -q --no-cache "$ALPINE_GLIBC_BASE_PACKAGE_FILENAME" "$ALPINE_GLIBC_BIN_PACKAGE_FILENAME" "$ALPINE_GLIBC_I18N_PACKAGE_FILENAME" &&
  rm "/etc/apk/keys/sgerrand.rsa.pub" &&
  /usr/glibc-compat/bin/localedef --force --inputfile POSIX --charmap UTF-8 "$LANG" || true &&
  echo "export LANG=$LANG" > /etc/profile.d/locale.sh &&
  apk del -q glibc-i18n &&
  rm "/root/.wget-hsts" &&
  apk del -q .build-dependencies &&
  rm "$ALPINE_GLIBC_BASE_PACKAGE_FILENAME" "$ALPINE_GLIBC_BIN_PACKAGE_FILENAME" "$ALPINE_GLIBC_I18N_PACKAGE_FILENAME" # buildkit

LABEL maintainer=Anaconda, Inc

ENV PATH=/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

ARG CONDA_VERSION=py39_4.10.3

ARG SHA256SUM=1ea2f885b4dbc3098662845560bc64271eb17085387a70c2ba3f29fff6f8d52f

RUN |2 CONDA_VERSION=py39_4.10.3 SHA256SUM=1ea2f885b4dbc3098662845560bc64271eb17085387a70c2ba3f29fff6f8d52f /bin/ash -eo pipefail -c apk add -q --no-cache bash procps &&
  wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-${CONDA_VERSION}-Linux-x86_64.sh -O miniconda.sh &&
  echo "${SHA256SUM}  miniconda.sh" > miniconda.sha256 &&
  if ! sha256sum -cs miniconda.sha256; then exit 1; fi &&
  mkdir -p /opt &&
  sh miniconda.sh -b -p /opt/conda &&
  rm miniconda.sh miniconda.sha256 &&
  ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh &&
  echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc &&
  echo "conda activate base" >> ~/.bashrc &&
  find /opt/conda/ -follow -type f -name '*.a' -delete &&
  find /opt/conda/ -follow -type f -name '*.js.map' -delete &&
  /opt/conda/bin/conda clean -afy # buildkit

CMD ["/bin/bash"]

EXPOSE map[8888/tcp:{}]

LABEL maintainer.name=mosdef-hub maintainer.url=https://mosdef.org

ENV PATH=/opt/micromamba/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

USER root

ADD . /foyer # buildkit

WORKDIR /foyer

RUN /bin/ash -eo pipefail -c addgroup -S anaconda && adduser -S anaconda -G anaconda # buildkit

RUN /bin/ash -eo pipefail -c apk update && apk add libarchive &&
  conda update conda -yq &&
  conda config --set always_yes yes --set changeps1 no &&
  . /opt/conda/etc/profile.d/conda.sh &&
  sed -i -E "s/python.*$/python="$(PY_VERSION)"/" environment.yml &&
  conda install -c conda-forge mamba &&
  mamba env create --file environment-dev.yml &&
  conda activate foyer-dev &&
  mamba install -c conda-forge jupyter python="$PY_VERSION" &&
  python setup.py install &&
  echo "source activate foyer-dev" >> /home/anaconda/.profile &&
  conda clean -afy &&
  mkdir -p /home/anaconda/data &&
  chown -R anaconda:anaconda /foyer &&
  chown -R anaconda:anaconda /opt &&
  chown -R anaconda:anaconda /home/anaconda # buildkit

WORKDIR /home/anaconda

COPY devtools/docker-entrypoint.sh /entrypoint.sh # buildkit

RUN /bin/ash -eo pipefail -c chmod a+x /entrypoint.sh # buildkit

USER anaconda

ENTRYPOINT ["/entrypoint.sh"]

CMD ["jupyter"]

It's clearly different from the one in this repo, and it's unclear to me whether this installation follows option 1 or 2 of your installation instructions above. To me, it looks like a mixture of both with some necessary commands missing, e.g. I see no git clone that copies the foyer repo.

Perhaps you could clarify the connection between foyer, mbuild and gmso, i.e. what depends on what and what requires what? Do I understand correctly that foyer requires mbuild, whereas foyer itself is required by gmso?

iGulitch commented 2 months ago

@chrisjonesBSU, I forgot to add the following last time.

I'm creating a Singularity container with the whole MoSDeF infrastructure [based on my own base conda image], so I'm following your installation variant 2. Consequently, my Singularity definition file [these files are the equivalent of Dockerfiles] contains the following instructions:

export APP="mosdef"
git clone https://github.com/mosdef-hub/foyer.git
. /opt/conda/etc/profile.d/conda.sh
conda env create -f ./environment_${APP}.yml
conda activate ${APP}
pip install -e foyer

which are essentially the same as yours. The environment_${APP}.yml file is basically a compilation of the development versions of the environment files of foyer, mbuild, and gmso:

name: mosdef
channels:
  - conda-forge
  - omnia
dependencies:
  - boltons
  - bump2version
  - codecov
  - ele >= 0.2.0
  - forcefield-utilities >= 0.2.1
  - gmso
  - importlib_resources
  - lark
  - lxml
  - mbuild >= 0.17
  - networkx >= 2.5
  - numpy = 1.26.4
  - openff-toolkit >= 0.11
  - openmm
  - parmed >= 3.4.3
  - pip
  - pre-commit
  - pydantic >= 2
  - pytest
  - pytest-azurepipelines
  - pytest-cov
  - python-symengine
  - pytest-timeout
  - pytest-xdist
  - python >= 3.9, <= 3.12
  - requests
  - requests-mock
  - scipy
  - symengine
  - sympy
  - treelib
#  - unyt >= 3.0.3

unyt is commented out, since including it leads to the following error on import foyer:

Traceback (most recent call last):
  File "/femshare/azure/md/apps/mosdef/test_new.py", line 4, in <module>
    import foyer, mbuild, parmed, gmso
  File "/foyer/foyer/__init__.py", line 3, in <module>
    from foyer.forcefield import Forcefield
  File "/foyer/foyer/forcefield.py", line 38, in <module>
    from foyer.atomtyper import find_atomtypes
  File "/foyer/foyer/atomtyper.py", line 7, in <module>
    from gmso import Topology
  File "/opt/conda/envs/mosdef/lib/python3.12/site-packages/gmso/__init__.py", line 2, in <module>
    from .core.angle import Angle
  File "/opt/conda/envs/mosdef/lib/python3.12/site-packages/gmso/core/angle.py", line 6, in <module>
    from gmso.abc.abstract_connection import Connection
  File "/opt/conda/envs/mosdef/lib/python3.12/site-packages/gmso/abc/abstract_connection.py", line 5, in <module>
    from gmso.abc.abstract_site import Site
  File "/opt/conda/envs/mosdef/lib/python3.12/site-packages/gmso/abc/abstract_site.py", line 10, in <module>
    from gmso.abc.gmso_base import GMSOBase
  File "/opt/conda/envs/mosdef/lib/python3.12/site-packages/gmso/abc/gmso_base.py", line 8, in <module>
    from pydantic.validators import dict_validator
ImportError: cannot import name 'dict_validator' from 'pydantic.validators' (/opt/conda/envs/mosdef/lib/python3.12/site-packages/pydantic/validators.py)
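A traceback like this is consistent with an installed gmso build written against the pydantic 1.x API being imported alongside pydantic 2.x (the environment above requests `pydantic >= 2`). That is an inference from the error message, not something confirmed in this thread. A minimal sketch for checking which versions the solver actually picked, using only the standard library; `installed_versions` and `major` are hypothetical helper names:

```python
from importlib import metadata


def installed_versions(dist_names):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in dist_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


def major(version):
    """Return the leading integer of a version string such as '2.7.1'."""
    return int(version.split(".")[0])


if __name__ == "__main__":
    found = installed_versions(["pydantic", "gmso", "unyt", "foyer", "mbuild"])
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")

    pydantic_version = found["pydantic"]
    if pydantic_version is not None and major(pydantic_version) >= 2:
        # Assumption: dict_validator is a pydantic 1.x name, so a gmso build
        # that still imports it would fail against pydantic 2.x.
        print("pydantic 2.x found; check that the installed gmso supports it.")
```

Comparing the printed gmso version with and without the unyt line in the environment file would show whether the extra constraint makes the solver fall back to a different gmso build.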

The reason for installing this way is that I need all three packages (mbuild, foyer, gmso) in one place in order to, respectively, create a coordinate file from a SMILES string, apply a force field to the created molecule, and convert GROMACS top and gro files into ones compatible with LAMMPS, NAMD, etc. I also want to be able to install foyer from source should I need to modify the code. However, I'm not that strong with the technical details of conda / mamba (environment creation, package installation, etc.), and I don't know the exact dependency lists of mbuild, foyer, and gmso [their GitHub repositories contain both regular and development versions of the yml dependency files, which is confusing], so I simply merged the development versions of their dependency lists into a single one. BTW, another question is whether the package version information is more up to date in the GitHub repo or in the conda-forge channel.
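When dependency lists from several environment files are merged by hand, repeated entries are easy to miss. A minimal sketch for spotting them, assuming the entries are available as plain strings in the conda-style `name >= version` form used above; `package_name` and `duplicated_packages` are hypothetical helper names:

```python
import re
from collections import Counter


def package_name(spec):
    """Extract the package name from a spec such as 'mbuild >= 0.17'."""
    # Split at the first whitespace or version-operator character.
    return re.split(r"[\s<>=!~]", spec.strip(), maxsplit=1)[0].lower()


def duplicated_packages(specs):
    """Return the sorted names that occur more than once in a list of specs."""
    counts = Counter(package_name(spec) for spec in specs)
    return sorted(name for name, count in counts.items() if count > 1)


# Entries typed in by hand for illustration; a real script would read them
# from the merged .yml file instead.
merged = [
    "gmso",
    "mbuild >= 0.17",
    "numpy = 1.26.4",
    "gmso",
    "python >= 3.9, <= 3.12",
]
print(duplicated_packages(merged))  # prints ['gmso']
```

This only flags repeated names; it does not check whether the version constraints of the repeated entries are compatible with each other.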

I'm pretty sure I'm doing some unnecessary things in my installation routine, but I'm not sure which ones to drop. For instance, as you mentioned, foyer is in the dependency list of gmso, while there is a - gmso line in the environment-dev.yml of foyer. Thus, when installing foyer as:

mamba env create -f environment-dev.yml
mamba activate foyer-dev
pip install -e .

foyer is first installed as a dependency of gmso, then uninstalled, and afterwards installed again by pip install. Weird, huh?

The reason I was looking into your Dockerfile was to understand how exactly you install mbuild, foyer, and gmso, and why your Docker container was working whereas mine wasn't [regarding the latter, I now know for sure that it was due to the unyt package that is in the environment-dev.yml of gmso].

Summarising, a clear routine for installing the whole MoSDeF infrastructure from source, without superfluous steps, would be nice to have :)

chrisjonesBSU commented 2 months ago

If you just need the mosdef software stack and are working with your own base image, it seems like the route to go would be to install all 3 packages from anaconda rather than using the environment.yml files in each of the 3 repos, and installing from source.

conda create -n mosdef -c conda-forge mbuild gmso foyer

This should handle all of the dependencies for you.

If you absolutely need a development version of any of these packages, you can still run pip install on the cloned repo after running the command above and activating your environment. I'd suggest making a fork of the repo whose source code you want to change, cloning that in your image, and running pip install -e . from your fork.

iGulitch commented 2 months ago

@chrisjonesBSU , thanks for the short and yet clear explanation to my insanely long stream of words and I'm retrospectively sorry for the latter :)

chrisjonesBSU commented 2 months ago

> @chrisjonesBSU , thanks for the short and yet clear explanation to my insanely long stream of words and I'm retrospectively sorry for the latter :)

No worries! I'm happy to help.

I can understand the confusion, so just to clarify, all 3 of the foyer, mbuild and GMSO repos have their own environment-dev.yml files, but the point of these files is to create a development and testing environment for that specific package, not for the mosdef software stack as a whole. Also, with anaconda, you can install a package from source after the fact, even if that package is already in your conda environment (e.g. it was installed as a dependency from another package).
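One way to tell whether the copy of a package in an environment is still the conda build or a later `pip install -e .` is to look at its install metadata. A minimal sketch, assuming a reasonably recent pip that records PEP 610 `direct_url.json` metadata for editable installs; `is_editable` and `installed_as_editable` are hypothetical helper names, not part of foyer or conda:

```python
import json
from importlib import metadata


def is_editable(direct_url_text):
    """Interpret the contents of a PEP 610 direct_url.json file.

    Returns True only when the metadata marks the install as editable.
    """
    if not direct_url_text:
        return False  # file missing or empty: nothing recorded
    info = json.loads(direct_url_text)
    return bool(info.get("dir_info", {}).get("editable", False))


def installed_as_editable(dist_name):
    """Check whether an installed distribution was recorded as editable."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return False
    # read_text returns None when the metadata file does not exist.
    return is_editable(dist.read_text("direct_url.json"))


if __name__ == "__main__":
    print(installed_as_editable("foyer"))
```

The check reports only what the installer recorded, so a package installed with an older tool that writes no such file would show up as not editable.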

iGulitch commented 2 months ago

Thanks! Now, it has become even clearer!

Still, I have a few questions remaining:

  1. the Dockerfile in this GitHub repo of foyer is outdated, is it not?
  2. what is the most up-to-date version of, let's say, foyer? Is it the one in the conda repo or the one in the GitHub repo, or are the two kept in sync? Namely, do installation via conda install and via git clone <foyer repo> + pip install give the same version?

Best, @iGulitch

chrisjonesBSU commented 2 months ago

The versions of foyer available on conda coincide with the releases, which you can find here. We did a foyer release 2 days ago, which is essentially a snapshot of the repo as it existed then. The last release before that was in January, so the January release on conda did not include anything that was merged between January and now. So the GitHub repo is always the most up to date, but only by whatever has been committed since the last release. Pretty much all the commits since January have been maintenance and CI related, so there really isn't a big difference between foyer version 0.12.1 and the current GitHub repo. Foyer isn't currently undergoing active changes.

From looking at the Dockerfile, it seems up to date and fine, as far as I can tell. I haven't tested it yet, though.