materials-data-facility / scythe

An extensible library of tools that extract metadata from scientific files
Apache License 2.0
15 stars 3 forks source link

crystal_structure cif test not passing #34

Open jat255 opened 2 years ago

jat255 commented 2 years ago

I installed MaterialsIO using poetry (see https://github.com/jat255/MaterialsIO/tree/poetry_compat), and I'm getting test failures on the crystal structure parser. It appears the issue is the ASE adapter, as it results in an incorrect interpretation (at least according to test_crystal_structure.py). Here are the test failures:

>       assert isclose(output['crystal_structure']['number_of_atoms'], 5070.0)
E       assert False
E        +  where False = isclose(4656.0, 5070.0)

>       assert output == {'material': {'composition': 'Co270H1680C1872N324O924'},
                          'crystal_structure': {'space_group_number': 146,
                                                'stoichiometry': 'A45B54C154D280E312'}}
E       AssertionError: assert {'crystal_str...728N288O864'}} == {'crystal_stru...872N324O924'}}
E         Differing items:
E         {'material': {'composition': 'Co240H1536C1728N288O864'}} != {'material': {'composition': 'Co270H1680C1872N324O924'}}
E         {'crystal_structure': {'space_group_number': 210, 'stoichiometry': 'A5B6C18D32E36'}} != {'crystal_structure': {'space_group_number': 146, 'stoichiometry': 'A45B54C154D280E312'}}
E         Full diff:
E         - {'crystal_structure': {'space_group_number': 210,
E         ?                                              - ^
E         + {'crystal_structure': {'space_group_number': 146,
E         ?                                               ^^
E         -                        'stoichiometry': 'A5B6C18D32E36'},
E         ?                                             ^   ^^^  ^
E         +                        'stoichiometry': 'A45B54C154D280E312'},
E         ?                                           +  ^^  ++++ ^  ^^
E         -  'material': {'composition': 'Co240H1536C1728N288O864'}}
E         ?                                  ^   --     -  ^^ ^^
E         +  'material': {'composition': 'Co270H1680C1872N324O924'}}
E         ?                                  ^    ++  +   + ^ ^^

Here is the list of packages installed in my environment. I tried to follow the package versions in requirements.txt and setup.py:

Output of "pip list": ``` Package Version --------------------- ----------- ase 3.22.1 atomicwrites 1.4.0 attrs 21.4.0 backcall 0.2.0 boto3 1.20.35 botocore 1.23.35 cached-property 1.5.2 certifi 2021.10.8 cffi 1.15.0 chardet 4.0.0 charset-normalizer 2.0.10 click 8.0.3 cloudpickle 2.0.0 coverage 6.2 coveralls 3.3.1 cryptography 36.0.1 cycler 0.11.0 Cython 0.29.26 dask 2021.12.0 debugpy 1.5.1 decorator 5.1.1 dftparse 0.3.0 dfttopif 1.1.0 dill 0.3.4 dnspython 2.2.0 docopt 0.6.2 docutils 0.17.1 entrypoints 0.3 et-xmlfile 1.1.0 fair-research-login 0.2.6 fett 0.3.2 flake8 4.0.1 fsspec 2022.1.0 globus-nexus-client 0.4.1 globus-sdk 3.2.1 greenlet 1.1.2 h5py 3.6.0 hyperspy 1.6.5 idna 3.3 ijson 3.1.4 imageio 2.9.0 importlib-metadata 4.2.0 importlib-resources 5.4.0 ipykernel 6.7.0 ipyparallel 8.1.0 ipython 7.31.0 isodate 0.6.1 jedi 0.18.1 jmespath 0.10.0 jsonlines 3.0.0 jsonschema 4.4.0 jupyter-client 7.1.0 jupyter-core 4.9.1 kiwisolver 1.3.2 linear-tsv 1.1.0 llvmlite 0.34.0 locket 0.2.1 materials-io 0.0.1 matplotlib 3.4.3 matplotlib-inline 0.1.3 mccabe 0.6.1 mdf-toolbox 0.5.11 monty 2022.1.12.1 more-itertools 8.12.0 mpmath 1.2.1 natsort 8.0.2 nest-asyncio 1.5.4 networkx 2.6.3 numba 0.51.2 numexpr 2.8.1 numpy 1.21.1 openpyxl 3.0.9 packaging 21.3 palettable 3.3.0 pandas 1.1.5 parso 0.8.3 partd 1.2.0 pbr 5.8.0 pexpect 4.8.0 pickleshare 0.7.5 Pillow 7.2.0 Pint 0.18 pip 21.3.1 pluggy 1.0.0 prettytable 3.0.0 prompt-toolkit 3.0.24 psutil 5.9.0 ptyprocess 0.7.0 py 1.11.0 pycalphad 0.9.2 pycodestyle 2.8.0 pycparser 2.21 PyDispatcher 2.0.5 pyflakes 2.4.0 Pygments 2.11.2 PyJWT 2.3.0 pymatgen 2018.12.12 pymongo 3.12.3 pyparsing 3.0.6 pypif 2.1.2 pyrsistent 0.18.0 pytest 3.10.1 pytest-cov 2.9.0 python-dateutil 2.8.2 python-jsonrpc-server 0.3.4 python-magic 0.4.24 pytz 2021.3 PyWavelets 1.2.0 PyYAML 5.4.1 pyzmq 22.3.0 requests 2.27.1 rfc3986 2.0.0 ruamel.yaml 0.17.20 ruamel.yaml.clib 0.2.6 s3transfer 0.5.0 scikit-image 0.19.1 scipy 1.6.1 setuptools 59.5.0 setuptools-scm 6.3.2 six 1.16.0 snooty-lextudio 1.11.6 sparse 0.13.0 spglib 1.16.3 SQLAlchemy 1.4.29 stevedore 1.32.0 symengine 0.7.2 sympy 1.8 tableschema 1.20.2 tabulate 0.8.9 tabulator 1.53.5 tifffile 2021.11.2 tinydb 4.5.2 toml 0.10.2 tomli 2.0.0 toolz 0.11.2 tornado 6.1 tqdm 4.62.3 traitlets 5.1.1 traits 6.3.2 typing-extensions 3.10.0.2 ujson 1.35 unicodecsv 0.14.1 urllib3 1.26.8 watchdog 1.0.2 wcwidth 0.2.5 wheel 0.37.0 xarray 0.20.2 xlrd 2.0.1 xmltodict 0.12.0 zipp 3.7.0 ```
jat255 commented 2 years ago

In this setup, test_ase.py is also failing, due to the version of ase at 3.22.1. I did some tests on version compatibility with the current test suite, and updated my pyproject.toml to reflect these requirements. This should be reflected in setup.py and requirements.txt.

Here are some tests on versions:

❌ = Failed ✅ = Passed

ase version` test_crystal_structure test_ase notes
3.22.1
3.21.1
3.20.1
3.19.3
3.19.2
3.19.1
3.19.0
3.18.2 Could not import from ase.io.jsonio import create_ndarray in materials_io.ase.AseParser
WardLT commented 2 years ago

Thanks for the heads-up and the really careful validation of the version dependence! Any preferences on a solution that only works for the latest ASE vs one that detects versions?

jat255 commented 2 years ago

I'm partial to poetry for version management, since it's dependency resolver is pretty robust (and much much faster than something like conda). A version string of "~3.19" stipulates any version of 3.19.X, but will not allow updates to 3.20.X or above. I think something like that might be possible in setup.py and requirements.txt, but I'd have to look into it (and I've got to run to a meeting now).

WardLT commented 2 years ago

Makes sense. I've really liked using poetry in another project.

I was thinking about whether the matIO parser should support old versions, more than how we would enforce it. Any advice there?

On Tue, Jan 18, 2022 at 3:02 PM Joshua Taillon @.***> wrote:

I'm partial to poetry for version management, since it's dependency resolver is pretty robust (and much much faster than something like conda). A version string of "~3.19" stipulates any version of 3.19.X, but will not allow updates to 3.20.X or above. I think something like that might be possible in setup.py and requirements.txt, but I'd have to look into it (and I've got to run to a meeting now).

— Reply to this email directly, view it on GitHub https://github.com/materials-data-facility/MaterialsIO/issues/34#issuecomment-1015789327, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABVNYDNSQRNL3F6GJEZCT63UWXBPHANCNFSM5MH5G6BA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you commented.Message ID: @.***>

jat255 commented 2 years ago

My suggestion would be that any actively developed matIO parsers should aim to target whatever the latest version of the dependent package is, with supported versions well-specified in the pyproject.toml file. Were a parser to get "out of date" (like the ase one), a stopgap would be to enforce a maximum version to ensure a working installation.

In a package like this that depends so heavily on outside libraries, I don't think there's substantial benefit (but there is significant cost) to trying to support multiple non-backwards-compatible versions of each library (via something like if ase.__version__ <= 3.19, then do X; else do Y). That will make testing a bear (although not impossible). It would be cleaner and easier to properly version MaterialsIO, so whatever version of Materials IO is installed depends on a verified collection of third-party packages (at specific versions). If you need features of an updated ase, for instance, that should trigger a new release of MaterialsIO. I think that would be easier maintenance-wise. Just my $0.02, however :)