materialsproject / pymatgen

Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project.
https://pymatgen.org
Other
1.53k stars 867 forks source link

OptimadeRester #3753

Closed JaGeo closed 7 months ago

JaGeo commented 7 months ago

Python version

3.11

Pymatgen version

2024.3.1

Operating system version

Linux

Current behavior

OptimadeRester seems to be broken. Will not get any results via:

from pymatgen.ext.optimade import OptimadeRester
with OptimadeRester(aliases_or_resource_urls=["mcloud.mc3d"]) as o:
    structures=o.get_structures(["Ti"], nelements=1 )

I then get:

Failed to parse https://aiida.materialscloud.org/mc3d/v1/info when validating: Expecting value: line 1 column 1 (char 0) Could not retrieve required information from provider mcloud.mc3d and url='https://aiida.materialscloud.org/mc3d/v1/structures?filter=(elements HAS ALL "Ti") AND (nelements=1)&response_fields=cartesian_site_positions,lattice_vectors,species,species_at_sites': Expecting value: line 1 column 1 (char 0) {}

I actually get this for everything except "mp".

Expected Behavior

Return a material

Minimal example

from pymatgen.ext.optimade import OptimadeRester
with OptimadeRester(aliases_or_resource_urls=["mcloud.mc3d"]) as o:
    structures=o.get_structures(["Ti"], nelements=1 )

Relevant files to reproduce this bug

None

mkhorton commented 7 months ago

Thanks for the report. I had used it recently and noticed a few endpoints broken, but wasn’t sure if it was them being out of spec or an issue with the parsing logic — seems likely the latter.

When this client was first added there was no good alternative, but now there is a very nice client from @ml-evs over in optimade-python-tools which may be helpful. I added a pointer to this in the pymatgen client docstring.

Nevertheless, we should either fix the pymatgen client or formally deprecate it, or perhaps change the backend to use optimade-python-tools. It is useful to be able to easily retrieve entries in Structure format so I would like to be able to retain this somehow.

JaGeo commented 7 months ago

Thanks, @mkhorton . Will then switch to the optimade-python-tools for this particular example !

ml-evs commented 7 months ago

Looks like something has gone wrong with the URL construction, the "v1" is being added in the wrong place, it should be https://aiida.materialscloud.org/mc3d/optimade/v1/ rather than https://aiida.materialscloud.org/mc3d/v1/optimade, so either the aliases in pymatgen need to be refreshed or there is a bug.

This does come up every now and again, if desirable I can move the optimade-python-tools client into its own package with only httpx as a dependency. It wouldn't be too much work to replicate the current OptimadeRester functionality by wrapping the OptimadeClient class we have.

ml-evs commented 7 months ago

Equivalent query with o-p-t, after pip install optimade[http-client]:

optimade-get --response-fields "cartesian_site_positions,lattice_vectors,species,species_at_sites" --filter 'elements HAS ALL "Ti" AND nelements=1' https://aiida.materialscloud.org/mc3d/optimade 

or

from optimade.client import OptimadeClient

client = OptimadeClient("https://aiida.materialscloud.org/mc3d/optimade")
results = client.get(
    filter='elements HAS ALL "Ti" AND nelements=1',
    response_fields=[
        "cartesian_site_positions",
        "lattice_vectors",
        "species",
        "species_at_sites",
    ],
)
JaGeo commented 7 months ago

My use-case is showing optimade in a lecture including code example. It worked when I drafted it a few weeks ago 😅. But I can surely switch as well.

ml-evs commented 7 months ago

Weird, I just tried your code snippet and it works fine for me:

>>> from pymatgen.ext.optimade import OptimadeRester
>>> with OptimadeRester(aliases_or_resource_urls=["mcloud.mc3d"]) as o:
...     structures=o.get_structures(["Ti"], nelements=1 )
...
mcloud.mc3d: 100%|█████████████████████████████████████| 6/6 [00:00<?, ?it/s]
>>> structures
{'mcloud.mc3d': {'105195': Structure Summary
Lattice
    abc : 2.8175362595703968 2.8175362595703968 2.8175362595703968
 angles : 109.47122063449069 109.47122063449069 109.47122063449069
 volume : 17.21815648938191
      A : -1.6267053179145 1.6267053179145 1.6267053179145
      B : 1.6267053179145 -1.6267053179145 1.6267053179145
      C : 1.6267053179145 1.6267053179145 -1.6267053179145
    pbc : True True True
...
JaGeo commented 7 months ago

I had installed the latest pymatgen and the one before.

Which Python version?

JaGeo commented 7 months ago

Installed pymatgen 2024.04.12 again. Still fails. This time with Python 3.9.

JaGeo commented 7 months ago

When I manually correct the url construction within the code, it works... Will check in the code if some logic got removed at some point.

JaGeo commented 7 months ago

My feeling is that it is this commit: https://github.com/materialsproject/pymatgen/commit/cf1b61a88a8d1f959de4f29c0ce48d5a46611563

urljoin somehow cuts off the "optimade" from the URL, then joins the URLs and then the "optimade" is missing. I don't really have time to fix it properly. I could replace the urljoin with join again but I guess this is not a good fix as we will then have problems on other operating systems again.

Adding this here as well: https://stackoverflow.com/questions/10893374/python-confusions-with-urljoin

@janosh @ml-evs , any ideas? (Maybe also operating system dependent?)

ml-evs commented 7 months ago

My feeling is that it is this commit: cf1b61a

urljoin somehow cuts off the "optimade" from the URL, then joins the URLs and then the "optimade" is missing. I don't really have time to fix it properly. I could replace the urljoin with join again but I guess this is not a good fix as we will then have problems on other operating systems again.

Adding this here as well: https://stackoverflow.com/questions/10893374/python-confusions-with-urljoin

@janosh @ml-evs , any ideas? (Maybe also operating system dependent?)

Yep, sorry, I realise I was running your failing snippet with 2024.1. I can try to fix this now.

ml-evs commented 7 months ago

Annoyingly simple fix... some aliases are root domains whereas others already have paths; if you don't end a URL with a path with a slash, then urljoin scrubs the last section of the path.

JaGeo commented 7 months ago

Annoyingly simple fix... some aliases are root domains whereas others already have paths; if you don't end a URL with a path with a slash, then urljoin scrubs the last section of the path.

Ah, didn't realize the additional "/" would already fix it. 😅 But good to know! Maybe, we should add an additional test as well to avoid this from breaking in the future. At the moment, only MP is tested

ml-evs commented 7 months ago

Annoyingly simple fix... some aliases are root domains whereas others already have paths; if you don't end a URL with a path with a slash, then urljoin scrubs the last section of the path.

Ah, didn't realize the additional "/" would already fix it. 😅 But good to know! Maybe, we should add an additional test as well to avoid this from breaking in the future. At the moment, only MP is tested

See #3756! I'm thinking of adding a couple more tests that depend on other "stable" databases, indeed. Also that refreshing the aliases list can fail nicely even when a given database is down.