materialsproject / api

New API client for the Materials Project
https://materialsproject.github.io/api/
Other
104 stars 38 forks source link

[Bug]: `chemsys` parameter not returning all entries in `chemsys` #918

Open kavanase opened 1 month ago

kavanase commented 1 month ago

Code snippet

# new API:
with MPRester("new_MP_API_key") as mpr:
    Cd_Te_PD_docs = mpr.materials.summary.search(chemsys="Cd-Te")
print(len(Cd_Te_PD_docs))  # should be 23, returns 11

What happened?

Related to the bug report in https://github.com/materialsproject/api/issues/859

Using the chemsys parameter in the search() method (specifically for mpr.materials.summary.search here but this affects all BaseRester.search() methods), not all entries in the specified chemical system are actually being returned. Correct behaviour with legacy API:

image

But with the new API and setting chemsys = "Cd-Te", only Cd_xTe_y entries are returned, and not Cd or Te-only entries (which should also be returned):

image

Can test this for other material systems too, by just comparing the returned docs from summary.search(chemsys="X-Y") with the output of searching "X-Y" on the Materials Project website.

Version

v0.41.2

Which OS?

Log output

No response

kavanase commented 1 month ago

And also just to clarify, get_entries_in_chemsys for the new MPRester also works fine and as expected, it's just summary.search() that has the issue:

image
kavanase commented 1 month ago

I've found the issue, it's that in get_entries_in_chemsys(), it uses this code to convert the input chemsys parameter to a format that works for the summary.search() methods: https://github.com/materialsproject/api/blob/2c13d6a515972503b9b52447df6986974c5affd5/mp_api/client/mprester.py#L1179C9-L1187C60

if isinstance(elements, str):
        elements = elements.split("-")

elements_set = set(elements)  # remove duplicate elements

all_chemsyses = []
for i in range(len(elements_set)):
    for els in itertools.combinations(elements_set, i + 1):
        all_chemsyses.append("-".join(sorted(els)))

I think for consistency it would be best for this handling to be applied for all uses of chemsys, not just in this function? So the handling of chemsys across all the functions (and as used on the Materials Project website) is consistent?

munrojm commented 1 month ago

The only issue is there needs to be a way to query for a single chemical system, and not just always query for what in this case turns out to be a list (e.g. ["Cd-Te", "Cd", "Te"]. Since the original rester class had the get_entries_in_chemsys method which queried for ALL entries in the parent chemical system as well as all of it's subsystems, we elected to keep it that way. We still will want to maintain the existing query functionality of summary.search. Perhaps this is an issue that can be fixed in the docs.

kavanase commented 1 month ago

Ok! Yeah then maybe just worth clarification in the docs/docstrings? Just because "chemical system" on the MP website and in the old methods meant everything in that chemical space (i.e. "Cd-Te" -> CdxTey where x/y are any combination of integers including zero) rather than the current where it means everything with that exact same combination of elements (but with variable stoichiometry); i.e. where x/y must be > 0

mkhorton commented 1 month ago

Of course the responsibility lies with MP to make sure documentation is correct, but adding a note that the public docs accept PRs too if there is ever a quick change to suggest that might help provide clarification for other users: https://github.com/materialsproject/public-docs

I confess I would have also found this confusing. I suspect the underlying reason is because get_entries_in_chemsys is often used to quickly obtain necessary data to create a PhaseDiagram object, so is also useful to include subsystems.

munrojm commented 1 month ago

Have changed the language of the docstring. Feel free to submit a PR if you feel it isn't enough @kavanase.

kalvdans commented 1 month ago

Have changed the language of the docstring.

For reference, this was https://github.com/materialsproject/api/commit/801c05c3f11329bfd0946a8515cc057902c483d4