Open kavanase opened 1 month ago
And also just to clarify, get_entries_in_chemsys
for the new MPRester
also works fine and as expected, it's just summary.search()
that has the issue:
I've found the issue, it's that in get_entries_in_chemsys()
, it uses this code to convert the input chemsys
parameter to a format that works for the summary.search()
methods:
https://github.com/materialsproject/api/blob/2c13d6a515972503b9b52447df6986974c5affd5/mp_api/client/mprester.py#L1179C9-L1187C60
if isinstance(elements, str):
elements = elements.split("-")
elements_set = set(elements) # remove duplicate elements
all_chemsyses = []
for i in range(len(elements_set)):
for els in itertools.combinations(elements_set, i + 1):
all_chemsyses.append("-".join(sorted(els)))
I think for consistency it would be best for this handling to be applied for all uses of chemsys
, not just in this function? So the handling of chemsys
across all the functions (and as used on the Materials Project website) is consistent?
The only issue is there needs to be a way to query for a single chemical system, and not just always query for what in this case turns out to be a list (e.g. ["Cd-Te", "Cd", "Te"]. Since the original rester class had the get_entries_in_chemsys
method which queried for ALL entries in the parent chemical system as well as all of it's subsystems, we elected to keep it that way. We still will want to maintain the existing query functionality of summary.search
. Perhaps this is an issue that can be fixed in the docs.
Ok! Yeah then maybe just worth clarification in the docs/docstrings? Just because "chemical system" on the MP website and in the old methods meant everything in that chemical space (i.e. "Cd-Te" -> CdxTey where x/y are any combination of integers including zero) rather than the current where it means everything with that exact same combination of elements (but with variable stoichiometry); i.e. where x/y must be > 0
Of course the responsibility lies with MP to make sure documentation is correct, but adding a note that the public docs accept PRs too if there is ever a quick change to suggest that might help provide clarification for other users: https://github.com/materialsproject/public-docs
I confess I would have also found this confusing. I suspect the underlying reason is because get_entries_in_chemsys
is often used to quickly obtain necessary data to create a PhaseDiagram
object, so is also useful to include subsystems.
Have changed the language of the docstring. Feel free to submit a PR if you feel it isn't enough @kavanase.
Have changed the language of the docstring.
For reference, this was https://github.com/materialsproject/api/commit/801c05c3f11329bfd0946a8515cc057902c483d4
Code snippet
What happened?
Related to the bug report in https://github.com/materialsproject/api/issues/859
Using the
chemsys
parameter in thesearch()
method (specifically formpr.materials.summary.search
here but this affects allBaseRester.search()
methods), not all entries in the specified chemical system are actually being returned. Correct behaviour with legacy API:But with the new API and setting
chemsys = "Cd-Te"
, only Cd_xTe_y entries are returned, and not Cd or Te-only entries (which should also be returned):Can test this for other material systems too, by just comparing the returned docs from
summary.search(chemsys="X-Y")
with the output of searching "X-Y" on the Materials Project website.Version
v0.41.2
Which OS?
Log output
No response