Closed CompRhys closed 2 years ago
Definitely not desired behaviour!
That was my code so mea culpa. Was there a reason you closed this issue?
It was related to much hash concerns about Site
and there the takeaway for me was that because two compositions are not equal if they have assigned charges then it doesn't matter that the hashes aren't equal #2235. The chemical system is related to this because a few months back I needed a hash with less collisions for my PatchedPhaseDiagram
and so the hash for Composition
was changed to hash(frozenset(Composition._data.keys()))
as it was barely slower than the sum of species charges but a lot more discriminating. In the end because so few sites are disordered and the discriminative hash is ~10% slower I decided against the PR for Site
.
I was going to submit a PR later to add this example to the docstring:
>>> Composition({"Si4+": 1, "O2-":2}).chemical_system
'O2--Si4+'
>>> Composition({"Si": 1, "O":2}).chemical_system
'O-Si'
A docstring improvement would be welcome (given that this is the current behaviour), but I do think we should change this to just return the elements. I'm less concerned with the hashing and more with expected/correct behaviour.
@property
def chemical_system(self) -> str:
"""
Get the chemical system of a Composition, for example "O-Si" for
SiO2. Chemical system is a string of a list of elements
sorted alphabetically and joined by dashes, by convention for use
in database keys.
"""
return "-".join(sorted([el.symbol for el in self.elements]))
gives
Python 3.7.10 (default, Jun 4 2021, 14:48:32)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pymatgen.core.composition import Composition
>>> Composition({"Si": 1, "O":2}).chemical_system
'O-Si'
>>> Composition({"Si4+": 1, "O2-":2}).chemical_system
'O-Si'
haven't tested speed but Species.__str__
accesses symbol so it shouldn't be slower.
If happy with this I can make PR and add a test for charged composition chemical systems?
I think we should keep this definition of chemsys. We either treat Species as distinct or not by default. Pymatgen treats them as different. And there are instances where we need to define Fe2+-Fe3+-O as opposed to just Fe-O system. If we want to ignore oxidation states, we can make it a different property or method.
would it not make more sense to update chemical_system
to match it's current docstring and then add another option for the species system
@mkhorton has already said this definition was a bug?
also in the species system just as a matter to ease of manipulation using regex the use of dash as a deliminator seems like a poor choice as we have things like "O2--Cu2+".split(deliminator) = ["O2", "", "Cu2+"]
which could cause hidden errors if like me you think the current behaviour is unexpected.
Not sure if this is a bug or not, it not the behaviour i would have expected so if planned I will look at doing a docstring improvement.