Closed fxcoudert closed 2 months ago
Parts of CrystalNN get the right information. For example:
crystallnn.get_cn(structure, 0)
returns 8
, which is correctcrystallnn.get_cn_dict(structure, 0)
returns {'Co': 8}
, also correctThe StructureGraph is created in get_bonded_structure()
which calls from_local_env_strategy()
. In that function (https://github.com/materialsproject/pymatgen/blob/24f7ae09125b115fdf13cb643ba5c3023c08e0d7/pymatgen/analysis/graphs.py#L273C9-L273C32) the structure graph is populated from the result of get_all_nn_info()
, which is correct.
for idx, neighbors in enumerate(strategy.get_all_nn_info(structure)):
for neighbor in neighbors:
# local_env will always try to add two edges
# for any one bond, one from site u to site v
# and another form site v to site u: this is
# harmless, so warn_duplicates=False
struct_graph.add_edge(
from_index=idx,
from_jimage=(0, 0, 0),
to_index=neighbor["site_index"],
to_jimage=neighbor["image"],
weight=neighbor["weight"] if weights else None,
edge_properties=neighbor["edge_properties"] if edge_properties else None,
warn_duplicates=False,
)
For our Co atom, we iterate over all 8 images, but half of the edges are detected as duplicates:
[/Users/fx/miniforge3/envs/matten/lib/python3.10/site-packages/pymatgen/analysis/graphs.py:437](http://localhost:8888/Users/fx/miniforge3/envs/matten/lib/python3.10/site-packages/pymatgen/analysis/graphs.py#line=436): UserWarning: Trying to add an edge that already exists from site 0 to site 0 in (1, 1, 1).
warnings.warn(
[/Users/fx/miniforge3/envs/matten/lib/python3.10/site-packages/pymatgen/analysis/graphs.py:437](http://localhost:8888/Users/fx/miniforge3/envs/matten/lib/python3.10/site-packages/pymatgen/analysis/graphs.py#line=436): UserWarning: Trying to add an edge that already exists from site 0 to site 0 in (1, 0, 0).
warnings.warn(
[/Users/fx/miniforge3/envs/matten/lib/python3.10/site-packages/pymatgen/analysis/graphs.py:437](http://localhost:8888/Users/fx/miniforge3/envs/matten/lib/python3.10/site-packages/pymatgen/analysis/graphs.py#line=436): UserWarning: Trying to add an edge that already exists from site 0 to site 0 in (0, 1, 0).
warnings.warn(
[/Users/fx/miniforge3/envs/matten/lib/python3.10/site-packages/pymatgen/analysis/graphs.py:437](http://localhost:8888/Users/fx/miniforge3/envs/matten/lib/python3.10/site-packages/pymatgen/analysis/graphs.py#line=436): UserWarning: Trying to add an edge that already exists from site 0 to site 0 in (0, 0, 1).
warnings.warn(
So this is where we are loosing half of the information. The (-1, -1, -1)
edge is considered a duplicated of (1, 1, 1)
, and so on.
Thanks for reporting this.
I am not too familiar with the CrystalNN graph objects. We might need to check if someone else has some time to check this.
@mkhorton @utf maybe?
@fxcoudert is there a chance that you could provide a fix?
@shyuep any suggestions from your side?
I don't think I can provide a fix because I don't understand the intent of the code. It is apparently done on purpose at https://github.com/materialsproject/pymatgen/blob/d3c3b65826f29b485a141649db913db742502a55/pymatgen/analysis/graphs.py#L412
# if edge is from site i to site i, constrain direction of edge
# this is a convention to avoid duplicate hops
if to_index == from_index:
if to_jimage == (0, 0, 0):
warnings.warn("Tried to create a bond to itself, this doesn't make sense so was ignored.")
return
# ensure that the first non-zero jimage index is positive
# assumes that at least one non-zero index is present
is_positive = next(idx for idx in to_jimage if idx != 0) > 0
if not is_positive:
# let's flip the jimage,
# e.g. (0, 1, 0) is equivalent to (0, -1, 0) in this case
to_jimage = tuple(-idx for idx in to_jimage) # type: ignore[assignment]
So maybe it was considered a feature, but I don't understand why. It was apparently added in https://github.com/materialsproject/pymatgen/commit/732622f4fbec6b64eb38aa86b544b04de3d5656d by @mkhorton
The original PR at https://github.com/materialsproject/pymatgen/pull/2095 is not very clear about what it tries to fix. It may also be an incomplete fix, because the PR says:
While this convention fixes the present issue, I'm not convinced it's sufficient in all cases and will give it some further thought.
But I don't think the code was later improved.
@fxcoudert I see. Thanks!
@tschaume maybe you have an idea who to tag as this bug might affect structure graph objects in general.
Good question :) Thanks @fxcoudert for digging into this and providing detailed information. @jmmshn and @kim-jiyoon were tagged in the original PR #2095. Maybe they can help or explain the intention here. Also tagging @munrojm and @esoteric-ephemera to see if they have any opinions on how this issue should be dealt with.
I believe the motivation was to create an undirected migration graph and prevent double-counting (Graph Analysis section in publication: https://www.nature.com/articles/s41524-023-01051-2) @jmmshn @hmlli
Ditto on what @kim-jiyoon mentioned. I'll post the text here:
So we were using the crystal graph to represent symmetry-distinct migration events and the issues of migration events that are equivalent up to lattice translation came up. I think the original PR was meant to address this issue.
So the opinion at the time was that whatever abstract object used to represent bonds or migration events should not consider two events that only differ by a lattice vector to be distinct objects so a canonical version chosen.
I see. But it seems that this leads to other bugs. Or: are you saying it is not a bug? In my opinion, the numbers should not change if you switch to a supercell.
Anyone able to tackle this? (Maybe someone that still uses it and profits from its solution as this will likely take a day or two to figure out... maybe more).
How about, as a middle ground solution and easy fix, we just make this duplicate detection optional?
@fxcoudert would this work for you?
I see the intent, but it seems rather specific to the application in that paper. Generally speaking, the bond graph should have as many bonds as are indicated by the coordination number.
Yes, having an option would be good. I would argue that the default should be to count the bounds "naturally". Also, because it's easier to filter out the graph afterwards, rather than recreated edges.
@fxcoudert I agree about the default. Would you be willing to implement this? Otherwise, I fear this could take a while.
@shyuep any opinion on this from your side? Also with emphasis on backward compability. Thanks in advance.
I would not suggest any change, it’s actually quite important.
@fxcoudert it is correct, it is also a choice, but I can motivate it by analogy: consider an atom that is on the boundary of a unit cell and has periodic repeats. You would not expect the periodic repeats to be within the Structure
definition. You can consider the edges/bonds similarly, and only want to store the unique edges.
However, there is good news in the sense that this was considered during the writing of StructureGraph
. If you take a look at the other methods there there are methods that will get you all 8 neighbours, I think the method is called get_connected_sites
.
I have not taken a look at this specific example, I am away from an IDE at present, but will do so next time I am doing maintenance work in case I have missed something important in this discussion.
If you take a look at the other methods there there are methods that will get you all 8 neighbours, I think the method is called get_connected_sites.
Hum, thanks for pointing that out to me. Maybe it can make sense that the storage of the graph can be considered an implementation detail, and the user may not know what is inside or how it's stored.
In any case, I would definitely add it to the documentation, but it was really surprising to me and too me a while to figure out.
@mkhorton I don't have a strong opinion here but maybe you would like to document this once you have access to an IDE.
So this is less of an implementation detail and more of something that is is very deliberate and fundamental to how these objects should be defined without a super clear UI interface.
I think we need some way to discourage users from using the graph object for coordination analysis.
Just a thought, we could just put a reference to this discussion inside of the documentation for get_bonded_structure
.
OK, actually even with this definition there is a bug. Take this example:
with MPRester(apikey) as mpr:
mp_data = mpr.materials.summary.search(
material_ids=["mp-2647057"],
fields=["material_id", "formula", "formula_pretty", "nelements", "structure"]
)
structure = mp_data[0].structure
bonded_struct = crystallnn.get_bonded_structure(structure)
for n, site in enumerate(bonded_struct.structure.sites):
print(site.specie)
print(bonded_struct.get_coordination_of_site(n))
print(len(bonded_struct.get_connected_sites(n)))
It outputs:
Co
4
8
There is only one site, Co, eight-coordinated. get_connected_sites()
returns a list of length 8, but get_coordination_of_site()
returns 4, which is incorrect. The source for that function is at
https://github.com/materialsproject/pymatgen/blob/d1361099331ba6c5217cbfc5a17c77dcdf4918a9/pymatgen/analysis/graphs.py#L803-L814
“Get the number of neighbors of site n.”: the doc is clear that the answer should be 8.
n_self_loops = sum(1 for n, v in self.graph.edges(n) if n == v)
return self.graph.degree(n) - n_self_loops
I can't say I understand the logic here.
Anyone from the original developtment team able to comment on this? @jmmshn @mkhorton ?
@JaGeo, this definitely looks like a bug.
So it looks like this is happening:
The full graph is just the node linking to itself 4 times, and the code assumes that all self-connections are double-counted.
However, since we have ensured that there is no actual double counting from the way we defined the graph the - n_self_loops
part of the expression should not be needed.
I think the fix here is just to remove the double-counting variable. @mkhorton let me know what you think. I can fix it in a PR.
I also propose that we change get_coordination_of_site
to something akin to len(self.get_connected_sites(n))
so that we have only a single point of failure.
Thanks for continuing to chase this question @fxcoudert @JaGeo @jmmshn and for investigating, I'm glad this bug was caught. I'll merge @jmmshn's PR so we can fix.
I know the original implementation was self.graph.degree(n)
too. I committed the change to self.graph.degree(n) - n_self_loops
myself, but at the time this was with the correct test value, and the error was introduced later. I do not recall the reason. Mea culpa on this.
Python version
3.11.2
Pymatgen version
2024.6.10
Operating system version
macOS 14.5
Current behavior
Take a structure with only one atom per cell, which bonds to its own images in neighboring cells. Many simple metals do that, like mp-2647057.
Apply CrystalNN to get a bonded structure. The structure for mp-2647057 looks like this:
The bonds graph is incorrect. The Co atom should be 8-connected, and there should be bonds to atoms at positions
(-1, -1, -1) (-1, 0, 0) (0, 0, -1) (0, -1, 0)
. These missing bonds have the exact same length and characteristics as the ones present, in fact they are equivalent by symmetry.If you make a 2x2x2 supercell and apply CrystalNN, you get the correct coordination.
Expected Behavior
The Co atom should be 8-coordinated.
Minimal example