materialsproject / pymatgen

Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project.
https://pymatgen.org

Atom labels in CIF file are silently rewritten by CifWriter #3772

Open fxcoudert opened 6 months ago

fxcoudert commented 6 months ago

Python version

Python 3.11.6

Pymatgen version

2024.4.13

Operating system version

macOS 14.4.1

Current behavior

This is related to https://github.com/materialsproject/pymatgen/issues/3761 but different. I upgraded pymatgen from 2023.10.4 to 2024.4.13, and some of my workflows now fail as a result. The cause is that CifWriter now silently replaces atom labels, even when they are already unique! This seems very unnatural, and it breaks my code: I write bond specifications using the labels in the structure, and they no longer match the labels in the CIF file.

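from pymatgen.ext.matproj import MPRester
from pymatgen.io.cif import CifWriter

with MPRester("c3OruwLchURd4NLeENE40ziu8cNOGgyx") as m:
    structure = m.get_structure_by_material_id("mp-1234")
    print(structure.labels)  # labels as returned for the structure
    make_labels_unique(structure)  # defined under "Minimal example" below
    print(structure.labels)  # labels are now unique
    print(str(CifWriter(structure)))  # labels in the CIF output differ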

Al atoms have been renamed from Al0..Al3 to Al2..Al5

Expected Behavior

When labels are conformant to the CIF format (which they are in this case) they should not be altered.
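A minimal sketch of how this expectation could be checked, using only the standard library. The helper names atom_site_labels and labels_preserved are hypothetical and not part of pymatgen. The parser assumes a simple CIF layout: one loop_ containing _atom_site_label, whitespace-separated rows, and no quoted or multi-line values.

```python
def atom_site_labels(cif_text):
    """Return the _atom_site_label column of the first atom-site loop.

    Simplifying assumption: rows are whitespace-separated and contain
    no quoted values with spaces and no multi-line values.
    """
    lines = [ln.strip() for ln in cif_text.splitlines()]
    i = 0
    while i < len(lines):
        if lines[i].lower() == "loop_":
            # Collect the column tags that follow loop_
            i += 1
            tags = []
            while i < len(lines) and lines[i].startswith("_"):
                tags.append(lines[i].split()[0])
                i += 1
            if "_atom_site_label" in tags:
                col = tags.index("_atom_site_label")
                labels = []
                # Data rows end at a blank line, a comment, or the next block
                stop = ("_", "loop_", "data_", "#")
                while i < len(lines) and lines[i] and not lines[i].startswith(stop):
                    labels.append(lines[i].split()[col])
                    i += 1
                return labels
        else:
            i += 1
    return []


def labels_preserved(expected_labels, cif_text):
    """True if the CIF text contains exactly the expected labels.

    Compares sorted lists, so the order of sites is ignored.
    """
    return sorted(expected_labels) == sorted(atom_site_labels(cif_text))
```

With the snippet above, labels_preserved(structure.labels, str(CifWriter(structure))) would be expected to return True if the labels were written unchanged.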

Minimal example

See code above. The function make_labels_unique is:

from collections import Counter


def make_labels_unique(struct):
    """Append a running index to every label that occurs more than once."""
    labels = [site.label for site in struct.sites]
    if len(labels) == len(set(labels)):
        # All labels are unique, nothing to do
        return

    counts = Counter(labels)  # occurrences of each original label
    next_index = {}  # next suffix to hand out per original label
    for site in struct.sites:
        label = site.label
        if counts[label] > 1:
            c = next_index.get(label, 0)
            # Alphabetic labels become e.g. "Al0"; others get an underscore first
            site.label = f"{label}{c}" if label.isalpha() else f"{label}_{c}"
            next_index[label] = c + 1

fxcoudert commented 6 months ago

This happens because the check if "magmom" in site.properties is true for this structure; see https://github.com/materialsproject/pymatgen/blame/2d008e0dd5c430692e8dcac2505340a6bdff1642/pymatgen/io/cif.py#L1516C21-L1516C51

But I am not writing magnetic moments, and I do not know why that should override labels.

JaGeo commented 6 months ago

@fxcoudert thanks for reporting. I assume a pull request to fix this issue would be very welcome.

fxcoudert commented 6 months ago

I confirm that removing the magnetic moments with structure.remove_site_property("magmom") does fix the issue.
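The workaround can be sketched as a small helper that strips the property from a copy, so the original structure keeps its magnetic moments. The name without_magmom is hypothetical. The helper relies only on copy(), site_properties and remove_site_property(), which pymatgen Structure objects provide.

```python
def without_magmom(structure):
    """Return a copy of the structure without the "magmom" site property.

    Works on any object exposing copy(), site_properties and
    remove_site_property(), such as a pymatgen Structure.
    """
    clean = structure.copy()
    if "magmom" in clean.site_properties:
        clean.remove_site_property("magmom")
    return clean


# Intended use with pymatgen (not executed here):
#   from pymatgen.io.cif import CifWriter
#   print(str(CifWriter(without_magmom(structure))))
```

This is only a stopgap: it sidesteps the relabelling by hiding the magnetic moments from CifWriter, and it is not suitable when the moments need to be written to the file.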

Regarding a PR, my own understanding of what the code tries to do for magnetic moments is insufficient to handle it well. I wouldn't want to break another use case…

JaGeo commented 6 months ago

> I confirm that removing the magnetic moments with structure.remove_site_property("magmom") does fix the issue.
>
> Regarding a PR, my own understanding of what the code tries to do for magnetic moments is insufficient to handle it well. I wouldn't want to break another use case…

I see! I'm not familiar with that functionality either. Should I leave the issue open to see if someone else can fix it?

JaGeo commented 6 months ago

@mkhorton, maybe you know more about this.

stefsmeets commented 6 months ago

This also struck me as odd. I'm currently working on #3767, which touches this bit of the code, and I'm happy to fix this if someone tells me the expected behaviour for magnetic moments.