Closed kjappelbaum closed 5 months ago
@n0w0f can you remind me if there is a shared location with the structures? Then I could try to have a first run of this script.
@n0w0f can you remind me if there is a shared location with the structures? Then I could try to have a first run of this script.
we have a copy of the data at /work/structllm_data
draco
,
I also have it in the gdrive.
@n0w0f what is the current strategy you use for dependency? Do you use
pdm
or something else?
pdm
at some point started causing issue for me, now i just maintain the conda env and pyproject.toml
is updated manually.
@n0w0f I now added info about the spacegroup (optionally).
A bit ugly is that one of the methods now also returns the space-group string. In the past, I worked with IStructure
, which allows convenient caching-based design patterns. But IStructure
is supposed to be deprecated.
The other alternative would be to share the symmetrized structure via the object's state. But then we would need to create a new object for every structure.
The other alternative would be to share the symmetrized structure via the object's state. But then we would need to create a new object for every structure.
I also favor the current implementation over having a object's state.
If you are concerned about having another return variable, we can add spacegroup to env itself, wherein env
is nested with spacegroup and list[sites].
ok, I'll now just run some mining with the current implementation
Let's see how it goes, perhaps we can really see some redundance:
Total number of structures: 1965
Total number of environments: 8489
Total number of unique environments: 5296
@n0w0f let me know what I should still address in this PR. Otherwise, it might be good to merge if you want to have it in the codebase (to prevent that we pile up merge conflicts)
@n0w0f let me know what I should still address in this PR. Otherwise, it might be good to merge if you want to have it in the codebase (to prevent that we pile up merge conflicts)
I think we can merge this already. and maybe add smiles as a representation in a separate PR
ok, then let me check the failing test and I'll merge
implements draft of #50
Decomposes structure in unique local environments. Serializes those local environments into SMILES. Optionally creates a text string based on the SMILES, Wyckoff symbols and central atoms
openbabel
as an additional dependency@n0w0f what is the current strategy you use for dependency? Do you use
pdm
or something else?