lamalab-org / xtal2txt

MIT License

Embeddings of atoms from different representations #35

Open n0w0f opened 6 months ago

n0w0f commented 6 months ago

@kjappelbaum, in order to check the similarity between atoms, or to do a "King - Queen = Man - Woman" style analogy analysis, I would like to embed individual atoms with models trained on different representations. This is a follow-up to see whether composition or individual atoms mean anything for smaller models.
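A minimal sketch of the two checks mentioned above, assuming each atom already has a vector (for example a learned token embedding) stored as a plain list of floats. `cosine` measures similarity between two atoms, and `analogy` answers "a - b + c ≈ ?" by nearest-neighbour search, which is the same arithmetic as King - Queen + Woman ≈ Man. The helper names are hypothetical, and the three-dimensional vectors are made up for illustration; they do not come from any trained model.

```python
import math

# Toy, hand-made vectors for illustration only (NOT from a trained model).
# Dimensions loosely mean: [group-1-ness, group-2-ness, period].
TOY_EMBEDDINGS = {
    "Li": [1.0, 0.0, 2.0],
    "Na": [1.0, 0.0, 3.0],
    "K": [1.0, 0.0, 4.0],
    "Be": [0.0, 1.0, 2.0],
    "Mg": [0.0, 1.0, 3.0],
    "Ca": [0.0, 1.0, 4.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(emb, a, b, c):
    """Return the token closest (by cosine) to emb[a] - emb[b] + emb[c],
    excluding the three query tokens themselves."""
    target = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [t for t in emb if t not in (a, b, c)]
    return max(candidates, key=lambda t: cosine(target, emb[t]))


if __name__ == "__main__":
    print(cosine(TOY_EMBEDDINGS["Na"], TOY_EMBEDDINGS["K"]))
    # "K is to Na as ? is to Mg", i.e. K - Na + Mg
    print(analogy(TOY_EMBEDDINGS, "K", "Na", "Mg"))
```

With these toy vectors the analogy returns `"Ca"` only because the made-up dimensions were built that way; with real embeddings, whether it still does is the interesting question.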

For slice and composition, maybe I can keep the atom as the first token and pad all other tokens, but for crystal-llm or cif_rep, atoms usually come in the later part of the representation. Would keeping the atom at the beginning work for these representations?
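The atom-first idea can be sketched as building a sequence that holds only the atom token (optionally after special tokens) followed by padding, together with an attention mask so the model ignores the pad positions; the vector is then read from the hidden state at the atom's position. The vocabulary, token ids, and helper names below are hypothetical; a real tokenizer defines its own ids and special tokens. The sketch only builds the input and does not settle whether this is meaningful for representations such as cif_rep or crystal-llm, where atoms normally appear later in the sequence.

```python
PAD_ID = 0
# Hypothetical toy vocabulary; a real tokenizer defines its own ids.
TOY_VOCAB = {"[PAD]": 0, "[CLS]": 1, "Na": 2, "K": 3, "Cl": 4}


def atom_only_input(atom, max_length, vocab=TOY_VOCAB, prefix=()):
    """Build (input_ids, attention_mask, atom_position) for a sequence that
    contains only `atom` (after any `prefix` special tokens), then padding."""
    ids = [vocab[tok] for tok in (*prefix, atom)]
    n_pad = max_length - len(ids)
    if n_pad < 0:
        raise ValueError("max_length is shorter than the unpadded sequence")
    input_ids = ids + [PAD_ID] * n_pad
    # 1 = real token the model should attend to, 0 = padding to ignore.
    attention_mask = [1] * len(ids) + [0] * n_pad
    atom_position = len(prefix)
    return input_ids, attention_mask, atom_position


def atom_vector(hidden_states, atom_position):
    """Pick the vector at the atom's position from a
    (sequence_length x hidden_size) list of hidden states."""
    return hidden_states[atom_position]


if __name__ == "__main__":
    print(atom_only_input("Na", max_length=6))
    print(atom_only_input("Na", max_length=6, prefix=("[CLS]",)))
```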

kjappelbaum commented 6 months ago

So you would like to have one vector per atom in a structure?

n0w0f commented 6 months ago

> So you would like to have one vector per atom in a structure?

I would like to get a vector for an atom, not in the context of the atom being in any particular structure, but standalone, e.g., Na -> model -> vector.

That way, I can see whether all the alkali elements are similar in models trained with different representations.
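One way to turn "are the alkali elements similar?" into a single number per model is a group-cohesion score: the mean cosine similarity among the alkali metals minus their mean similarity to the other elements, computed separately for each model's element-embedding table. A positive score means the group clusters together. The score definition is an assumption chosen for illustration, and `model_a` / `model_b` are made-up toy tables, not results from any xtal2txt representation.

```python
import itertools
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def group_cohesion(emb, group):
    """Mean within-group cosine similarity minus mean similarity between
    group members and every element outside the group."""
    inside = [cosine(emb[a], emb[b]) for a, b in itertools.combinations(group, 2)]
    others = [e for e in emb if e not in group]
    across = [cosine(emb[g], emb[o]) for g in group for o in others]
    return sum(inside) / len(inside) - sum(across) / len(across)


ALKALI = ["Li", "Na", "K"]  # subset present in the toy tables

# Made-up 2-D tables standing in for element embeddings from two models.
TOY_TABLES = {
    "model_a": {"Li": [1.0, 0.1], "Na": [0.9, 0.2], "K": [1.0, 0.0],
                "Mg": [0.1, 1.0], "Cl": [-0.2, 0.9]},
    "model_b": {"Li": [1.0, 0.0], "Na": [0.0, 1.0], "K": [-1.0, 0.2],
                "Mg": [0.9, 0.1], "Cl": [0.1, 0.95]},
}

if __name__ == "__main__":
    for name, table in TOY_TABLES.items():
        print(name, round(group_cohesion(table, ALKALI), 3))
```

For real models, each table would be built from that model's embeddings, and `ALKALI` extended to whichever alkali metals its vocabulary contains.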

n0w0f commented 6 months ago

Can I use the learned token embeddings? Or do I even need to pass the atom through the model if it is not in the context of a structure?

kjappelbaum commented 6 months ago

> Can I use the learned token embeddings? Or do I even need to pass the atom through the model if it is not in the context of a structure?

Ah, for this, people have used the learned embeddings of the different tokens. Some existing techniques are collected here: https://github.com/kjappelbaum/element-coder
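Using the learned token embeddings needs no forward pass: the vector for an element is the row of the model's input-embedding matrix at that element's token id. For a Hugging Face `transformers` model, that matrix is available as `model.get_input_embeddings().weight`, and `tokenizer.convert_tokens_to_ids(...)` maps tokens to ids. The pure-Python sketch below uses a hypothetical toy vocabulary and matrix instead, and assumes (as an illustrative choice, not something stated in this thread) that an element split into several token pieces is represented by the average of their rows.

```python
# Hypothetical toy vocabulary and input-embedding matrix (one row per token id).
TOY_VOCAB = {"[PAD]": 0, "Na": 1, "K": 2, "C": 3, "##l": 4}
TOY_EMBEDDING_MATRIX = [
    [0.0, 0.0],    # [PAD]
    [0.5, -0.25],  # Na
    [0.75, 0.5],   # K
    [1.0, 2.0],    # C
    [3.0, 4.0],    # ##l (WordPiece-style continuation piece)
]


def element_embedding(embedding_matrix, vocab, pieces):
    """Return the learned embedding for one element spelled by `pieces`.

    A single-token element (e.g. ["Na"]) is a plain row lookup; for an element
    split into several pieces (e.g. ["C", "##l"] for Cl) the rows are averaged,
    which is an illustrative assumption rather than an established rule."""
    rows = [embedding_matrix[vocab[p]] for p in pieces]
    return [sum(column) / len(rows) for column in zip(*rows)]


if __name__ == "__main__":
    print(element_embedding(TOY_EMBEDDING_MATRIX, TOY_VOCAB, ["Na"]))
    print(element_embedding(TOY_EMBEDDING_MATRIX, TOY_VOCAB, ["C", "##l"]))
```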

kjappelbaum commented 4 months ago

@n0w0f, did you ever take a look at this? Do you still plan to look into it?

n0w0f commented 4 months ago

Not yet, but I think there could be a lot of hidden insights there, and I would love to follow up.