lab-cosmo / chemiscope

An interactive structure/property explorer for materials and molecules
http://chemiscope.org
BSD 3-Clause "New" or "Revised" License
121 stars 29 forks source link

Better input format for shapes #292

Closed ceriottm closed 9 months ago

ceriottm commented 11 months ago

The purpose of this PR is to slim down the format for storing structures in the input file. I think the functionality is sufficiently new that we don't need to worry about being backward compatible, but we should move relatively fast before it becomes an issue. Main point we want to address here is that storing vertices for each atom is a huge overhead if all that happens is that a structure is rotated or displaced.

I suggest to proceed as follows. Shapes are still stored as entries in a dictionary. Each entry contains

export interface BaseShapeData {
    kind: string;
    settings:  {};
    frame_settings: undefined | {};
    atom_settings: undefined | {};
}

Each shape is generated by passing the settings to a function. If frame_settings or atom_settings are defined, then they must be a dictionaries where each parameter is an array containing the values of the parameters for each frame, or each atom in each frame.

Then, shapes are created by looping over the appropriate entities, and calling the appropriate function with an argument list created by combining settings, frame_settings and atom_settings.

For instance, one could have:

{kind: "arrow", settings={'radius': 0.1}, frame_settings: {direction: [[0,0,1],[0,2,0]]} }

which will plot an arrow in the specified direction, centered on the origin, for frames 1 and 2.

{kind: "arrow", frame_settings={'radius': [0.1, 0.2]}, atom_settings: {direction: [ [[0,0,1],[0.0,2]], [[0,2,0],[1,1,1]]]} }

will plot arrows with radius 0.1 in frame 1 (one for each of the two atoms) and arrows with radius 0.2 in frame 2.

Incidentally, this format also addresses the use case in which one wants to have shapes that are not directly attached to atoms, e.g. polyhedra or isosurfaces.

Luthaf commented 10 months ago

I agree with the idea here, it would be nice for smaller files by not repeating identical values.

A couple of things:

Another consideration: should atoms_parameter be named particles_parameters?

Overall, I would do something like this:

interface SomeShapeData {
    orientation: ...
    color: ...
    ...
}

export interface ShapeParameters<T> {
    kind: string;
    parameters:  {
        global: Partial<T>,
        structures?: Array<Partial<T>>,
        particles?: Array<Partial<T>>,
    }
}

export SomeShapeParameters = ShapeParameters<SomeShapeData>;

What is not covered by this proposal is the ability to have a handful of "template" shapes to pick between (e.g. have 3 different ellispoids, where only the position & orientation vary between particles). Not sure how much this would be worth it, but we should consider it now.

ceriottm commented 10 months ago

What is not covered by this proposal is the ability to have a handful of "template" shapes to pick between (e.g. have 3 different ellispoids, where only the position & orientation vary between particles). Not sure how much this would be worth it, but we should consider it now.

Not sure I understand what you mean here. You mean building a "template shape" composing multiple primitive and then placing them at different positions and orientations?

One major issue I see is that RN shapes are deeply associated with atoms - they are stored in the frames and not as a separate section of the dataset. This is quite a deep change and my re-implementation kind of stalled trying to disentangle shapes from frames.

Luthaf commented 10 months ago

Not sure I understand what you mean here. You mean building a "template shape" composing multiple primitive and then placing them at different positions and orientations?

something like this: we store could parameters once per template and then refer to a template in the shapes

templates = {
    A = {kind = "...", parameters: ...}
    B = {kind = "...", parameters: ...}
    C = {kind = "...", parameters: ...}
}

shapes = {
    "name" = {
        parameters = {
            global = {
                ...
            }
            particles = [
                { template: "A", orientation: ... }, // position is implicit
                { template: "A", orientation: ... },
                { template: "B", orientation: ... },
                { template: "A", orientation: ... },
                { template: "A", orientation: ... },
                { template: "C", orientation: ... },
                { template: "C", orientation: ... },
                { template: "B", orientation: ... },
                { template: "B", orientation: ... },
            ]
        }
    }
}

EDIT: we could also put the templates in global or around there.


One major issue I see is that RN shapes are deeply associated with atoms - they are stored in the frames and not as a separate section of the dataset.

That's fine? This way we can easily display shapes in a standalone viewer as well. Why would we want to move the shapes outside of the structures? (I agree they should not be inside the atoms, but we don't have a concept of atom, cf https://chemiscope.org/docs/api/interfaces/Structure.html).

ceriottm commented 10 months ago

OK, this is starting to get somewhere. @rosecers can you look at LAMMPS parsing and the validation code - that I have largely commented out ATM - while I fix the custom shapes and perhaps add a builtin arrow?

ceriottm commented 10 months ago

Looking at current version. Can you @rosecers update the test? I'll go through the rest and polish up.

Luthaf commented 10 months ago

@Luthaf any objection to removing lammps port? Finding we don't need it with the new format

You mean the Python code to extract shapes from ase Atoms in LAMMPS format? Happy to remove it if it is no longer needed!

Luthaf commented 10 months ago

I should really learn how to run eslint locally

You can set it up to automatically run inside vscode (https://marketplace.visualstudio.com/items?itemName=dbaeumer.vscode-eslint) or other editors, or run it with npx eslint on command line.

ceriottm commented 10 months ago

Hi @Luthaf I think this is ready to merge. Will wait for you to have a last look. FYI - I didn't take in your suggestion to use particles for the shape parameters to stay consistent with the nomenclature we used for the properties target.