nomad-coe / electronic-parsers

Apache License 2.0
18 stars 7 forks source link

Broken parsing of 0D systems in Crystal #170

Closed ndaelman-hu closed 11 months ago

ndaelman-hu commented 11 months ago

This ticket was opened to follow up with a user-reported bug. This is the description from the e-mail:

I have uploaded to Nomad a set of calculations for different polystyrene structures but in 3 out of 5 entries the processing part has failed. It seems that there is a problem when my structure was defined as an isolated molecule (0D system). The calculations are done with Crystal17 code. Have you encountered a similar issue with Crystal? Will it be possible to make the data public and accessible to everyone if the processing part has failed? The dataset name for the upload is CHARISMA-PST-2023.

ndaelman-hu commented 11 months ago

Tracing down the error (see log below), the issue stems from missing cell information (lattice and lattice_vectors) at the height of File "/usr/local/lib/python3.9/site-packages/electronicparsers/crystal/parser.py", line 658. When run in 0D mode, it appears that Crystal does not use/print out this cell data (under LATTICE PARAMETERS (ANGSTROMS AND DEGREES) - PRIMITIVE CELL). There's nothing in the parser to handle this contingency.

@ladinesa I guess that I'll have to go through the entire parser and add checks to deal with missing lattice data. Do you see any alternatives, e.g. finding an appropriate default? On that last thought, do we now have the np.infinity property? Still, this might equally give issues down the line.

Error log:

Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/nomad/processing/data.py", line 1203, in parsing parser.parse(self.mainfile_file.os_path, self._parser_results, logger=logger, **kwargs)
File "/usr/local/lib/python3.9/site-packages/nomad/parsing/parser.py", line 398, in parse self.mainfile_parser.parse(mainfile, archive, logger)
File "/usr/local/lib/python3.9/site-packages/electronicparsers/crystal/parser.py", line 658, in parse cart_pos, atomic_numbers, atom_labels, lattice_vectors = to_system(
File "/usr/local/lib/python3.9/site-packages/electronicparsers/crystal/parser.py", line 1044, in to_system cart_pos = atomutils.to_cartesian(scaled_pos, lattice_vectors)
File "/usr/local/lib/python3.9/site-packages/nomad/atomutils.py", line 204, in to_cartesian cartesian_positions = np.dot(positions, complete_cell(cell))
File "/usr/local/lib/python3.9/site-packages/nomad/atomutils.py", line 220, in complete_cell return ase.geometry.complete_cell(cell) File "/usr/local/lib/python3.9/site-packages/ase/geometry/cell.py", line 209, in complete_cell missing = np.nonzero(~cell.any(axis=1))[0]
File "/usr/local/lib/python3.9/site-packages/numpy/core/_methods.py", line 57, in _any return umr_any(a, axis, dtype, out, keepdims) numpy.AxisError: axis 1 is out of bounds for array of dimension 0
ladinesa commented 11 months ago

I would simply put an additional check in line 1044


    if lattice_vectors is not None:
ndaelman-hu commented 11 months ago

I would simply put an additional check in line 1044

    if lattice_vectors is not None:

Thx for the suggestion. I still have to generate car_pos. Will post a solution soon.

ladinesa commented 11 months ago

I

I would simply put an additional check in line 1044

    if lattice_vectors is not None:

Thx for the suggestion. I still have to generate car_pos. Will post a solution soon.


cart_pos = atomutils.to_cartesian(scaled_pos, lattice_vectors) if lattice_vectors is not None else scaled_pos
ndaelman-hu commented 11 months ago

My apologies, the problem actually lies with pos_type == "scaled". When lattice vectors aren't present, it should obviously be cartesian. This is also what the file states. I'll see if this route fixes it.

ndaelman-hu commented 11 months ago

Indeed, line 147 Quantity("material_type", fr' ((?:MOLECULAR|SLAB) CALCULATION){br}', str_operation=lambda x: x, repeats=False), is not finding anything. That cascades to the wrong pos_type.

The calculation type is FREQUENCY. It seems like most of our system tags are not capturing anything here. I'll see if this is a version incompatibility.

ndaelman-hu commented 11 months ago

Do you see any alternatives, e.g. finding an appropriate default?

Actually, the output already states the formal value itself: (NON PERIODIC DIRECTION: LATTICE PARAMETER FORMALLY SET TO 500) The question is: "does it really use this value?"

ndaelman-hu commented 11 months ago

So, what I would do is remove material_type from the prop_type logic and use dimensionality instead. @ladinesa can we set the pbc vector to 1D via np.array([True, False, False])? I remember you mentioning some kind of constrained here.

ladinesa commented 11 months ago

We can put another definition for material_type for 0d systems and also so lattice_parameter. Quantity('lattice_parameter', r'NON PERIODIC DIRECTION: LATTICE PARAMETER FORMALLY SET TO (500)', str_operation=lambda x: np.ones((3,3)) * float(x))

It says lattice parameter is formally set to 500 so I think it does.

ladinesa commented 11 months ago

A little disclaimer, @lauri-codes wrote the crystal parser. sure, you can forgo of the material_type and look at only the lattice_parameter value to determine pbc.

ndaelman-hu commented 11 months ago

A little disclaimer, @lauri-codes wrote the crystal parser.

I see. I remember @lauri-codes changing a lot to the dimensionality and type recognition, so maybe it can be pruned.

@ladinesa What about the PBC: do they have to be arrays with all True or False, or can I mix?

ladinesa commented 11 months ago

A little disclaimer, @lauri-codes wrote the crystal parser.

I see. I remember @lauri-codes changing a lot to the dimensionality and type recognition, so maybe it can be pruned.

@ladinesa What about the PBC: do they have to be arrays with all True or False, or can I mix?

it can be mixed. it should really be a normalized quantity.

ndaelman-hu commented 11 months ago

@lauri-codes Is atomutils.wrap_positions still useful? I think the same recentering of the cell already happens in the GUI, and I don't see the value in storing a translated cell in the archive. Moreover, it is only used in the crystal parser.

lauri-codes commented 11 months ago

@ndaelman-hu: If you are sure that wrap_positions is only used in the crystal parser, and removing it would not cause any issues, then it can be safely removed.

I think my idea was that the systems stored in the archive should be somehow "normalized", meaning that e.g. positions are wrapped if periodic boundaries are used. But I'm not sure if that idea has any real value, and even if it does, it should be done by some normalizer.

ndaelman-hu commented 11 months ago

If you are sure that wrap_positions is only used in the crystal parser, and removing it would not cause any issues, then it can be safely removed.

I just removed it from the parser. It's being reviewed in a PR now. I guess it can keep on existing in atomutils. My IDE only shows 3 occurrences of wrap_positions (i.e. symmetryanalyzer.py, atomutils.py, and mof_deconstructor.py), but they seem to all be different methods with the same name.