Extends #15 (commit included as very closely related) to allow conversion of n2p2 units.
Specifying n2p2 input units has little effect on the most common current use of datasets, writing LAMMPS data files, because we want these files to have units of eV/Ang. If an input n2p2 file is already in these units, the units in dataset.write can be left unspecified and the output will remain correct. If the n2p2 file has units of Ha/Bohr, those units are already assumed, and the data is converted when units are specified for dataset.write.
It does, however, avoid ambiguity and incorrect behaviour in the former case: without input units, if units are passed to dataset.write for a file that is already in eV/Ang, the input units are mislabelled as Ha/Bohr and a conversion is erroneously applied.
The other benefit is to enable conversion of n2p2 files with "incorrect" units. It is currently always assumed that these units should be in Ha/Bohr for training, so this provides a mechanism to produce a data file with these units if they initially differ.
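As a rough illustration of why labelling the input units matters, here is a minimal standalone sketch of the conversion arithmetic. It is not the code in this PR: the function name `convert`, the dictionary-based unit labels, and the list-of-lists layout are all assumptions made for the example. The conversion factors are the CODATA 2018 values.

```python
# Illustrative only: names and data layout are assumptions, not this package's API.
ENERGY_IN_EV = {"eV": 1.0, "Ha": 27.211386245988}
LENGTH_IN_ANG = {"Ang": 1.0, "Bohr": 0.529177210903}


def convert(energy, positions, forces, units_in, units_out):
    """Convert an energy, positions and forces between unit systems.

    units_in / units_out are dicts like {"energy": "Ha", "length": "Bohr"}.
    """
    e_factor = ENERGY_IN_EV[units_in["energy"]] / ENERGY_IN_EV[units_out["energy"]]
    l_factor = LENGTH_IN_ANG[units_in["length"]] / LENGTH_IN_ANG[units_out["length"]]
    new_positions = [[x * l_factor for x in atom] for atom in positions]
    # Forces are energy per length, so they scale by e_factor / l_factor.
    new_forces = [[f * e_factor / l_factor for f in atom] for atom in forces]
    return energy * e_factor, new_positions, new_forces


EV_ANG = {"energy": "eV", "length": "Ang"}
HA_BOHR = {"energy": "Ha", "length": "Bohr"}

# Data that is really in eV/Ang:
energy, positions, forces = -10.0, [[0.0, 0.0, 1.0]], [[0.0, 0.0, 0.5]]

# Correctly labelled input: writing eV/Ang leaves the values unchanged.
correct = convert(energy, positions, forces, EV_ANG, EV_ANG)

# Mislabelled input (assumed Ha/Bohr): every value is scaled erroneously.
wrong = convert(energy, positions, forces, HA_BOHR, EV_ANG)
```

In the mislabelled case the energy is scaled by roughly 27.2 and the positions by roughly 0.53, which is the erroneous conversion that passing explicit input units is meant to prevent.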
Potential further additions:
[x] Currently the input units can be overridden by comments in the datafiles, but perhaps input units that are passed explicitly should have higher priority? A warning/error could be added if the units specified differ from those in the comments.
[x] There is also potential to extend the input units to non-n2p2 filetypes. As we bypass ASE's unit conversion (we don't pass units when reading or writing, as neither can be kwargs, so everything is handled by Frame.change_units), this may be relatively straightforward. I haven't come across the need for this yet, but it may avoid potential confusion if all filetypes behave similarly.
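One way the priority question in the first item could be resolved is sketched below. This is a hypothetical helper, not code from this PR: the name `resolve_input_units`, its arguments, and the choice of a warning over an error are all assumptions. The Ha/Bohr fallback mirrors the default assumption described above.

```python
import warnings


def resolve_input_units(passed=None, from_comments=None, default=None):
    """Pick the input units, giving explicitly passed units the highest priority.

    Hypothetical helper: all three arguments are dicts such as
    {"energy": "Ha", "length": "Bohr"}, or None if not available.
    """
    if default is None:
        # Assumed fallback, matching the Ha/Bohr default for n2p2 files.
        default = {"energy": "Ha", "length": "Bohr"}
    if passed is not None:
        if from_comments is not None and passed != from_comments:
            warnings.warn(
                f"Input units {passed} differ from units in file comments "
                f"{from_comments}; using the units that were passed."
            )
        return dict(passed)
    if from_comments is not None:
        return dict(from_comments)
    return dict(default)
```

Raising an exception instead of warning would be the stricter alternative, at the cost of rejecting files whose comments are simply out of date.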