muon-spectroscopy-computational-project / muon-galaxy

Data intensive science for everyone.
https://galaxyproject.org
Other
0 stars 0 forks source link

Extended XYZ file upload fails metadata #24

Closed elichad closed 1 year ago

elichad commented 1 year ago

The extended XYZ files generated by ASE for the allpos file in pm_muairss_write fail the extxyz metadata checks in Galaxy.

If auto-detected after tool run, number of structures remains at 0 (default).

If same file is uploaded via paste instead:

Error finding metadata. The file may be formatted incorrectly.

Example of such a file (from https://raw.githubusercontent.com/muon-spectroscopy-computational-project/muon-galaxy-tools/92b7688a0d7a74028896a42be6f430ce07735d51/pm_muairss_write/test-data/allpos.cell):

24
Lattice="5.47538171155659 5.3076732197316e-36 -1.31324260787022e-35 5.3076732197316e-36 5.47538171155659 3.13104997179767e-36 -1.31324260787022e-35 3.13104997179767e-36 5.47538171155659" Properties=species:S:1:pos:R:3:initial_magmoms:R:1:castep_labels:S:1:masses:R:1:castep_custom_species:S:1 pbc="T T T"
Si       0.00000000       0.00000000       0.00000000       0.00000000 NULL      28.08500000 Si
Si      -0.00000000       2.73769086       2.73769086       0.00000000 NULL      28.08500000 Si
Si       1.36884543       1.36884543       1.36884543       0.00000000 NULL      28.08500000 Si
Si       2.73769086       0.00000000       2.73769086       0.00000000 NULL      28.08500000 Si
Si       1.36884543       4.10653628       4.10653628       0.00000000 NULL      28.08500000 Si
Si       4.10653628       1.36884543       4.10653628       0.00000000 NULL      28.08500000 Si
Si       4.10653628       4.10653628       1.36884543       0.00000000 NULL      28.08500000 Si
Si       2.73769086       2.73769086      -0.00000000       0.00000000 NULL      28.08500000 Si
H        3.12515243       3.04179871       2.99423600       0.00000000          0.11342893 H:mu
H        4.17004836       2.68523378       3.91920412       0.00000000          0.11342893 H:mu
H        2.89619003       3.05587786       1.06947402       0.00000000          0.11342893 H:mu
H        2.49935677       2.73873164       5.01178210       0.00000000          0.11342893 H:mu
H        1.85638414       1.30564543       0.68430012       0.00000000          0.11342893 H:mu
H        0.71010028       1.66354263       1.36614343       0.00000000          0.11342893 H:mu
H        2.92680506       4.55495797       3.17379562       0.00000000          0.11342893 H:mu
H        4.04155613       2.28597057       2.35778713       0.00000000          0.11342893 H:mu
H        2.15377218       1.68584529       2.71235746       0.00000000          0.11342893 H:mu
H        3.36924728       0.88132020       2.64112709       0.00000000          0.11342893 H:mu
H        4.63118531       3.83027598       4.64047037       0.00000000          0.11342893 H:mu
H        2.82873101       1.96391263       1.69728627       0.00000000          0.11342893 H:mu
H        2.17662455       2.14429680       3.91215608       0.00000000          0.11342893 H:mu
H        1.80584952       2.88296106       2.74675920       0.00000000          0.11342893 H:mu
H        1.40330495       0.55998017       1.52252130       0.00000000          0.11342893 H:mu
H        3.25104718       3.56587562       4.15722314       0.00000000          0.11342893 H:mu

My initial assumption is that this is probably something misconfigured in the set_meta of the extxyz datatype.

patrick-austin commented 1 year ago

Just looking at this visually, what's going on with castep_labels:S:1? If I'm reading this correctly, it's NULL for Si, but then we have nothing in the column for the muons - that seems like a formatting error to me. But I don't know what the castep_label is supposed to do anyway.

elichad commented 1 year ago

Ahhh. The castep_labels list comes from ASE: https://gitlab.com/ase/ase/-/blob/master/ase/io/castep.py#L286 My guess is that if the label is NULL, then the species is used, and if the label is empty, then the castep_custom_species is used? And actually, thinking about it, this must be an ASE import problem, because that's all set_meta does (I mixed it up with the sniffer in my mind). And if I try it - yes:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/qci88181/muon-galaxy-tools/.venv/lib/python3.8/site-packages/ase/io/formats.py", line 736, in read
    return next(_iread(filename, slice(index, None), format, io,
  File "/home/qci88181/muon-galaxy-tools/.venv/lib/python3.8/site-packages/ase/parallel.py", line 275, in new_generator
    for result in generator(*args, **kwargs):
  File "/home/qci88181/muon-galaxy-tools/.venv/lib/python3.8/site-packages/ase/io/formats.py", line 803, in _iread
    for dct in io.read(fd, *args, **kwargs):
  File "/home/qci88181/muon-galaxy-tools/.venv/lib/python3.8/site-packages/ase/io/formats.py", line 559, in wrap_read_function
    for atoms in read(filename, index, **kwargs):
  File "/home/qci88181/muon-galaxy-tools/.venv/lib/python3.8/site-packages/ase/io/extxyz.py", line 773, in read_xyz
    yield _read_xyz_frame(fileobj, natoms, properties_parser, nvec)
  File "/home/qci88181/muon-galaxy-tools/.venv/lib/python3.8/site-packages/ase/io/extxyz.py", line 416, in _read_xyz_frame
    row = tuple([conv(val) for conv, val in zip(convs, vals)])
  File "/home/qci88181/muon-galaxy-tools/.venv/lib/python3.8/site-packages/ase/io/extxyz.py", line 416, in <listcomp>
    row = tuple([conv(val) for conv, val in zip(convs, vals)])
ValueError: could not convert string to float: 'H:mu'

So it's actually a table formatting problem - [extxyz spec] explains how the Properties work in line 2, and I think the empty label means it skips that column completely when trying to read those lines, and then tries to read H:mu as the masses variable.

Technically this is a bug, but not sure anyone will care about fixing it - who's really going to be writing out their customised CASTEP data to XYZ, rather than CELL which it's built for? The bigger problem is that it's ASE outputting a file it then can't read back in.

TL;DR I'll report this to ASE and close this issue.