libAtoms / abcd

1 stars 4 forks source link

downloaded structures cannot be reuploaded #81

Open stenczelt opened 4 years ago

stenczelt commented 4 years ago

If one downloads a structure from abcd then that cannot be reuploaded into an abcd database because there are keys in the info that clash with the derived ones.

Example: try uploading the following xyz using abcd upload -i mystructure.xyz

9
Lattice="8.0 0.0 0.0 0.0 8.0 0.0 0.0 0.0 8.0" Properties=species:S:1:pos:R:3:forces:R:3:momenta:R:3 formula=C2H7 hash_structure=4b9349c947068bc5964c587a95699509 uploaded=2019-12-17 16:33:24.775000 pca_coord="1.6371269960482369 -0.4954201741525486 -2.262605719687197 0.4304664930512833 -0.1717581488989806 1.233961896536607 -2.296862264849734 0.5761080651137208 -1.3161071312860189 1.9197041134491764" n_atoms=9 volume=511.9999999999995 rigged=True filename=rigged.xyz username=tks32 hash=4b55960a588b1f6c5d49bb35e8c43ada modified=2019-12-17 16:33:24.775000 energy=-431.0291038500085 pbc="T T T"
C       0.02176376      -0.00373990      -0.00026709      -0.07934612       1.29388795       0.15548667      10.67167673      -1.84278699      -0.12911628 
H      -1.00482131       0.39756844       0.49540028       0.22898722      -0.03397979       0.10761272       0.71967448      -0.06530201      -0.05719192 
H       0.25898810      -1.07128326       0.41986162       0.30824009      -0.96625517       0.33141156       0.40501445       1.18310781      -0.14581153 
H       0.92805152       0.75829629       0.27429561      -0.28197640      -0.18332841      -0.14414872       1.10920845       0.30951184      -0.10040308 
H      -0.10638236      -0.03989777      -1.18662314       0.04332756      -0.07709025      -0.39946642       0.87211526       0.40476398       0.41739843 
C       3.97833884       0.00148013      -0.00128903      -0.03370890      -0.26650590      -0.05249595     -10.64171330       0.73804800      -0.59621593 
H       3.19199852       0.59457544       0.67901985      -0.14988498       0.06691593       0.14173777      -0.29269167      -0.28130119       0.35608997 
H       4.40680259      -0.93026154       0.63910868       0.00278010       0.01371944      -0.02886414      -1.14096170      -0.04648921       0.31424292 
H       4.87808974       0.73617116      -0.30100231       0.03972498       0.13220926      -0.11311652      -1.03008889      -0.03509934       0.14162040

Which then gives error of:

({'pbc', 'n_atoms', 'calculator_name', 'derived', 'calculator_parameters', 'cell', 'formula'}, {'pca_coord', 'numbers', 'positions', 'forces', 'momenta'}, {'n_atoms', '16:33:24.772000', 'energy', 'username', 'uploaded', 'rigged', 'modified', 'volume', 'filename', 'hash_structure', 'hash', 'formula'}, {})
Traceback (most recent call last):

[...]

File "/home/tks32/programs/miniconda3/envs/tum_test/lib/python3.7/site-packages/abcd/model.py", line 150, in from_atoms
    raise ValueError('All the keys must be unique!')
ValueError: All the keys must be unique!

Should we just let the properties calculated/generated on upload overwrite the ones with the same name in the file, with warnings showing? Or add some prefix to the ones from the file, perhaps with an option to switch between these behaviours?

gabor1 commented 4 years ago

if it's the derived structures (modified, uploaded, volume etc) then these should not cause an error but simply be rederived from the structure. a warning that this has been done perhaps issued to the user?

gabor1 commented 4 years ago

the structure should be hashed and checked against existing structures, and if there are matches, then a warning issued that overwriting will happen, and exit without doing anything if --yes is not given. when --yes is given, matching structures should not get a new entry, but key-value pairs updated (i.e. added for new ones and overwritten for old ones)

gabor1 commented 4 years ago

keys that are automatically derived (such as "filename", "uploaded", "modified") should be derived again and added to the dictionary, with the incoming values discarded (and a warning issued about this)

bernstei commented 3 years ago

The specific code-level problem is that both reserved_keys and info_keys (model.py around line 150) contain n_atoms.

bernstei commented 3 years ago

If the only conflict is between reserved_keys and info_keys, it could be as simple as That could be as simple as replacing info_keys = set(atoms.info.keys()) with info_keys = set(atoms.info.keys()) - reserved_keys

gabor1 commented 3 years ago

I agree this should just work, and we need to ensure that incoming keys (whatever they are), if the configuration is formally valid, it should be processed properly.

gelzinyte commented 3 years ago

when abcd keys and keys in user's structures clash (e.g. number of atoms), give options to upload: