libAtoms / abcd

2 stars 4 forks source link

Simplified data structure #26

Closed fekad closed 4 years ago

fekad commented 5 years ago

To reduce the dependence on ASE we could use a more general structure for storing by using only a flat dictionary (without the info and arrays separation).

Advantages:

Disadvantages:

fekad commented 5 years ago

I implemented the 'flattened' data structure and we have to decide how to handle the name conflicts:

  1. through an error if there any repetition
  2. based on some hierarchy (eg. extra_infos < results < arrays < infos) overwrites the one each other.
  3. if there is a name conflict the variable can be stored with its prefix (eg info_energy)

I would prefer the first one but for example - as far as I know - the xyz reader stores the energy in the info and in the results as well.

gabor1 commented 5 years ago

I also prefer the first one.

fekad commented 5 years ago

But, in that case, you will not be able to use xyz file format for uploading the configurations. So I wouldn't close this issue until the xyz reader will be fixed in ase.

gabor1 commented 5 years ago

huh? Why would giving an error on name conflicts prevent you from uploading XYZ formatted data that doesn't have name conflicts?

gabor1 commented 5 years ago

This has nothing to do with how ASE reads XYZ. You use the ASE reader to read the XYZ file, it reads it into an ASE atoms object, separating stuff into info and array, and you pare that, checking for conflicts, and if there aren't any, you flatten the namespace. if there are conflicts, you stop with an error.

gabor1 commented 5 years ago

I wouldn't parse the "results" at all.

gabor1 commented 5 years ago

that's some fake trickery that we had so that atoms.read() followed by atoms.get_potential_energy() works correctly. But even that is old, now we are not so concerned with that not working, the XYZ reader is just supposed to fill in info and arrays. that's why I'm saying that you don't need to parse "results" at all.

fekad commented 5 years ago

ok, the conflict is only with the atoms.calc.results.

fekad commented 5 years ago

Most of the calculators store the values (all of the available properties of the calculator) so in this case, those will be ignored. It is completely fine for me...

gabor1 commented 5 years ago

you are saying that it prevents someone from using ASE to calculate stuff and then uploading directly to the database. I see. how about the following: make an ignore_ase_results option, and set it to true when the data comes from a file (which you know because you read it), and have it be false if the user calls the upload functionality with a user-supplied ASE atoms object. error on conflicts in either case.

fekad commented 5 years ago

As you mentioned before the without implementing some similar fake trickery the atoms.get_potential_energy() will not work.

gabor1 commented 5 years ago

when you create an ASE object from the database, I'm happy with everything going into info and arrays, and not results, and I don't expect get_potential_energy() to work. this is the ASE philosophy, because get_potential_energy() should only work after you've assigned a calculator and called the calculation explicitly

fekad commented 5 years ago

ok

fekad commented 5 years ago

if somebody uses a calculator and tries to save it to the database, all the properties (energy, virials, calculator name, calculator parameters ) will be lost unless you move them to the arrays and info dictionaries by hand.

gabor1 commented 5 years ago

um... no, under my suggestion above, when the ignore_ase_results option is false (default), you should parse the results section of the ASE atoms object. but you can turn it off when you read in from XYZ.

The only case this does not cover is when the user writes a script that reads from XYZ into an Atoms object, and then wants to store it in the database, because then there will automatically be a conflict. so you stop with an error and the user learns to turn on ignore_ase_results

fekad commented 5 years ago

Sorry I missed that comment because my browser didn't update the page.