theochem / iodata

Python library for reading, writing, and converting computational chemistry file formats and generating input files.
https://iodata.readthedocs.io/
GNU Lesser General Public License v3.0

Clone all io code from molmod #37

Open tovrstra opened 5 years ago

tovrstra commented 5 years ago

See:

evohringer commented 5 years ago

Once the draft version of #43 for orca is ready we can collaborate on transferring the io code from molmod and tamkin.

FanwangM commented 5 years ago

Dear Esteban, thank you for your work on #43. Shall we split up the IO functions taken from tamkin and molmod to avoid duplicated effort? Apart from what is already implemented in iodata, cpmd.py and gamess.py appear in the IO modules of both tamkin and molmod. I am currently working on qchem.py. @evohringer

FanwangM commented 5 years ago

Dear Esteban, I am trying to fix the dependency problem for the code taken from tamkin, which mostly comes down to copying some code from molmod, https://github.com/theochem/iodata/issues/36#issuecomment-472716912. If we divide the tasks by clarifying which file formats we are going to port and who is taking care of each, we will know which supporting code from tamkin and molmod is required. Then we can fix the dependency issues before porting the parsers.

The other option is to add the dependency code gradually, at the risk of duplicated work and messy files. Do you have any suggestions or a better way? Thank you. @evohringer

I can fix these imports soon:

from molmod.periodic import periodic
from molmod.io import PunchFile
from molmod import angstrom, amu, calorie, avogadro
from molmod.io.common import SlicedReader

tovrstra commented 5 years ago

@fwmeng88 Thanks for tackling this. Since you are keeping us updated through this issue, the risk of overlap is small.

We should avoid making molmod a dependency, even temporarily, because it contains outdated code that is not fully consistent with similar code in IOData, e.g. the way the units are defined. When you need more units, you can define them here, based on constants from SciPy:

https://github.com/theochem/iodata/blob/master/iodata/utils.py#L38
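
As an illustration of defining units from SciPy constants, here is a minimal sketch. It assumes atomic units as the internal unit system, and the variable names mirror the molmod imports listed earlier in this thread rather than the actual contents of iodata/utils.py, which may differ.

```python
import scipy.constants as spc

# Assumption: atomic units are the internal unit system, so each constant
# below is the size of the named unit expressed in atomic units.
bohr_in_m = spc.value("atomic unit of length")
hartree_in_j = spc.value("Hartree energy")

# Length: 1 angstrom in bohr.
angstrom = spc.angstrom / bohr_in_m
# Mass: 1 unified atomic mass unit in electron masses.
amu = spc.value("atomic mass constant") / spc.value("electron mass")
# Energy: 1 thermochemical calorie in hartree.
calorie = spc.calorie / hartree_in_j
# Energy per particle: 1 kcal/mol in hartree.
kcalmol = 1e3 * spc.calorie / spc.N_A / hartree_in_j

# Usage: multiply to convert into atomic units, divide to convert back.
bond_length = 0.74 * angstrom   # 0.74 angstrom expressed in bohr
mass_in_amu = 1837.0 / amu      # a mass in electron masses expressed in amu
```

Defining each unit as a plain float keeps parsers free of any unit-handling class: a value read from a file is multiplied by its unit once, at the point where it is parsed.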

Similarly, there is some overlap with the following part of iodata too:

https://github.com/theochem/iodata/blob/master/iodata/periodic.py

It would be better to directly use these modules instead of their MolMod counterparts.

All formats in MolMod that make use of the SlicedReader can only be ported over after #26 is fixed. It is a mechanism for reading (parts of) trajectories. I'm not sure if we need the SlicedReader functionality in IOData. Instead we might also just read in a whole trajectory, instead of subsampling it when reading. @evohringer Would it be useful to subsample a trajectory upon reading, e.g. to read only every 100th timestep into memory?

evohringer commented 5 years ago

> All formats in MolMod that make use of the SlicedReader can only be ported over after #26 is fixed. It is a mechanism for reading (parts of) trajectories. I'm not sure if we need the SlicedReader functionality in IOData. Instead we might also just read in a whole trajectory, instead of subsampling it when reading. @evohringer Would it be useful to subsample a trajectory upon reading, e.g. to read only every 100th timestep into memory?

I think subsampling is faster and easier to do in the software that created the trajectories. Would we return a list of dictionaries when reading in the whole trajectory? How would we implement that?
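
One possible answer, sketched under assumptions: a generator yields one dictionary per frame, so calling list() on it gives the whole trajectory as a list of dictionaries, while itertools.islice gives subsampling without a dedicated reader class. The function name iter_xyz_frames, the dictionary keys, and the choice of a multi-frame XYZ file are all hypothetical and are not taken from IOData or molmod.

```python
import io
from itertools import islice
from typing import Dict, Iterator, TextIO


def iter_xyz_frames(stream: TextIO) -> Iterator[Dict]:
    """Yield one dictionary per frame of a multi-frame XYZ stream."""
    while True:
        header = stream.readline()
        if not header.strip():
            return  # end of file, or a blank line where a frame should start
        natom = int(header)
        title = stream.readline().rstrip("\n")
        symbols, coordinates = [], []
        for _ in range(natom):
            words = stream.readline().split()
            symbols.append(words[0])
            coordinates.append([float(word) for word in words[1:4]])
        yield {"title": title, "symbols": symbols, "coordinates": coordinates}


# Usage with a small in-memory trajectory of six one-atom frames.
text = "".join(f"1\nframe {i}\nHe 0.0 0.0 {i}.0\n" for i in range(6))

# Whole trajectory as a list of dictionaries.
frames = list(iter_xyz_frames(io.StringIO(text)))

# Every second frame only (use a step of 100 for every 100th timestep).
subsampled = list(islice(iter_xyz_frames(io.StringIO(text)), 0, None, 2))
```

Note that islice still parses the skipped frames. It only avoids keeping them in memory, so it does not make reading faster.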

evohringer commented 5 years ago

> Dear Esteban, I am trying to fix the dependency problem for the code taken from tamkin, which mostly comes down to copying some code from molmod, #36 (comment). If we divide the tasks by clarifying which file formats we are going to port and who is taking care of each, we will know which supporting code from tamkin and molmod is required. Then we can fix the dependency issues before porting the parsers.

We will start with the following formats: psf, gromacs, charmm and pdb

We will try to implement them without dependencies outside iodata.

tovrstra commented 5 years ago

Great! I'm also fine with not supporting subsampling. I should still work out how trajectories could be handled in #7. I'll comment there.

FanwangM commented 5 years ago

> > Dear Esteban, I am trying to fix the dependency problem for the code taken from tamkin, which mostly comes down to copying some code from molmod, #36 (comment). If we divide the tasks by clarifying which file formats we are going to port and who is taking care of each, we will know which supporting code from tamkin and molmod is required. Then we can fix the dependency issues before porting the parsers.
>
> We will start with the following formats: psf, gromacs, charmm and pdb
>
> We will try to implement them without dependencies outside iodata.

Dear Esteban, I will take care of wfx, qchem, gamess and cpmd; wfx is almost done.

FarnazH commented 4 years ago

@evohringer and @tovrstra can you please decide what to do with these formats left from molmod: https://github.com/molmod/molmod/tree/master/molmod/io

  1. atrj
  2. cml
  3. cpmd
  4. crystal
  5. dlpoly
  6. gamess
  7. gromacs
  8. lammps
  9. psf

@evohringer have you started working on psf, gromacs, and charmm as mentioned before?

evohringer commented 4 years ago

@FarnazH: We decided to skip psf, gromacs and charmm and focus on the pdb format instead, which is accessible from all MD packages. But in principle we could add them in the future if needed.

Maybe @tovrstra can comment better on cpmd, cp2k, and crystal, since I have no experience with these formats.

PaulWAyers commented 4 years ago

Perhaps @evohringer and @tovrstra could make a prioritized list of the most important file formats, in case anyone is inclined to add support? @BradenDKelly is adding *.mwfn (Multiwfn). Of the ones I saw, the ability to parse a GAMESS punch file would be relevant, as then we'd have reasonable support for GAMESS, Psi4, Gaussian, Orca, and Q-Chem (at least), which covers a lot of the quantum chemistry space.

P.S. This is a copy of the message in the Tamkin issue (#36), but that issue and this one are clearly related.

FarnazH commented 4 years ago

@evohringer I understand that PDB is a better format (well-defined and versatile), but if I am not wrong, there is information in the output files that is not printed in PDB, which is probably why the parsers were added to molmod in the first place. For example, the gromacs parser in molmod (https://github.com/molmod/molmod/blob/master/molmod/io/gromacs.py) reads time, position, velocity, and cell information from a *.gro trajectory. While we are at it, I think it's useful to add these, especially because we just need to port them from molmod.
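
To make concrete what a *.gro frame carries beyond PDB, here is a minimal sketch of reading one frame. It is not the molmod parser and not an IOData API: the function name read_gro_frame and the dictionary keys are hypothetical. It assumes the fixed-column layout of the GROMACS .gro format (5-character residue number, residue name, atom name and atom number, then three 8-character positions and three optional 8-character velocities) and a title line that contains the time as "t= value". Values stay in the file's native units (nm, nm/ps, ps).

```python
import io
import re
from typing import Dict, TextIO


def read_gro_frame(stream: TextIO) -> Dict:
    """Read one frame from a .gro stream into a dictionary."""
    title = stream.readline().rstrip("\n")
    # The time is optional in the title; None is stored when it is absent.
    match = re.search(r"t=\s*([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)", title)
    time = float(match.group(1)) if match else None
    natom = int(stream.readline())
    atom_names, positions, velocities = [], [], []
    for _ in range(natom):
        line = stream.readline().rstrip("\n")
        atom_names.append(line[10:15].strip())
        positions.append([float(line[20 + 8 * i:28 + 8 * i]) for i in range(3)])
        if len(line) >= 68:  # velocity columns are present
            velocities.append([float(line[44 + 8 * i:52 + 8 * i]) for i in range(3)])
    # Last line of the frame: 3 box lengths (or 9 values for a triclinic box).
    cell = [float(word) for word in stream.readline().split()]
    return {"title": title, "time": time, "atom_names": atom_names,
            "positions": positions, "velocities": velocities, "cell": cell}


# Usage: build a two-atom frame with the documented column format and parse it.
atom_fmt = "%5d%-5s%5s%5d%8.3f%8.3f%8.3f%8.4f%8.4f%8.4f"
sample = "\n".join([
    "Two atoms, t= 1.5",
    "    2",
    atom_fmt % (1, "SOL", "OW", 1, 0.126, 1.624, 1.679, 0.1227, -0.0580, 0.0434),
    atom_fmt % (1, "SOL", "HW1", 2, 0.190, 1.661, 1.747, 0.8085, 0.3191, -0.7791),
    "   1.82060   1.82060   1.82060",
]) + "\n"
frame = read_gro_frame(io.StringIO(sample))
```

Slicing by column rather than splitting on whitespace matters here, because adjacent fixed-width fields in a .gro file can run together without a separating space.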

evohringer commented 4 years ago

@FarnazH No problem. @lmacaya will port the gromacs format as "gromacs.py".

FarnazH commented 4 years ago

@RichRick1 please take a look at the gamess format in molmod, which needs to be transferred to iodata and added to the iodata/formats module. Thanks. https://github.com/molmod/molmod/blob/master/molmod/io/gamess.py