patonlab / GoodVibes

Calculate quasi-harmonic free energies from Gaussian output files with temperature and other corrections
http://www.patonlab.colostate.edu
MIT License
134 stars 52 forks source link

Replace custom output file parsing with cclib #43

Closed berquist closed 2 years ago

berquist commented 3 years ago

This is a proposal to replace much (but not all) of the custom output parsing with cclib, so that GoodVibes can be used with any computational chemistry program that cclib supports, and not just Gaussian, ORCA, and NWChem.

It can't be a complete replacement because:

cclib also can't read COSMO-RS files, but since that's for a completely separate quantity, I won't consider in this PR.

cclib parses version information well, but I haven't investigated yet what GoodVibes is parsing as versions, hence the current draft status.

If you're ok with an extra dependency, let me know your thoughts. There may also be a version restriction, since cclib now has a minimum required Python version of 3.5 (2.7 may still work, but is no longer tested, and 2.6 definitely doesn't work).

bobbypaton commented 3 years ago

Thanks for this - integration with cclib is definitely something we've been mulling over for a while!

Any ideas why there are several builds failing on travis while others are OK - is this related to the minimum Python version requirement of cclib?

berquist commented 3 years ago

If memory serves me correctly, yes, but that may only be part of it. Let me revisit this and see what else might be going on.

On Mon, May 24, 2021 at 11:21 PM bobbypaton @.***> wrote:

Thanks for this - integration with cclib is definitely something we've been mulling over for a while!

Any ideas why there are several builds failing on travis while others are OK - is this related to the minimum Python version requirement of cclib?

— You are receiving this because you authored the thread.

Reply to this email directly, view it on GitHub https://github.com/bobbypaton/GoodVibes/pull/43#issuecomment-847498951, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFRUE3XGDVTLU2Z7FAUX53TPMJTZANCNFSM4XS57XIA .

berquist commented 3 years ago

I took a look at CI and it wasn't anything to do with 3.5 being unsupported specifically. It was dependencies not being installed in the right order, and too-new versions of SciPy and NumPy were trying to be installed.

berquist commented 3 years ago

There is more code that could replaced in thermo.calc_bbe, but I think it's better to not do everything at once.

berquist commented 2 years ago
Something disappointing I just discovered is that running time python -m pytest -v before and after this PR is not an improvement... parsing time
existing 4.04s user 2.01s system 163% cpu 3.707 total
with cclib 25.18s user 3.98s system 118% cpu 24.505 total

It may take a while to find where the extra time is spent in cclib, but even then, by its very nature of parsing more things than GoodVibes needs, it will always take longer, and we can't predict how much this difference will shrink.

jvalegre commented 2 years ago

Hi @berquist - I just noticed that you replaced the lines regarding SCF and MP energies but left the line for double hybrid functionals in your "Program-agnostic charge, multiplicity, and DFT/MP/CC energy" commit (see code below). This is a bug we just fixed (check the closed issue #51 ), is this also fixed in cclib? I saw the issue open from 2019 but I'm not sure what was your final verdict (i.e. do the energies from cclib correspond to the final energy that GoodVibes needs to use or the initial SCF only?).

if line.strip().startswith('E2('):
                spe_value = line.strip().split()[-1]
jvalegre commented 2 years ago

Hi @berquist - I talked to Rob about this merge, we're gonna merge and call this branch "GVcclib". We also want to create json files to store the information obtained from cclib, that way the program will be able to skip the parsing from cclib and read json files with the info instead when the same file is used in multiple runs. I'm doing something similar in the QCORR module from AQME (https://github.com/jvalegre/aqme/blob/master/aqme/qcorr.py, line 219), but I use ccwrite to write and read a json file instead of creating the internal python cclib object.

Could you give us some insight about what's the best way to do this? Thanks for the help!!

berquist commented 2 years ago

I just noticed that you replaced the lines regarding SCF and MP energies but left the line for double hybrid functionals in your "Program-agnostic charge, multiplicity, and DFT/MP/CC energy" commit (see code below). This is a bug we just fixed (check the closed issue #51 ), is this also fixed in cclib? I saw the issue open from 2019 but I'm not sure what was your final verdict (i.e. do the energies from cclib correspond to the final energy that GoodVibes needs to use or the initial SCF only?).

I left that line in because I'm not sure it's covered properly by cclib. As you're aware, it all stems from the fact that cclib doesn't have a "final energy" but instead opts to separate out the SCF from different post HF contributions. I (probably all of us) had forgotten about https://github.com/cclib/cclib/issues/746, and we didn't come to a conclusion. The answer for now is "probably not", but we need to check the mpenergies attribute first.

we're gonna merge and call this branch "GVcclib"

Just to make sure, are you aware that this was merged to the master branch?

that way the program will be able to skip the parsing from cclib and read json files with the info instead when the same file is used in multiple runs.

Yes, this is a good idea. (You might want to add some functionality that checks if the output file has changed and reparsing if necessary, like storing MD5 hash, otherwise it's easy to get bitten by having the stale JSON stick around.)

I use ccwrite to write and read a json file instead of creating the internal python cclib object

This seems reasonable to me as long as working with the JSON representation is easy.