pnnl / isicle

In silico chemical library engine for high-accuracy chemical property prediction
BSD 3-Clause "New" or "Revised" License
58 stars 19 forks source link

No calculation for adducts possible #1

Closed michaelwitting closed 1 year ago

michaelwitting commented 5 years ago

Dear ISICLE-Team,

I installed ISICLE and tried to run it. However, it is only working for neutral molecules. I always get the following error message:

Error in job generateAdduct while creating output files /home/michael/lipidFiles/output/adducts/geometry_-H/IJFVSSZAOYLHEE-SSEXGKCCSA-N_-H.xyz, /home/michael/lipidFiles/output/adducts/geometry_-H/IJFVSSZAOYLHEE-SSEXGKCCSA-N_-H.mol2, /home/michael/lipidFiles/output/adducts/geometry_-H/IJFVSSZAOYLHEE-SSEXGKCCSA-N_-H.pdb, /home/michael/lipidFiles/output/adducts/geometry_-H/IJFVSSZAOYLHEE-SSEXGKCCSA-N_-H.charge.
RuleException:
CalledProcessError in line 182 of /home/michael/anaconda3/envs/isicle/lib/python3.6/site-packages/isicle/rules/adducts.snakefile:
Command 'python -m isicle.scripts.generate_adduct /home/michael/lipidFiles/output/adducts/geometry_parent/IJFVSSZAOYLHEE-SSEXGKCCSA-N.mol2 /home/michael/lipidFiles/output/adducts/pKa/IJFVSSZAOYLHEE-SSEXGKCCSA-N.pka [-H] /home/michael/lipidFiles/output/adducts/geometry_-H/IJFVSSZAOYLHEE-SSEXGKCCSA-N_-H.mol2 /home/michael/lipidFiles/output/adducts/geometry_-H/IJFVSSZAOYLHEE-SSEXGKCCSA-N_-H.xyz          /home/michael/lipidFiles/output/adducts/geometry_-H/IJFVSSZAOYLHEE-SSEXGKCCSA-N_-H.pdb /home/michael/lipidFiles/output/adducts/geometry_-H/IJFVSSZAOYLHEE-SSEXGKCCSA-N_-H.charge --forcefield gaff --steps 500 &> /home/michael/lipidFiles/output/adducts/geometry_-H/logs/IJFVSSZAOYLHEE-SSEXGKCCSA-N_-H.log' returned non-zero exit status 1.
  File "/home/michael/anaconda3/envs/isicle/lib/python3.6/site-packages/isicle/rules/adducts.snakefile", line 182, in __rule_generateAdduct
  File "/home/michael/anaconda3/envs/isicle/lib/python3.6/concurrent/futures/thread.py", line 55, in run

I attached you my input file as well.

Best regards,

Michael

test.txt

michaelwitting commented 5 years ago

PS: running python -m isicle.scripts.generate_adduct /home/michael/lipidFiles/output/adducts/geometry_parent/IJFVSSZAOYLHEE-SSEXGKCCSA-N.mol2 /home/michael/lipidFiles/output/adducts/pKa/IJFVSSZAOYLHEE-SSEXGKCCSA-N.pka [+Na] /home/michael/lipidFiles/output/adducts/geometry_+Na/IJFVSSZAOYLHEE-SSEXGKCCSA-N_+Na.mol2 /home/michael/lipidFiles/output/adducts/geometry_+Na/IJFVSSZAOYLHEE-SSEXGKCCSA-N_+Na.xyz /home/michael/lipidFiles/output/adducts/geometry_+Na/IJFVSSZAOYLHEE-SSEXGKCCSA-N_+Na.pdb /home/michael/lipidFiles/output/adducts/geometry_+Na/IJFVSSZAOYLHEE-SSEXGKCCSA-N_+Na.charge --forcefield gaff --steps 500 &> /home/michael/lipidFiles/output/adducts/geometry_+Na/logs/IJFVSSZAOYLHEE-SSEXGKCCSA-N_+Na.log directly in the command line returns no error!

PPS: I'm using Ubuntu 18.04.

smcolby commented 5 years ago

I tested the provided input on my machine and experienced the same issue. The command returns no error when running manually because all outputs are being piped to the log file (Snakemake is aware of the error despite outputs being piped to the log). Looking there first, we see:

  File "/Users/colb804/anaconda3/envs/isicle/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/colb804/anaconda3/envs/isicle/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/colb804/workspace/isicle/isicle/scripts/generate_adduct.py", line 54, in <module>
    pka = read_pka(args.pkafile)
  File "/Users/colb804/workspace/isicle/isicle/utils.py", line 42, in read_pka
    idx = [int(x) - 1 for x in df['atoms'].values[0].split(',')]
AttributeError: 'numpy.float64' object has no attribute 'split'

This suggests that the pKa output generated by cxcalc (ChemAxon) is unexpected, and as such the (not-very-robust) parser fails. Looking at the pKa file, we see it is essentially empty (just a skeleton):

$ cat output/adducts/pKa/test.pka
id      apKa1   apKa2   bpKa1   bpKa2   atoms
1

To check what might have gone wrong with the pKa calculation, we can take a look at the corresponding log file:

$ cat output/adducts/pKa/logs/test.log
[H][#8][P@](=O)([#8][C@]([H])([H])[C@]([H])([H])[N@@](C([H])([H])[H])(C([H])([H])[H])C([H])([H])[H])[#8][C@@]([H])([H])[C@]([H])([#8]-[#6](=O)[C@]([H])([H])[C@]([H])([H])[C@]([H])([H])[C@]([H])([H])[C@]([H])([H])[C@]([H])([H])[C@]([H])([H])[C@]([H])([H])[C@]([H])([H])[C@]([H])([H])C([H])([H])[H])[C@@]([H])([H])[#8]-[#6](=O)[C@@]([H])([H])[C@@]([H])([H])[C@@]([H])([H])[C@@]([H])([H])[C@@]([H])([H])[C@@]([H])([H])[C@@]([H])([H])[C@@]([H])([H])[C@@]([H])([H])[C@@]([H])([H])C([H])([H])[H]
1           "pka: Inconsistent molecular structure."

Somewhat concerning, this error output looks nothing like your input SMILES, even when considering the various preprocessing steps taken by isicle (SMILES canonicalization, desalting, neutralization, tautomerization). Looking at the outputs of each of these steps, the structure is reasonably preserved. But note that pKa is calculated from a .mol2 geometry -- this is where I think things broke down. If we use cxcalc to generate pKa on the tautomerized SMILES, we get the following:

$ cxcalc pka -i -40 -x 40 -d large output/adducts/tautomer/test.smi
id  apKa1   apKa2   bpKa1   bpKa2   atoms
1   1.86    18.84   -6.74   -7.35   20,28,13,31

Running on the .mol2 output (from rule generateGeometry), we get the failed pKa calculation. The problem is, there is no guarantee the atom indices calculated from the tautomerized SMILES will match those of the 3D .mol2 geometry used in later steps, which is why we calculate pKa from the .mol2 file. You could try replacing the content of output/adducts/pKa/test.pka with the above, but again, no guarantee the indices will match those of the .mol2 that carries on (manually verification required).

That aside, I greatly appreciate you taking the time to file this issue. We are now aware of this edge case, and will work to ameliorate the issue on multiple fronts:

michaelwitting commented 5 years ago

Thank you very much. I checked with another compound (as PE lipid this time) and that worked. It looks like zwitterionic substances cause some problems.