jensengroup / xyz2mol

Converts an xyz file to an RDKit mol object
MIT License

Conversion takes too much time for some molecules #14

Closed n-yoshikawa closed 5 years ago

n-yoshikawa commented 5 years ago

This script takes too long (more than 30 minutes) for some molecules.

How to reproduce:

Run the following command:

python xyz2mol.py test.xyz

The content of test.xyz is as follows:

61
0FX_3VBL_A
O        -10.51300       36.55700      -27.27400
P        -11.25000       35.54400      -26.43500
O        -10.78200       34.13900      -26.15300
O        -12.65500       35.24600      -27.15400
C        -13.51900       36.30600      -27.56700
C        -14.93800       36.12500      -27.04000
O        -14.90500       35.78700      -25.64900
C        -15.44200       34.90700      -27.80900
O        -16.75100       34.53100      -27.35500
C        -15.29800       35.10000      -29.32200
N        -15.64600       33.87000      -30.01300
C        -13.87800       35.53100      -29.70800
C        -13.59800       35.60000      -31.21200
O        -13.48100       36.68500      -28.94500
O        -11.65300       36.16700      -25.00900
P        -10.58100       36.93800      -24.08500
O         -9.26000       36.18000      -24.12600
O        -11.33600       37.17300      -22.80000
O        -10.36700       38.39500      -24.71900
C        -11.50200       39.22600      -24.97700
C        -11.07500       40.67100      -25.21600
O        -10.51700       41.22000      -24.02100
C         -9.91900       40.67700      -26.21600
O        -10.44500       40.76600      -27.54300
C         -9.09100       41.89000      -25.81000
C         -9.32400       41.96600      -24.30500
N         -8.16000       41.47000      -23.56100
C         -8.17300       40.23100      -23.03100
C         -7.07300       39.76700      -22.32500
C         -7.01100       38.39400      -21.69400
C         -7.09000       42.26400      -23.45200
O         -7.09800       43.41300      -23.96200
N         -5.99900       41.85800      -22.77600
C         -5.99300       40.62500      -22.21800
O         -4.97900       40.25600      -21.60000
H        -13.17510       37.10900      -27.08020
H        -15.50310       36.93190      -27.21210
H        -14.06460       35.28360      -25.44810
H        -14.83410       34.15150      -27.56480
H        -17.44190       35.01140      -27.89520
H        -15.93440       35.81520      -29.61090
H        -15.54980       34.00320      -30.99940
H        -15.03590       33.13600      -29.71470
H        -13.29060       34.79340      -29.37500
H        -12.65230       35.88820      -31.36210
H        -13.74010       34.69780      -31.61930
H        -14.21870       36.25900      -31.63670
H        -12.11810       39.19250      -24.19000
H        -11.97860       38.88690      -25.78810
H        -11.84030       41.22430      -25.54490
H         -9.37620       39.84290      -26.11760
H         -9.69160       40.77020      -28.20060
H         -9.41570       42.71780      -26.26750
H         -8.12280       41.75190      -26.01850
H         -9.47860       42.92170      -24.05440
H         -8.97580       39.64610      -23.14680
H         -6.12840       38.27580      -21.23900
H         -7.74680       38.30190      -21.02310
H         -7.11680       37.69640      -22.40260
H         -5.20450       42.45870      -22.68740
H        -16.59180       33.62440      -29.80050

This xyz file is generated from 0FX_3VBL_A of the Platinum Dataset 2017_01 by Open Babel 3.0.0.
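When reproducing a slow conversion like this one, it can help to put a time limit on the run rather than waiting indefinitely. The following is a minimal sketch using only the Python standard library; the helper name `run_with_timeout` is hypothetical and is not part of xyz2mol.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its stdout, or None if it exceeds timeout_s seconds."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return None
    return result.stdout
```

For the case reported here, one could call `run_with_timeout([sys.executable, "xyz2mol.py", "test.xyz"], 60)` and treat a `None` result as "too slow".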

This molecule actually has a net charge of -1 according to Chem.rdmolops.GetFormalCharge(), but adding the correct charge information (replacing the second line of the xyz file with charge=-1=) did not solve the problem.
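The net charge of -1 can also be cross-checked directly against the `M  CHG` line of the original SDF posted later in this thread. The sketch below is only an illustration and is not how RDKit computes the formal charge: the function name `net_charge_from_molblock` is hypothetical, and it assumes a V2000 mol block whose charges are all listed on `M  CHG` lines.

```python
def net_charge_from_molblock(molblock):
    """Sum the charges listed on 'M  CHG' lines of a V2000 mol block."""
    total = 0
    for line in molblock.splitlines():
        if line.startswith("M  CHG"):
            fields = line.split()
            count = int(fields[2])
            # After the count come (atom index, charge) pairs.
            pairs = fields[3:3 + 2 * count]
            total += sum(int(charge) for charge in pairs[1::2])
    return total


# The charge line from the SDF in this thread: atoms 3 and 18 carry -1, atom 11 carries +1.
chg_line = "M  CHG  3   3  -1  11   1  18  -1"
print(net_charge_from_molblock(chg_line))  # -1
```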

jhjensen2 commented 5 years ago

What SMILES string did the code generate and how long does it take to make it?

n-yoshikawa commented 5 years ago

I couldn't get a SMILES string from the code because it took too long to run.

The SMILES generated from the original SDF file by RDKit (with the option isomericSmiles=False) is Cc1cn(C2CC(O)C(COP(=O)([O-])OP(=O)([O-])OC3OC(C)C([NH3+])C(O)C3O)O2)c(=O)[nH]c1=O.

Original SDF:

0FX_3VBL_A
frProtoss 02171720243D 1   1.00000     0.00000     0
Protoss
 61 63  0  0  0  0            999 V2000
  -10.5130   36.5570  -27.2740 O   0  0  0  0  0  0  0  0  0  0  0  0
  -11.2500   35.5440  -26.4350 P   0  0  0  0  0  0  0  0  0  0  0  0
  -10.7820   34.1390  -26.1530 O   0  5  0  0  0  0  0  0  0  0  0  0
  -12.6550   35.2460  -27.1540 O   0  0  0  0  0  0  0  0  0  0  0  0
  -13.5190   36.3060  -27.5670 C   0  0  2  0  0  0  0  0  0  0  0  0
  -14.9380   36.1250  -27.0400 C   0  0  2  0  0  0  0  0  0  0  0  0
  -14.9050   35.7870  -25.6490 O   0  0  0  0  0  0  0  0  0  0  0  0
  -15.4420   34.9070  -27.8090 C   0  0  1  0  0  0  0  0  0  0  0  0
  -16.7510   34.5310  -27.3550 O   0  0  0  0  0  0  0  0  0  0  0  0
  -15.2980   35.1000  -29.3220 C   0  0  2  0  0  0  0  0  0  0  0  0
  -15.6460   33.8700  -30.0130 N   0  3  0  0  0  0  0  0  0  0  0  0
  -13.8780   35.5310  -29.7080 C   0  0  1  0  0  0  0  0  0  0  0  0
  -13.5980   35.6000  -31.2120 C   0  0  0  0  0  0  0  0  0  0  0  0
  -13.4810   36.6850  -28.9450 O   0  0  0  0  0  0  0  0  0  0  0  0
  -11.6530   36.1670  -25.0090 O   0  0  0  0  0  0  0  0  0  0  0  0
  -10.5810   36.9380  -24.0850 P   0  0  0  0  0  0  0  0  0  0  0  0
   -9.2600   36.1800  -24.1260 O   0  0  0  0  0  0  0  0  0  0  0  0
  -11.3360   37.1730  -22.8000 O   0  5  0  0  0  0  0  0  0  0  0  0
  -10.3670   38.3950  -24.7190 O   0  0  0  0  0  0  0  0  0  0  0  0
  -11.5020   39.2260  -24.9770 C   0  0  0  0  0  0  0  0  0  0  0  0
  -11.0750   40.6710  -25.2160 C   0  0  1  0  0  0  0  0  0  0  0  0
  -10.5170   41.2200  -24.0210 O   0  0  0  0  0  0  0  0  0  0  0  0
   -9.9190   40.6770  -26.2160 C   0  0  1  0  0  0  0  0  0  0  0  0
  -10.4450   40.7660  -27.5430 O   0  0  0  0  0  0  0  0  0  0  0  0
   -9.0910   41.8900  -25.8100 C   0  0  0  0  0  0  0  0  0  0  0  0
   -9.3240   41.9660  -24.3050 C   0  0  2  0  0  0  0  0  0  0  0  0
   -8.1600   41.4700  -23.5610 N   0  0  0  0  0  0  0  0  0  0  0  0
   -8.1730   40.2310  -23.0310 C   0  0  0  0  0  0  0  0  0  0  0  0
   -7.0730   39.7670  -22.3250 C   0  0  0  0  0  0  0  0  0  0  0  0
   -7.0110   38.3940  -21.6940 C   0  0  0  0  0  0  0  0  0  0  0  0
   -7.0900   42.2640  -23.4520 C   0  0  0  0  0  0  0  0  0  0  0  0
   -7.0980   43.4130  -23.9620 O   0  0  0  0  0  0  0  0  0  0  0  0
   -5.9990   41.8580  -22.7760 N   0  0  0  0  0  0  0  0  0  0  0  0
   -5.9930   40.6250  -22.2180 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.9790   40.2560  -21.6000 O   0  0  0  0  0  0  0  0  0  0  0  0
  -13.1751   37.1090  -27.0802 H   0  0  0  0  0  0  0  0  0  0  0  0
  -15.5031   36.9319  -27.2121 H   0  0  0  0  0  0  0  0  0  0  0  0
  -14.0646   35.2836  -25.4481 H   0  0  0  0  0  0  0  0  0  0  0  0
  -14.8341   34.1515  -27.5648 H   0  0  0  0  0  0  0  0  0  0  0  0
  -17.4419   35.0114  -27.8952 H   0  0  0  0  0  0  0  0  0  0  0  0
  -15.9344   35.8152  -29.6109 H   0  0  0  0  0  0  0  0  0  0  0  0
  -15.5498   34.0032  -30.9994 H   0  0  0  0  0  0  0  0  0  0  0  0
  -15.0359   33.1360  -29.7147 H   0  0  0  0  0  0  0  0  0  0  0  0
  -13.2906   34.7934  -29.3750 H   0  0  0  0  0  0  0  0  0  0  0  0
  -12.6523   35.8882  -31.3621 H   0  0  0  0  0  0  0  0  0  0  0  0
  -13.7401   34.6978  -31.6193 H   0  0  0  0  0  0  0  0  0  0  0  0
  -14.2187   36.2590  -31.6367 H   0  0  0  0  0  0  0  0  0  0  0  0
  -12.1181   39.1925  -24.1900 H   0  0  0  0  0  0  0  0  0  0  0  0
  -11.9786   38.8869  -25.7881 H   0  0  0  0  0  0  0  0  0  0  0  0
  -11.8403   41.2243  -25.5449 H   0  0  0  0  0  0  0  0  0  0  0  0
   -9.3762   39.8429  -26.1176 H   0  0  0  0  0  0  0  0  0  0  0  0
   -9.6916   40.7702  -28.2006 H   0  0  0  0  0  0  0  0  0  0  0  0
   -9.4157   42.7178  -26.2675 H   0  0  0  0  0  0  0  0  0  0  0  0
   -8.1228   41.7519  -26.0185 H   0  0  0  0  0  0  0  0  0  0  0  0
   -9.4786   42.9217  -24.0544 H   0  0  0  0  0  0  0  0  0  0  0  0
   -8.9758   39.6461  -23.1468 H   0  0  0  0  0  0  0  0  0  0  0  0
   -6.1284   38.2758  -21.2390 H   0  0  0  0  0  0  0  0  0  0  0  0
   -7.7468   38.3019  -21.0231 H   0  0  0  0  0  0  0  0  0  0  0  0
   -7.1168   37.6964  -22.4026 H   0  0  0  0  0  0  0  0  0  0  0  0
   -5.2045   42.4587  -22.6874 H   0  0  0  0  0  0  0  0  0  0  0  0
  -16.5918   33.6244  -29.8005 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  2  4  1  0  0  0  0
  2 15  1  0  0  0  0
  4  5  1  0  0  0  0
  5  6  1  0  0  0  0
  5 14  1  0  0  0  0
  6  7  1  0  0  0  0
  6  8  1  0  0  0  0
  8  9  1  0  0  0  0
  8 10  1  0  0  0  0
 10 11  1  0  0  0  0
 10 12  1  0  0  0  0
 12 13  1  0  0  0  0
 12 14  1  0  0  0  0
 15 16  1  0  0  0  0
 16 17  2  0  0  0  0
 16 18  1  0  0  0  0
 16 19  1  0  0  0  0
 19 20  1  0  0  0  0
 20 21  1  0  0  0  0
 21 22  1  0  0  0  0
 21 23  1  0  0  0  0
 22 26  1  0  0  0  0
 23 24  1  0  0  0  0
 23 25  1  0  0  0  0
 25 26  1  0  0  0  0
 26 27  1  0  0  0  0
 27 28  1  0  0  0  0
 27 31  1  0  0  0  0
 28 29  2  0  0  0  0
 29 30  1  0  0  0  0
 29 34  1  0  0  0  0
 31 32  2  0  0  0  0
 31 33  1  0  0  0  0
 33 34  1  0  0  0  0
 34 35  2  0  0  0  0
 33 60  1  0  0  0  0
  5 36  1  0  0  0  0
  6 37  1  0  0  0  0
  7 38  1  0  0  0  0
  8 39  1  0  0  0  0
  9 40  1  0  0  0  0
 10 41  1  0  0  0  0
 11 42  1  0  0  0  0
 11 43  1  0  0  0  0
 12 44  1  0  0  0  0
 13 45  1  0  0  0  0
 13 46  1  0  0  0  0
 13 47  1  0  0  0  0
 11 61  1  0  0  0  0
 20 48  1  0  0  0  0
 20 49  1  0  0  0  0
 21 50  1  0  0  0  0
 23 51  1  0  0  0  0
 24 52  1  0  0  0  0
 25 53  1  0  0  0  0
 25 54  1  0  0  0  0
 26 55  1  0  0  0  0
 28 56  1  0  0  0  0
 30 57  1  0  0  0  0
 30 58  1  0  0  0  0
 30 59  1  0  0  0  0
M  CHG  3   3  -1  11   1  18  -1
M  END

$$$$

jhjensen2 commented 5 years ago

I've made a new version that is much faster. It works for your example as long as you specify the correct charge. Let me know if you find more problems.

n-yoshikawa commented 5 years ago

It worked! Thank you for your modification.