materialsvirtuallab / megnet

Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals
BSD 3-Clause "New" or "Revised" License

Problem with loading molecule data from ase #354

Open dimka11 opened 2 years ago

dimka11 commented 2 years ago

I don't understand how to load data from ase format. I looked at this tutorial https://github.com/materialsvirtuallab/megnet/blob/master/notebooks/molecule_example.ipynb and tried converting the data to xyz files, but although pybel can read these files, they can't be loaded into the model.

chc273 commented 2 years ago

I'm not sure I understand your question. Are you saying the example does not work even after you converted the ase Atoms objects to xyz files?

dimka11 commented 2 years ago

@chc273 Thanks for the response! Here is an example of one of my xyz files:

34
Properties=species:S:1:pos:R:3 pbc="F F F"
C       23.94271088      -4.14493513      -2.98162127
C       24.55592728      -0.82619798       1.23874521
O       20.93027115       2.65132999       1.20267034
C       16.11702538       1.21504414       1.46484005
O       15.08468533      -3.13689113       1.72822750
N       12.34882450       4.55354691       1.44151032
C        7.51371670       2.86691523       1.71749294
N        5.92233944      -1.59980488       2.00862408
N        1.48521304      -2.39037442       2.22327352
C       -1.57001507       1.23494565       2.15761590
C       -6.86996460       0.83962160       2.38604617
C       -8.86610794       0.44161573      -2.65364766
C      -14.11276245       0.06187227      -2.19375157
C      -16.35991859      -4.27609301      -1.76654506
C      -21.28580284      -4.39460611      -1.34774482
C      -23.98753166      -0.24840684      -1.35021245
C      -21.73451805       4.07528114      -1.77712965
C      -16.83377075       4.22927999      -2.19576573
S        2.25029945       5.97498083       1.76499522
H       24.44957352      -7.96948814      -1.98474431
H       20.41406441      -3.66614795      -4.67646313
H       26.86673546      -3.41824102      -5.65698814
H       24.43783188      -2.88718438       4.61809397
H       28.05641365       1.06847334       0.92216319
H       12.93233967       8.17394257       1.23317087
H       -7.73730135      -2.48141146       4.22825480
H       -8.44302177       3.79710603       4.37515783
H       -8.03818798       3.85060787      -4.53595924
H       -7.34933376      -2.67482758      -4.47931862
H      -14.32896519      -7.55946207      -1.75550330
H      -23.16374397      -7.72327423      -1.00702596
H      -27.87986374      -0.56414610      -1.00707245
H      -23.81028938       7.30708075      -1.78150427
H      -14.93687820       7.61984539      -2.54143047

After loading it with pybel, it looks wrong compared with the molecules from molecules.json: instead of the full structure, it shows only C .. O . C..

(pybel doesn't show the molecule structure)

And when training of the megnet model starts, I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/megnet/models/base.py in get_all_graphs_targets(self, structures, targets, scrub_failed_structures)
    293             try:
--> 294                 graph = self.graph_converter.convert(s)
    295                 graphs_valid.append(graph)

8 frames
ValueError: max() arg is an empty sequence

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<decorator-gen-53> in time(self, line, cell, local_ns)

<timed eval> in <module>()

/usr/local/lib/python3.7/dist-packages/megnet/models/base.py in get_all_graphs_targets(self, structures, targets, scrub_failed_structures)
    299                     warn(f"structure with index {i} failed the graph computations", UserWarning)
    300                     continue
--> 301                 raise RuntimeError(str(e))
    302         return graphs_valid, targets_valid
    303 

Colab notebook: https://colab.research.google.com/drive/16MXFzX8dtmt4LHzEAOV2ctAohVfeBcP2?usp=sharing

and a few xyz examples: https://github.com/dimka11/mol_data

I'm taking part in a competition where the task is to predict the energy of molecules.

I would be grateful for any information.

chc273 commented 2 years ago

I see where the problem is. In the molecule you showed, there are no chemical bonds according to pybel's definition (the error message should have been better).
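The "no chemical bonds" diagnosis can be checked without pybel. Below is a minimal sketch of a distance-based bond guess: two atoms count as bonded when their distance is at most the sum of their covalent radii plus a fixed 0.45 Å tolerance. The radii, the tolerance, and the helpers `parse_xyz`, `guess_bonds`, and `min_distance` are assumptions for illustration; this is a common heuristic, not pybel's exact bond-perception code. It is applied to the first carbon in the posted file and the three hydrogens listed first.

```python
import math

# Assumed approximate single-bond covalent radii in angstrom
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}
# Assumed slack added to the radii sum; a rough heuristic, not pybel's exact rule
TOLERANCE = 0.45


def parse_xyz(text):
    """Parse a single-frame (extended) XYZ string into [(symbol, (x, y, z)), ...]."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    atoms = []
    for line in lines[2:2 + n_atoms]:  # skip the atom-count line and the comment line
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, (float(x), float(y), float(z))))
    return atoms


def guess_bonds(atoms):
    """Return (i, j, distance) for every pair closer than r_i + r_j + TOLERANCE."""
    bonds = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            (si, pi), (sj, pj) = atoms[i], atoms[j]
            d = math.dist(pi, pj)
            if d <= COVALENT_RADII[si] + COVALENT_RADII[sj] + TOLERANCE:
                bonds.append((i, j, d))
    return bonds


def min_distance(atoms):
    """Smallest interatomic distance among all atom pairs."""
    return min(math.dist(a[1], b[1]) for k, a in enumerate(atoms) for b in atoms[k + 1:])


# First carbon and the first three hydrogens, copied from the posted file
FRAGMENT = """4
Properties=species:S:1:pos:R:3 pbc="F F F"
C       23.94271088      -4.14493513      -2.98162127
H       24.44957352      -7.96948814      -1.98474431
H       20.41406441      -3.66614795      -4.67646313
H       26.86673546      -3.41824102      -5.65698814
"""

atoms = parse_xyz(FRAGMENT)
print(round(min_distance(atoms), 2))  # 3.94
print(guess_bonds(atoms))             # []

# Control: a C-H pair at a typical 1.09 angstrom bond length is detected
print(len(guess_bonds([("C", (0.0, 0.0, 0.0)), ("H", (1.09, 0.0, 0.0))])))  # 1
```

Every C-H distance in that fragment is about 4, whereas a typical C-H bond is about 1.09 Å, so no pair passes the cutoff. If the whole file is scaled like this fragment, the coordinates may not be in angstrom or may have been rescaled during export; that is a guess worth checking against the original ase data.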

In any case, MolecularGraph is not well supported and is limited to QM9 molecules, which contain only the elements "H", "C", "N", "O", and "F".
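The element restriction above can be checked before any graphs are built. A minimal sketch, assuming the supported set is exactly the five QM9 elements named in the comment; `unsupported_elements` is a hypothetical helper, not part of megnet. Applied to the element column of the xyz file posted earlier in this thread, it flags the sulfur atom.

```python
# Elements supported by MolecularGraph per the comment above (the QM9 set)
QM9_ELEMENTS = {"H", "C", "N", "O", "F"}


def unsupported_elements(symbols):
    """Return the element symbols that fall outside the QM9 set (hypothetical helper)."""
    return set(symbols) - QM9_ELEMENTS


# Element column of the 34-atom xyz file posted earlier: 19 heavy atoms, then 15 H
symbols = ["C", "C", "O", "C", "O", "N", "C", "N", "N", "C", "C", "C", "C",
           "C", "C", "C", "C", "C", "S"] + ["H"] * 15
print(unsupported_elements(symbols))  # {'S'}
```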

Please consider using alternative methods like this one instead https://github.com/materialsvirtuallab/megnet/blob/master/notebooks/qm9_simple_model.ipynb

dimka11 commented 2 years ago

@chc273 Thank you. The model works now. I'd like to know: does CrystalGraph support only pymatgen structures, not openbabel? Where can I find more information about tuning hyperparameters? I trained the model on 130k molecules for 300 epochs, and training alone took 6.5 hours on a P100. Is that reasonable? Should I continue training for more epochs to increase accuracy, or should I try something else?
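For scale, the quoted figures work out to 78 s per epoch. A quick check, assuming the 6.5 hours was spent entirely on the 300 epochs (no validation or graph-conversion overhead):

```python
# Figures quoted in the comment above
total_hours = 6.5
epochs = 300
n_molecules = 130_000

seconds_per_epoch = total_hours * 3600 / epochs         # 23400 s spread over 300 epochs
molecules_per_second = n_molecules / seconds_per_epoch  # training throughput

print(seconds_per_epoch)            # 78.0
print(round(molecules_per_second))  # 1667
```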