hspark1212 / MOFTransformer

Universal Transfer Learning in Porous Materials, including MOFs.
https://hspark1212.github.io/MOFTransformer/

Trouble with test/train fraction in prepare.py and using predict.py #147

Closed: kaljamal closed this issue 1 year ago

kaljamal commented 1 year ago

Hi, I wanted to follow up on some issues I have been having when trying to predict band gap values using predict.py. I placed the CIFs I want band gap predictions for in the root directory and ran prepare_data with test fraction = 1 and train fraction = 0. I then ran the predict function using the fine-tuned band gap model with the appropriate mean and std, as well as split = 'test'. However, the job fails with the following output: predict_output.pdf. Do you know how to fix this issue? It seems to want to read a train_downstream.json file even though that file is empty and even though I specified split = 'test'. I want to use the fine-tuned models and the predict function to obtain band gap values for specific structures I want to investigate.
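The failure suggests the data module opened the train split's label file even though only split = 'test' was requested. A minimal sketch of the expected behavior, loading only the split that is actually asked for, is shown here. It is a hypothetical helper, not MOFTransformer's datamodule code; the `{split}_{downstream}.json` naming and the "CIF id to label" JSON layout are assumptions based on the file name in the error.

```python
import json
from pathlib import Path


def load_split_labels(root_dataset, downstream, split):
    """Load labels for one split only (hypothetical helper).

    Assumes a file named f"{split}_{downstream}.json" that maps
    CIF ids to target values; other splits are never opened.
    """
    path = Path(root_dataset) / f"{split}_{downstream}.json"
    if not path.exists():
        raise FileNotFoundError(f"missing label file for split '{split}': {path}")
    with path.open() as fh:
        labels = json.load(fh)
    # An empty requested split is reported explicitly instead of
    # failing later with an obscure error.
    if not labels:
        raise ValueError(f"split '{split}' is empty: {path}")
    return labels
```

With this pattern, predicting on 'test' after a test fraction of 1 never touches the empty train file.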

I also tried setting the test fraction and train fraction to 0.8 and 0.2, respectively, in the prepare function and then running the predict function. The job does complete, but the samples are also split into train and val sets, so some CIFs are not predicted. In addition, the CIFs that are predicted in test_results.csv give mismatching results. My understanding is that the predictions are saved in test_prediction.csv, with the predicted values under the label "regression logits". Is this correct? Attached is a test_prediction result containing some of the CIFs I wanted band gap values for; however, the predicted value ("regression logits"?) is nowhere near the expected literature value: test_prediction.pdf. Please let me know how to resolve this issue, and whether my understanding of how to obtain the prediction results, and of the process overall, is correct. Thank you very much; I appreciate all your help and explanation.
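Why only some CIFs show up in the test predictions follows from how fractional splits work: each CIF lands in exactly one of train, val, or test, and predicting with split = 'test' covers only the test portion. A minimal sketch of such a split, assuming val receives whatever is left after train and test (this is a hypothetical helper; prepare_data's actual rounding and leftover handling may differ):

```python
import random


def split_ids(cif_ids, train_fraction, test_fraction, seed=0):
    """Partition CIF ids into train/val/test (hypothetical sketch).

    val receives whatever remains after the train and test fractions.
    """
    if not 0 <= train_fraction + test_fraction <= 1:
        raise ValueError("train_fraction + test_fraction must lie in [0, 1]")
    ids = list(cif_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    n_train = int(len(ids) * train_fraction)
    n_test = int(len(ids) * test_fraction)
    return {
        "train": ids[:n_train],
        "test": ids[n_train:n_train + n_test],
        "val": ids[n_train + n_test:],
    }
```

With train_fraction=0 and test_fraction=1, every id goes to test and the train list is empty; with any other fractions, the ids outside test are simply not part of a split = 'test' prediction.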

Yeonghun1675 commented 1 year ago

Hi @kaljamal,

First, we checked the file you sent us. We have confirmed that this issue is caused by the setup in the datamodule, and we have fixed it. We will push the update later today or tomorrow.

For now, please make sure that you set the mean and std (2.086, 1.131) that we provided.
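The mean and std matter because regression targets are commonly standardized during training, so the wrong values shift and scale every prediction converted back to eV. A minimal sketch of the usual z-score convention with the values quoted above; whether MOFTransformer applies exactly this convention, and whether the saved "regression logits" are already in eV, are assumptions to check against your version.

```python
# Training-label statistics quoted in this thread (eV).
BANDGAP_MEAN = 2.086
BANDGAP_STD = 1.131


def normalize(value_ev, mean=BANDGAP_MEAN, std=BANDGAP_STD):
    """Map a band gap in eV to the standardized scale z = (y - mean) / std."""
    return (value_ev - mean) / std


def denormalize(z, mean=BANDGAP_MEAN, std=BANDGAP_STD):
    """Map a standardized output back to eV: y = z * std + mean (assumed convention)."""
    return z * std + mean
```

For example, a standardized output of 0 corresponds to the training mean of 2.086 eV, and an output of 1 to about 3.217 eV.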

As you mentioned, you can check the regression logits. The band gap model we uploaded was trained on QMOF band gaps computed at the PBE level. These are calculated with a relatively low-level functional and may differ somewhat from experimentally measured values.

Please let me know if the reference you are referring to is a PBE calculated value rather than an experimental value. I'd be happy to check it out.

Also, since the model is specific to MOFs, it may not fit non-MOF materials well, especially if they contain atoms that are uncommon in MOFs or if the output values fall outside the training distribution.
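Both warning signs mentioned above can be checked roughly before trusting a prediction. The sketch below is a hypothetical heuristic, not part of MOFTransformer: `known_elements` must be supplied by the user (for example, the elements present in the training set), and the 3-standard-deviation cutoff on the quoted training mean and std is an arbitrary illustrative threshold.

```python
def ood_warnings(elements, predicted_ev, known_elements,
                 mean=2.086, std=1.131, z_max=3.0):
    """Return warnings for one prediction (hypothetical heuristic).

    elements: element symbols in the structure, e.g. ["Zn", "O", "C", "H"].
    known_elements: user-supplied set of elements considered in-distribution.
    """
    warnings = []
    # Elements outside the reference set suggest unfamiliar chemistry.
    unusual = sorted(set(elements) - set(known_elements))
    if unusual:
        warnings.append(f"elements rarely seen in training: {', '.join(unusual)}")
    # Predictions far from the training mean suggest extrapolation.
    z = (predicted_ev - mean) / std
    if abs(z) > z_max:
        warnings.append(
            f"prediction {predicted_ev:.2f} eV is {z:+.1f} std from the training mean"
        )
    return warnings
```

An empty list does not guarantee a reliable prediction; it only means neither rough check fired.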

kaljamal commented 1 year ago

Hi, thank you very much for your response and solution. I am able to run the predict function without any issues now. As for the reference, I now recognize that I was comparing against a band gap determined with a higher level of theory rather than PBE, which is what the band gap model is trained on. When I reran the same reference structure with the PBE functional, I obtained a value much closer to the one predicted by the band gap model. Thank you for all your help!