ml-jku / clamp

Code for the paper Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language
https://arxiv.org/abs/2303.03363

Pretraining and computational resource? #3

Closed: ghost closed this issue 1 year ago

ghost commented 1 year ago

Hi, thank you for sharing your great work! I am interested in the concept of your paper and would like to try the pretraining described in it. How can I pretrain using this repository? Another question is about computational resources: in your paper, the total reported compute was about 170 days across 800 runs. Does the pretraining require the same computational time? Is it possible to pretrain on a single GPU?

Thank you in advance:)

phseidl commented 1 year ago

Hi concon23, pretraining on the full PubChem18 dataset should take around 2-5 days on a modest consumer GPU once the data has been preprocessed. You can follow the instructions in the reproduce section of the README (an example command is below). Hope you manage; otherwise I'm happy to help.
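For example (a minimal sketch, assuming the FS-Mol data under ./data/fsmol has already been downloaded and preprocessed), a pretraining run can be started with:

python clamp/train.py --dataset=./data/fsmol --assay_mode=clip --split=FSMOL_split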

ghost commented 1 year ago

Hi @phseidl Thank you for your kind reply, that answers my question. It is great that the pretraining is feasible for users with common computational resources!

Sincerely:)

ghost commented 1 year ago

Hi @phseidl Sorry for asking another question. Does the following command run the pretraining, or does it run few-shot training or something else?

python clamp/train.py --dataset=./data/fsmol --assay_mode=clip --split=FSMOL_split

Thank you in advance:)

phseidl commented 1 year ago

Hi @concon23, this performs pretraining and evaluates the model zero-shot. To run few-shot evaluation, you can add --support_set_size=k, where k is the number of support samples you want; see the example below. Best, Philipp
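For instance (an illustrative combination of the flags discussed in this thread; the value 8 is just one possible choice for k), a few-shot run would look like:

python clamp/train.py --dataset=./data/fsmol --assay_mode=clip --split=FSMOL_split --support_set_size=8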