
ml-jku / clamp

Code for the paper Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language

https://arxiv.org/abs/2303.03363


Do you have a PyTorch v2 training script with multi-GPU support? #6

Closed: linhduongtuan closed this issue 1 year ago

linhduongtuan commented 1 year ago

Hi Philipp,

Do you have a PyTorch v2 training script with multi-GPU support? If so, would you be able to share it with me?

As far as I know, your arXiv paper states that the total compute runtime was around 170 days for 800 runs (excluding linear probing). I am wondering why you used only a single GPU to train such a large model.
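Taking the quoted figures at face value, and assuming the 170 days is the summed runtime of all 800 runs (the thread does not spell out how the paper counts it), the average cost per run works out to:

$$
\frac{170\ \text{days} \times 24\ \text{h/day}}{800\ \text{runs}} = \frac{4080\ \text{h}}{800\ \text{runs}} = 5.1\ \text{h per run}
$$

So under that assumption the average run takes hours rather than days, which matters when weighing the extra complexity of multi-GPU training.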

Have a nice weekend. Linh

phseidl commented 1 year ago

Hi Linh, not as of now, but feel free to open a PR that adds it; this shouldn't take too long. There was no need for multi-GPU support in my setup: training on pubchem23 runs in under 2 days on a single A100.
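For anyone picking up such a PR: the usual PyTorch 2 route is `DistributedDataParallel` launched with `torchrun`, with a `DistributedSampler` giving each process its own slice of the data. The sketch below reproduces in plain Python how `DistributedSampler` (with `drop_last=False`) pads and interleaves indices across ranks. `shard_indices` is a hypothetical helper for illustration only, and its shuffle uses Python's `random` module instead of `torch.randperm`, so the permutation will not match PyTorch's.

```python
import math
import random


def shard_indices(dataset_len, num_replicas, rank, shuffle=False, seed=0, epoch=0):
    """Indices that process `rank` sees in one epoch, mimicking
    torch.utils.data.distributed.DistributedSampler with drop_last=False."""
    if not 0 <= rank < num_replicas:
        raise ValueError("rank must satisfy 0 <= rank < num_replicas")
    indices = list(range(dataset_len))
    if shuffle:
        # Every rank must derive the same permutation, so the seed depends only on
        # (seed, epoch), never on rank. (Stand-in for torch.randperm.)
        random.Random(seed + epoch).shuffle(indices)
    # Pad by wrapping around so the total splits evenly across all ranks.
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    if dataset_len:
        indices = (indices * math.ceil(total_size / dataset_len))[:total_size]
    # Interleaved split: rank r takes positions r, r + num_replicas, r + 2 * num_replicas, ...
    return indices[rank:total_size:num_replicas]


if __name__ == "__main__":
    for r in range(4):
        print(r, shard_indices(10, 4, r))
```

With 10 samples and 4 ranks, each rank gets 3 indices and samples 0 and 1 are visited twice per epoch; that small duplication is what the real sampler accepts so every process runs the same number of batches. In an actual DDP script, calling `sampler.set_epoch(epoch)` at the start of each epoch plays the role of the `epoch` argument here.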

The models aren't very large, since only an "adapter" is trained; the large models serve only as frozen encoders and are run once to embed the inputs.

Best, Philipp

phseidl commented 1 year ago

Just checked: training works fine for me with torch '2.0.1+cu117'.
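To confirm that a local environment matches the version tested here, `torch.__version__` (and `torch.version.cuda`) can be inspected. The helper below is a hypothetical, standard-library-only sketch that splits a version string such as '2.0.1+cu117' into its release tuple and local build tag so it can be compared programmatically; it is not part of the clamp repository.

```python
import re


def parse_torch_version(version):
    """Split e.g. '2.0.1+cu117' into ((2, 0, 1), 'cu117'); the tag is None if absent."""
    public, _, local = version.partition("+")
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", public)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component (e.g. '2.1') is treated as 0.
    release = tuple(int(part or 0) for part in match.groups())
    return release, local or None


def is_torch2(version):
    """True for any 2.x or later release, including pre-releases such as '2.1.0a0+git1234'."""
    return parse_torch_version(version)[0] >= (2, 0, 0)


# In a real environment: import torch; print(parse_torch_version(torch.__version__))
```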

linhduongtuan commented 1 year ago

Hi @phseidl (Philipp), thank you for answering my question. It's good to know that you used an adapter for downstream fine-tuning on the PubChem23 dataset.

Since I am struggling to preprocess the PubChem23 dataset, would you be able to share your preprocessed PubChem23 data with me?

I have a follow-up question: do you plan to publish the models on the Hugging Face Hub? It would be great if you could push your results and source code there. Also, I sometimes run into errors with the mlflow package; in my opinion, using wandb to track everything would work fine.
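One low-effort way to make mlflow optional, along the lines suggested here, is to have the training loop log through a small tracker interface so the backend can be swapped. This is a hypothetical sketch, not code from the clamp repository. The backend calls it forwards to (`mlflow.log_params`, `mlflow.log_metric`, `wandb.init`, `run.log`, `run.config.update`) are those libraries' public functions, imported lazily so the module loads even when neither package is installed.

```python
from typing import Dict, List, Protocol, Tuple


class Tracker(Protocol):
    """The only logging surface the training loop relies on (hypothetical interface)."""

    def log_params(self, params: Dict[str, object]) -> None: ...

    def log_metric(self, key: str, value: float, step: int) -> None: ...


class MlflowTracker:
    """Forwards to mlflow; imported lazily so mlflow is only needed if this is used."""

    def log_params(self, params: Dict[str, object]) -> None:
        import mlflow
        mlflow.log_params(params)

    def log_metric(self, key: str, value: float, step: int) -> None:
        import mlflow
        mlflow.log_metric(key, value, step=step)


class WandbTracker:
    """Forwards to a wandb run created on construction."""

    def __init__(self, project: str) -> None:
        import wandb
        self.run = wandb.init(project=project)

    def log_params(self, params: Dict[str, object]) -> None:
        self.run.config.update(params)

    def log_metric(self, key: str, value: float, step: int) -> None:
        self.run.log({key: value}, step=step)


class InMemoryTracker:
    """Dependency-free backend, useful for tests or quick debugging."""

    def __init__(self) -> None:
        self.params: Dict[str, object] = {}
        self.metrics: List[Tuple[str, float, int]] = []

    def log_params(self, params: Dict[str, object]) -> None:
        self.params.update(params)

    def log_metric(self, key: str, value: float, step: int) -> None:
        self.metrics.append((key, value, step))


def train(tracker: Tracker, epochs: int = 3, lr: float = 1e-3) -> None:
    """Toy training loop that only talks to the Tracker interface."""
    tracker.log_params({"epochs": epochs, "lr": lr})
    for epoch in range(epochs):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        tracker.log_metric("train_loss", loss, step=epoch)
```

Switching backends then becomes a one-line change, for example `train(WandbTracker(project="clamp"))` instead of `train(MlflowTracker())`; the project name is only a placeholder.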

Best, Linh

phseidl commented 1 year ago

Hi @linhduongtuan, sorry for the late response. I have added a reproducible way to download the pubchem23 dataset:

```bash
wget https://cloud.ml.jku.at/s/fi83oGMN2KTbsNQ/download -O pubchem23.zip
unzip pubchem23.zip
rm pubchem23.zip
```
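For environments without wget or unzip (for example, some Windows setups), the same three steps can be carried out with the Python standard library. This is a hedged sketch: `download_and_extract` is a hypothetical helper that is not part of the repository, and the share link may change or expire over time.

```python
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url, dest_dir=".", archive_name="pubchem23.zip"):
    """Python equivalent of: wget <url> -O <archive>; unzip <archive>; rm <archive>."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / archive_name
    urllib.request.urlretrieve(url, archive)  # download (wget ... -O)
    try:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)  # unzip
            names = zf.namelist()
    finally:
        archive.unlink()  # rm, even if extraction failed
    return names
```

Calling `download_and_extract("https://cloud.ml.jku.at/s/fi83oGMN2KTbsNQ/download")` from the directory where you would otherwise run the wget command should leave the extracted files there, assuming the link still serves the zip file directly.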

(I also added these commands to ./data/pubchem.md.)

Completely agree. I'm working on a new project at the moment, so it's not a priority, but I would like to add the models to the HF Hub.

Best, Philipp