ml-jku / cloome


Reproducing results #4

Closed DaniilBoiko closed 2 weeks ago

DaniilBoiko commented 8 months ago

Hi,

I'm trying to reproduce the paper, but I'm encountering a few problems. Could you assist with the following queries:

  1. What command was used to initiate the training? So far I have the following:

    # --image-path: extracted preprocessed images from the FTP server
    # --data-mols: prepared using the molecule preprocessing script from the combined train/test/val split files
    # --val-index: currently commented out in my actual run
    python training/main.py \
    --image-path "/home/ubuntu/images" \
    --train-index "datasplits/datasplit1-train.csv" \
    --data-mols "datasplits/morgan_chiral_fps.hdf5" \
    --val-index "datasplits/datasplit1-val.csv" \
    --csv-separator "," \
    --logs "logs" \
    --name "test_run_removed_parameter_2s_longer_t" \
    --workers 32 \
    --batch-size 256 \
    --epochs 70 \
    --save-frequency 1 \
    --save-most-recent \
    --model "RN50" \
    --method "cloob" \
    --scale-hopfield 30 \
    --report-to tensorboard \
    --seed 1234
  2. Could you provide some information about the necessary hardware – how many GPUs were used and how long the training took?

  3. Would it be possible to provide some training logs or just loss values?

Thank you!

anasf97 commented 8 months ago

Hi,

Sure, here are my answers:

  1. This is the command to run the code. We used different hyperparameters for different tasks, but this is an example of running the code with CLIP settings (with a learnable inverse tau and without the Hopfield layer), which is the one we used to get the retrieval task results.

    python -u src/training/main.py \
    --train-index "<path/to/your/train/index.csv>" \
    --val-index "<path/to/your/val/index.csv>" \
    --image-path "<path/to/your/images/>" \
    --data-mols "<path/to/your/molecules/>" \
    --image-resolution-train 520 --image-resolution-val 520 696 --preprocess-img crop \
    --batch-size 32 --batch-size-eval 32 --lr 1e-3 --wd 0.1 --lr-scheduler "cosine-restarts" --restart-cycles 10 --epochs 70 \
    --method "clip" --init-inv-tau 14.3 --learnable-inv-tau True --warmup 20000 --workers 8 --model "RN50" --dist-url "tcp://127.0.0.1:6100" --normalize "dataset"
  2. I used 4 GPUs with about 12 GB of memory each. It took around 7 days to complete 70 epochs.

  3. This is what the loss looked like for training with CLIP settings (attached plot: loss_image).

Hope this helped!

Best, Ana

DaniilBoiko commented 8 months ago

This is really helpful, thank you. I have a small follow-up question: what's the difference between the three data splits at https://ml.jku.at/software/cellpainting/dataset/?

DaniilBoiko commented 6 months ago

Hi, do you have a validation loss curve as well by any chance? Thanks.

q8888620002 commented 1 month ago


Thank you for sharing the training parameters; they were quite helpful. I've successfully executed your code using the provided parameters. However, when using the same parameters for retrieval, the retrieval performance for the image-to-text and text-to-image tasks reached 0.4%, 2%, and 3.6% for R@1, R@5, and R@10, respectively. This seems much worse than the results presented in the paper. Could you share the command line used for the retrieval tasks so that I can reproduce the results?

Additionally, in the example you shared, do you observe any performance improvements even after the loss has plateaued?

Best

anasf97 commented 1 month ago

Hi,

In this notebook you can find every step that was followed to get the results presented in the paper. It also includes a code snippet to download the models from Hugging Face so you can compare. Also, keep in mind that the results presented in the paper were produced with a test set containing one image per molecule, which can be found here: https://huggingface.co/anasanchezf/cloome/blob/main/cellpainting-split-test-imgpermol.csv.
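
In case it helps with debugging, here is an illustrative sketch of how retrieval metrics like these (R@K, mean and median rank) are typically computed from paired image and molecule embeddings; the function and variable names are mine, not the notebook's actual code:

    import numpy as np

    def retrieval_metrics(img_emb, mol_emb, ks=(1, 5, 10)):
        """Sketch: retrieval metrics for paired embeddings, one image per
        molecule, with pairs matched by row index."""
        img_emb = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
        mol_emb = mol_emb / np.linalg.norm(mol_emb, axis=1, keepdims=True)
        sims = img_emb @ mol_emb.T                    # (n, n) cosine similarities
        n = sims.shape[0]
        metrics = {}
        for name, s in [("image_to_text", sims), ("text_to_image", sims.T)]:
            order = np.argsort(-s, axis=1)            # best match first
            # 0-based rank at which the true pair appears for each query
            ranks = np.argmax(order == np.arange(n)[:, None], axis=1)
            metrics[f"{name}_mean_rank"] = ranks.mean() + 1
            metrics[f"{name}_median_rank"] = np.median(ranks) + 1
            for k in ks:
                metrics[f"{name}_R@{k}"] = np.mean(ranks < k)
        return metrics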

Were your results obtained using the stats shown in the git repo or the corrected ones?

> Additionally, in the example you shared, do you observe any performance improvements even after the loss has plateaued?

We selected models based on performance on a validation set, so retrieval metrics kept improving even after the loss plateaued.

Please let me know if you are still having difficulty getting the same results after taking the information above into account, and I will look further into it.

Best, Ana

q8888620002 commented 1 month ago

Thanks for sharing the testing set. Below are my results using the notebook with the testing set you provided and the pre-trained checkpoint from Hugging Face (HF):

    {'image_to_text_mean_rank': 701.775413711584,
     'image_to_text_median_rank': 576.0,
     'image_to_text_R@1': 0.015130023640661938,
     'image_to_text_R@5': 0.051536643026004726,
     'image_to_text_R@10': 0.06950354609929078,
     'text_to_image_mean_rank': 683.1990543735225,
     'text_to_image_median_rank': 546.0,
     'text_to_image_R@1': 0.022222222222222223,
     'text_to_image_R@5': 0.05390070921985816,
     'text_to_image_R@10': 0.07470449172576832}

I also found that using the new normalization mean and std (#7) seems to worsen the results.

    {'image_to_text_mean_rank': 954.795744680851,
     'image_to_text_median_rank': 917.0,
     'image_to_text_R@1': 0.002364066193853428,
     'image_to_text_R@5': 0.009929078014184398,
     'image_to_text_R@10': 0.014657210401891253,
     'text_to_image_mean_rank': 936.7598108747045,
     'text_to_image_median_rank': 904.0,
     'text_to_image_R@1': 0.0028368794326241137,
     'text_to_image_R@5': 0.008037825059101654,
     'text_to_image_R@10': 0.015130023640661938}

======================================================================================

> I've successfully executed your code using the provided parameters. However, when using the same parameters for retrieval, the retrieval performance for the image-to-text and text-to-image tasks reached 0.4%, 2%, and 3.6% for R@1, R@5, and R@10, respectively.

To clarify, the comments/results above are from retraining CLOOME on my local machine with the training command provided above, using CLOOB as the loss type, not from the pre-trained CLOOME model on HF. Given that the performance seems different, I am wondering if you could share the training command used for the pre-trained retrieval model so I can better reproduce your results.

Additionally, I noticed that while training with the CLOOB loss, inv_tau continuously increases, eventually causing the loss to become negative. Based on the CLOOB paper, I am wondering whether inv_tau should be a fixed value.

Below are my results when training CLOOME with the CLOOB loss, along with the training command I used:

    python -u src/training/main.py \
    --train-index path-to-img-datasplit1-train \
    --val-index path-to-img-datasplit1-val \
    --image-path path-to-img-datasplit1 \
    --data-mols mole_files.hd5 \
    --image-resolution-train 520 --image-resolution-val 520 696 --preprocess-img crop \
    --batch-size 32 --batch-size-eval 32 --lr 1e-3 --wd 0.1 --lr-scheduler cosine-restarts --restart-cycles 10 --epochs 70 \
    --method cloob --init-inv-tau 14.3 --learnable-inv-tau --warmup 20000 --model "RN50" --normalize dataset

(attached plot: results)

anasf97 commented 1 month ago

Hi,

Thank you for your message and for sharing your results. I realized that the pre-trained model on Hugging Face called "retrieval" was one trained with 1024-bit ECFP fingerprints, while the final molecular encoder that yielded the results reported in the paper for this task used a max-pooling combination of Morgan and RDKit count-based fingerprints, with a final length of 8192 bits.

I uploaded the necessary data to reproduce the retrieval results here: https://huggingface.co/anasanchezf/cloome/tree/main/retrieval_files.

With these files, you should get the results reported in the paper. Please give it a try and let me know if you have any problems.

So, to reproduce the training, you should also convert your molecules to this type of fingerprint. You can do this by installing this package (`pip install git+https://github.com/ml-jku/mhn-react`) and running:

    from mhnreact.molutils import convert_smiles_to_fp

    # combined Morgan and RDKit count-based fingerprints, 8192 bits
    fp_ar = convert_smiles_to_fp(list_of_smiles, which='morganc+rdkc', fp_size=8192)
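
For reference, a minimal usage sketch (the CSV path and SMILES column name below are assumptions for illustration, not the repo's actual preprocessing script):

    import pandas as pd
    from mhnreact.molutils import convert_smiles_to_fp

    # Illustrative only: the split file path and the "SMILES" column name are assumptions.
    df = pd.read_csv("datasplits/datasplit1-train.csv")
    smiles = df["SMILES"].tolist()
    fps = convert_smiles_to_fp(smiles, which="morganc+rdkc", fp_size=8192)
    print(fps.shape)  # expected: (n_molecules, 8192)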

And then you should also adjust the config file for the new input size of the molecules. You can find this config in the huggingface repo linked above.

Also keep in mind that for this run the model was trained on 8 GPUs, so the effective batch size was 256 (8 GPUs × 32 per GPU). The command would look like this:

    python -u src/training/main.py \
    --train-index path-to-img-datasplit1-train \
    --val-index path-to-img-datasplit1-val \
    --image-path path-to-img-datasplit1 \
    --data-mols mole_files.hd5 \
    --image-resolution-train 520 --image-resolution-val 520 696 --preprocess-img crop \
    --batch-size 32 --batch-size-eval 32 --lr 1e-3 --wd 0.1 --lr-scheduler cosine-restarts --restart-cycles 10 --epochs 70 \
    --method clip --init-inv-tau 14.3 --learnable-inv-tau --warmup 20000 --model "RN502" --normalize dataset

You can see that in this case the method used was clip, which is why the inverse tau is set to be learnable; when training with cloob, the inverse tau was fixed, as you mention.
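
In case it clarifies the inv_tau behaviour you observed, here is a rough, illustrative sketch (not the repo's actual loss code) of a CLIP-style symmetric contrastive loss where the inverse temperature is either a learnable parameter (initialized to 14.3, as in the clip runs) or a fixed constant (as in the cloob runs):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ContrastiveHead(nn.Module):
        """Sketch: symmetric InfoNCE over image/molecule embeddings with a
        learnable or fixed inverse temperature."""
        def __init__(self, init_inv_tau=14.3, learnable=True):
            super().__init__()
            t = torch.tensor(float(init_inv_tau))
            if learnable:
                self.inv_tau = nn.Parameter(t)     # clip runs: optimized with the model
            else:
                self.register_buffer("inv_tau", t) # cloob runs: kept fixed

        def forward(self, img_emb, mol_emb):
            img_emb = F.normalize(img_emb, dim=-1)
            mol_emb = F.normalize(mol_emb, dim=-1)
            logits = self.inv_tau * img_emb @ mol_emb.t()
            labels = torch.arange(logits.size(0), device=logits.device)
            # average of image->molecule and molecule->image cross-entropies
            return 0.5 * (F.cross_entropy(logits, labels) +
                          F.cross_entropy(logits.t(), labels))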

Hope this helps!

Best, Ana

q8888620002 commented 1 month ago

Thanks!! I will give it a try.