paulhager / MMCL-Tabular-Imaging


Training Details #9

Closed. MrChill closed this issue 10 months ago.

MrChill commented 10 months ago

Hi Paul,

I have seen that your paper is quite similar to our work "Learning visual models using a knowledge graph as a trainer", published at ISWC21, where we learn image-based classifiers from tabular data in the form of a knowledge graph (KG). We also investigated the importance of context (in the KG) in https://arxiv.org/abs/2210.11233, published at ISWC22.

I am quite interested in pushing this research further, also into other domains.

Therefore, could you please provide some information about the training details (number of GPUs, training time)?

I am trying to reproduce the results with the given config files on an Nvidia V100, but I cannot achieve the same results. The images are loaded into a single tensor and live_loading=False.

With the DVM dataset:

- pretrain=False, datatype=tabular --> best val acc = 70.86% (31m 33s)
- pretrain=False, datatype=imaging --> best val acc = 88.78% (2d 5h 33m)
- pretrain=False, datatype=multimodal --> best val acc = 87.43% (20h 11m 6s)
- pretrain=True, datatype=tabular --> best val acc = 70.83% (9h 53m 19s)

Still running (epoch 91):

- pretrain=True, datatype=multimodal --> best val acc = 16.85% (6d 22h 48m)

For the tabular data I use data_train_tabular: dvm_features_train_noOH_all_views.csv (since we don't have access to 'Ad_table (extra).csv').
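As a quick sanity check, a minimal sketch like the following (pandas assumed available; the filename is the one given above) would show which feature columns the substituted file actually contains, which matters for the reply below:

```python
# Sanity check: list the columns of the substituted tabular file to see
# whether the physical features from 'Ad_table (extra).csv' are present.
import pandas as pd

df = pd.read_csv("dvm_features_train_noOH_all_views.csv")
print(df.shape)
print(df.columns.tolist())
```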

paulhager commented 10 months ago

I am using a single A40 GPU.

Your pretrain=False, datatype=imaging performance looks normal; that's in line with what I got in the paper. It takes quite a while on your GPU, though: for me it trains in about 2 hours.

Your tabular results are very low though; I get top-1 accuracies in the low 90s for pretrain=False, datatype=tabular when training with the physical data.

As explained in the README, you can get access to Ad_table (extra).csv here.

It is essential to have these physical features in the tabular data, as they are very informative and especially important for the multimodal contrastive learning process, as shown in the paper. I just regenerated the tabular data using the notebook I uploaded to the GitHub repo, trained from a clean clone, and was able to reproduce my results. It also took only 7 minutes to train. I used a learning rate of 0.0001.
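For illustration only, a supervised tabular training step with the learning rate mentioned above might look like the sketch below; the MLP architecture, feature count, and class count are placeholders, not the repo's exact model:

```python
# Illustrative only: a small MLP tabular classifier trained with Adam
# at lr=1e-4 (the learning rate mentioned above).
import torch
import torch.nn as nn

n_features, n_classes = 17, 10  # placeholders; set to your data

model = nn.Sequential(
    nn.Linear(n_features, 256), nn.ReLU(),
    nn.Linear(256, n_classes),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(x, y):
    """One supervised step on a batch of tabular features x with labels y."""
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```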

The multimodal training definitely takes longer, but still only 1 day and 9 hours, not 6 days. Before even doing a learning rate search, my linear evaluation at the end of multimodal training reached a validation accuracy of 0.92.
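Linear evaluation here means freezing the pretrained encoder and training only a linear head on its embeddings. A minimal sketch, with a stand-in encoder, placeholder dimensions, and an assumed head learning rate:

```python
# Linear probing sketch: freeze the pretrained encoder, train a linear head.
import torch
import torch.nn as nn
import torch.nn.functional as F

in_dim, embed_dim, n_classes = 512, 128, 10  # placeholders
encoder = nn.Linear(in_dim, embed_dim)       # stand-in for the pretrained encoder

encoder.eval()
for p in encoder.parameters():
    p.requires_grad = False                  # encoder stays frozen

head = nn.Linear(embed_dim, n_classes)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)  # head lr is an assumption

def probe_step(x, y):
    """Train only the linear head on frozen encoder features."""
    with torch.no_grad():
        z = encoder(x)                       # frozen features, no gradients
    loss = F.cross_entropy(head(z), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```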

MrChill commented 10 months ago

I reran the experiment with the updated tabular files in multimodal mode. However, after 500 epochs I only reach ~85% val accuracy.

[screenshot: validation accuracy curve over 500 epochs]

Would you mind sharing your .pt files for your tabular data? Or do you think the hyperparameters are wrong?

paulhager commented 10 months ago

In the https://github.com/paulhager/MMCL-Tabular-Imaging/tree/main/data folder you have my exact data splits.

[screenshot: linear evaluation accuracy curve after pretraining]

Here is the eval curve after pretraining. It should only take around 50 epochs, too.

And here's the WandB config export for the run:

multimodal_dvm_run.txt

Important hyperparams would be lr=0.003, temperature=0.1, weight_decay=0.0000015, batch_size=512, augmentation_rate=0.95, corruption_rate=0.3, crop_scale_lower=0.08, eval_train_augment_rate=0.8.
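For context, the temperature above enters the CLIP-style InfoNCE objective between the image and tabular projections. A minimal sketch of that loss, illustrative rather than the repo's exact implementation:

```python
# CLIP-style symmetric InfoNCE between image and tabular projections,
# using the temperature=0.1 mentioned above.
import torch
import torch.nn.functional as F

def contrastive_loss(z_img: torch.Tensor, z_tab: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    # z_img, z_tab: (batch, dim) projections of the two modalities
    z_img = F.normalize(z_img, dim=1)
    z_tab = F.normalize(z_tab, dim=1)
    logits = z_img @ z_tab.t() / temperature          # pairwise cosine similarities
    targets = torch.arange(z_img.size(0), device=z_img.device)  # matches on diagonal
    loss_i = F.cross_entropy(logits, targets)         # image -> tabular direction
    loss_t = F.cross_entropy(logits.t(), targets)     # tabular -> image direction
    return 0.5 * (loss_i + loss_t)
```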