microsoft / Cream

This is a collection of our NAS and Vision Transformer work.

How does the accuracy of your distilled TinyCLIP-ViT-39M-16-Text-19M change with epoch on the YFCC15M dataset? Hoping to get your advice, thanks! #250

Closed. leo23ui closed this issue 1 week ago.

leo23ui commented 1 week ago

I see that the accuracy of your one-epoch distilled TinyCLIP-ViT-39M-16-Text-19M on the LAION-400M dataset is 54%.

I am distilling CLIP on a YFCC subset of 5M image-text pairs. After training for one epoch, the accuracy is only 0.0011; GPU memory usage is only 15728 MiB, and volatile GPU utilization fluctuates between 0% and 100%. I think the low accuracy may be because 5M image-text pairs is a small dataset. Could you please tell me the accuracy after one epoch when training on YFCC15M? If possible, could you also provide a curve showing how the accuracy changes over the epochs? Thank you!

```bash
export NNODES=1
export GPUS_PER_NODE=1
export WANDB__SERVICE_WAIT=60

DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE --nnodes $NNODES"

torchrun $DISTRIBUTED_ARGS src/training/main.py \
    --save-frequency 1 \
    --report-to wandb \
    --train-data /home/gg/gg/MQBench-main/test/model/e1/split_2tar \
    --dataset-type webdataset \
    --imagenet-val ./ImageNet \
    --warmup 2000 \
    --batch-size 1024 \
    --epochs 25 \
    --workers 1 \
    --model TinyCLIP-ViT-39M-16-Text-19M \
    --name exp_name \
    --seed 0 \
    --local-loss \
    --grad-checkpointing \
    --output ./outputs/TinyCLIP-ViT-39M-16-Text-19M \
    --lr 0.0001 \
    --gather-with-grad \
    --pretrained-image-file ViT-B-16@openai \
    --pretrained-text-file ViT-B-16@openai \
    --distillation-teacher ViT-B-32@laion2b_e16 \
    --norm_gradient_clip 5 \
    --train-num-samples 5000000 \
    --logit-scale 50
```
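As a side note (not from the repo, just arithmetic on the flags above): with a single GPU, batch size 1024, and `--train-num-samples 5000000`, one epoch is roughly 4.9k optimizer steps, so `--warmup 2000` covers about 40% of the first epoch. A minimal sketch of that calculation, assuming OpenCLIP-style step counting (the exact accounting in the repo may differ slightly):

```python
# Back-of-the-envelope steps-per-epoch estimate from the command-line flags above.
train_num_samples = 5_000_000   # --train-num-samples
batch_size = 1024               # --batch-size (per GPU)
num_gpus = 1                    # NNODES * GPUS_PER_NODE
warmup_steps = 2000             # --warmup

steps_per_epoch = train_num_samples // (batch_size * num_gpus)
print(f"steps per epoch:           {steps_per_epoch}")                      # ~4882
print(f"warmup share of 1st epoch: {warmup_steps / steps_per_epoch:.0%}")   # ~41%
```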

wkcn commented 1 week ago

Hi @leo23ui ,

The model TinyCLIP-ViT-39M-16-Text-19M was trained with knowledge distillation on YFCC-15M for 25 epochs.

Here are the ImageNet-1k validation accuracy values (%) after each of the 25 epochs:
[15.4, 36.0, 48.2, 51.4, 53.6, 55.4, 56.3, 57.2, 58.0, 58.4, 58.8, 59.5, 60.0, 60.6, 61.0, 61.5, 61.8, 62.2, 62.8, 62.8, 63.2, 63.2, 63.4, 63.4, 63.5]

[Figure: ImageNet-1k validation accuracy (%) vs. training epoch]
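The reported curve can be redrawn from the values above; a minimal matplotlib sketch, illustrative only and not part of the original reply:

```python
# Re-plot the reported ImageNet-1k validation accuracy per training epoch.
import matplotlib.pyplot as plt

acc = [15.4, 36.0, 48.2, 51.4, 53.6, 55.4, 56.3, 57.2, 58.0, 58.4,
       58.8, 59.5, 60.0, 60.6, 61.0, 61.5, 61.8, 62.2, 62.8, 62.8,
       63.2, 63.2, 63.4, 63.4, 63.5]  # one value per epoch, as listed above
epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, marker="o")
plt.xlabel("Epoch")
plt.ylabel("ImageNet-1k validation accuracy (%)")
plt.title("TinyCLIP-ViT-39M-16-Text-19M distilled on YFCC-15M")
plt.grid(True)
plt.show()
```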

leo23ui commented 1 week ago


Thanks for your reply!!