Gumpest opened this issue 5 months ago
It happened in the cross-modal distillation process.
@Gumpest I observed `--train-data synthetic` in the training command.
Did you replace the dataloader with the one loading LAION-400M image-text pairs?
@wkcn Oh, I didn't do that. That step is not mentioned in the docs. Could you share more details?
Sorry about that. Regarding the data loader, you can refer to the OpenCLIP repo (https://github.com/mlfoundations/open_clip?tab=readme-ov-file#data).
@wkcn Sorry to bother you, but that link (https://github.com/mlfoundations/open_clip?tab=readme-ov-file#data) only tells me how to download the LAION-400M dataset. What does "replace the dataloader with the one loading LAION-400M image-text pairs" mean?😂
@wkcn Or could you please provide the script to train with YFCC?
@Gumpest Sorry for late reply.
> @wkcn Sorry to bother you, but that link only tells me how to download the LAION-400M dataset. What does "replace the dataloader with the one loading LAION-400M image-text pairs" mean?😂
In our scripts, `--train-data` and `--dataset-type` are both `synthetic`. You need to replace them in order to load the LAION-400M or YFCC-15M datasets.
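For example, assuming the LAION-400M shards have been downloaded as webdataset tar files, the replaced flags could look like the fragment below. The shard path pattern and sample count here are illustrative assumptions, not values from the repo:

```shell
# Illustrative sketch: swap the synthetic loader for webdataset shards.
# Adjust the shard pattern and --train-num-samples to your actual download.
torchrun $DISTRIBUTED_ARGS src/training/main.py \
--train-data "/data/laion400m/{00000..41455}.tar" \
--dataset-type webdataset \
--train-num-samples 400000000 \
--batch-size 512 \
--epochs 25 \
--lr 0.0001
```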
> @wkcn Or could you please provide the script to train with YFCC?
Here are the hyper-parameters on YFCC.
On YFCC-15M, training consists of two compression stages, from 100% to 50% and from 50% to 10% of the parameters, each trained for 25 epochs. We follow the hyper-parameters of CLIP, except that the learning rate is set to 10^−4 when using weight inheritance.
Fig. 7 in Supplementary Material
Stage 1: CLIP-VIT-16 to TinyCLIP-ViT-39M-16-Text-19M (manual inheritance, 100% to 50%)
```shell
export NNODES=1
export GPUS_PER_NODE=8
DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE --nnodes $NNODES"
torchrun $DISTRIBUTED_ARGS src/training/main.py \
--save-frequency 1 \
--report-to wandb \
--train-data <your_yfcc_path/> \
--dataset-type webdataset \
--imagenet-val ./ImageNet \
--warmup 2000 \
--batch-size 512 \
--epochs 25 \
--workers 8 \
--model TinyCLIP-ViT-39M-16-Text-19M \
--name exp_name \
--seed 0 \
--local-loss \
--grad-checkpointing \
--logs ./outputs/TinyCLIP-ViT-39M-16-Text-19M \
--lr 0.0001 \
--gather-with-grad \
--pretrained-image-file ViT-B-16@openai \
--pretrained-text-file ViT-B-16@openai \
--distillation-teacher ViT-B-32@laion2b_e16 \
--logit-scale 50 \
--norm_gradient_clip 5 \
--train-num-samples 15000000
```
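The `--warmup 2000` and `--lr 0.0001` flags follow OpenCLIP's usual schedule of linear warmup followed by cosine decay to zero. A minimal sketch of that schedule (the function name and total-step count are illustrative, not taken from the repo):

```python
import math

def cosine_lr(step, base_lr=1e-4, warmup=2000, total_steps=100000):
    """Linear warmup to base_lr over `warmup` steps, then cosine decay to 0.

    Illustrative sketch of the warmup-plus-cosine schedule; the exact
    step accounting in the training code may differ slightly.
    """
    if step < warmup:
        # Linear ramp from base_lr/warmup up to base_lr.
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / (total_steps - warmup)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))
```

At the end of warmup the rate reaches `base_lr` (1e-4 here), and halfway through the remaining steps it has decayed to half of that.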
Stage 2: TinyCLIP-ViT-39M-16-Text-19M to TinyCLIP-ViT-8M-16-Text-3M (manual inheritance, 50% to 10%)
```shell
export NNODES=1
export GPUS_PER_NODE=8
DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE --nnodes $NNODES"
torchrun $DISTRIBUTED_ARGS src/training/main.py \
--save-frequency 1 \
--report-to wandb \
--train-data <your_yfcc_path/> \
--dataset-type webdataset \
--imagenet-val ./ImageNet \
--warmup 2000 \
--batch-size 512 \
--epochs 25 \
--workers 8 \
--model TinyCLIP-ViT-8M-16-Text-3M \
--name exp_name \
--seed 0 \
--local-loss \
--grad-checkpointing \
--logs ./outputs/TinyCLIP-ViT-8M-16-Text-3M \
--lr 0.0001 \
--gather-with-grad \
--pretrained-image-file checkpoints/TinyCLIP-ViT-39M-16-Text-19M-YFCC15M.pt \
--pretrained-text-file checkpoints/TinyCLIP-ViT-39M-16-Text-19M-YFCC15M.pt \
--distillation-teacher ViT-B-32@laion2b_e16 \
--logit-scale 50 \
--norm_gradient_clip 5 \
--train-num-samples 15000000
```
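The `--distillation-teacher` and `--logit-scale 50` flags suggest a cross-modal distillation loss over the scaled image-text similarity matrix. A generic NumPy sketch of that idea, matching teacher and student image→text distributions with a KL divergence; this is an illustration of the general technique, not TinyCLIP's exact loss, and all function names here are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def affinity_distill_loss(student_img, student_txt,
                          teacher_img, teacher_txt, logit_scale=50.0):
    """KL(teacher || student) over image->text similarity distributions.

    Inputs are L2-normalized embedding matrices of shape (batch, dim).
    Generic sketch of cross-modal affinity distillation; the actual
    TinyCLIP loss may combine more terms.
    """
    s = softmax(logit_scale * student_img @ student_txt.T)
    t = softmax(logit_scale * teacher_img @ teacher_txt.T)
    eps = 1e-12  # avoid log(0)
    return float(np.mean(np.sum(t * (np.log(t + eps) - np.log(s + eps)), axis=-1)))
```

When the student's embeddings exactly match the teacher's, the loss is zero; it grows as the student's similarity structure diverges from the teacher's.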
In my reproduction of `auto_weight_inherit_100to75.sh`, the imagenet-zeroshot-val-top1 is 0.0010 at `Train Epoch: 0 [2501/48828]`. I wonder whether this is normal.