sunilhoho / EVEREST

Official PyTorch implementation of EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens [ICML 2024].
https://arxiv.org/abs/2211.10636

Kinetics Self-supervised Checkpoint #2

Open fmthoker opened 3 months ago

fmthoker commented 3 months ago

Dear Authors, we are conducting a study to evaluate video self-supervised models holistically and would like to include your EVEREST model. Could you please share the Kinetics-400 pretrained ViT-B checkpoint for our evaluation?

sunilhoho commented 3 months ago

Thanks for your interest in our work. You can download pre-trained and fine-tuned checkpoints from the link below. We will update the code and release the other checkpoints soon! :)

https://drive.google.com/drive/folders/1Lltf4m4YjUZwEVYfAVhRRTGoqJHd1Lpp?usp=drive_link

fmthoker commented 3 months ago

@sunilhoho Thanks for sharing the models at such short notice. Do you have any Kinetics-400 ViT-B checkpoints trained for more than 200 epochs? We want to use the best possible EVEREST model.

Also, I have started Kinetics-400 pretraining with the code you have already shared, and I want to train your model for 800 epochs.

Here are the script and hyperparameters I am using. Can you confirm that they match your settings? I am using 2 nodes with 4 GPUs each to match your 8-GPU setup.

JOB_NAME=$1
GPUS=${GPUS:-8}
GPUS_PER_NODE=${GPUS_PER_NODE:-4}
CPUS_PER_TASK=${CPUS_PER_TASK:-8}
SRUN_ARGS=${SRUN_ARGS:-""}
PY_ARGS=${@:2}

srun -p --job-name=${JOB_NAME} \
    --gres=gpu:${GPUS_PER_NODE} \
    --ntasks=${GPUS} \
    --ntasks-per-node=${GPUS_PER_NODE} \
    --cpus-per-task=${CPUS_PER_TASK} \
    --kill-on-bad-exit=1 \
    ${SRUN_ARGS} \
    python -u run_ms_pretraining.py \
    --data_path ${DATA_PATH} \
    --mask_type motion-centric \
    --motion_centric_masking_ratio 0.7 \
    --mask_ratio 0.9 \
    --model pretrain_videoms_base_patch16_224 \
    --decoder_depth 4 \
    --lr 3e-4 \
    --batch_size 128 \
    --num_frames 16 \
    --sampling_rate 4 \
    --opt adamw \
    --opt_betas 0.9 0.95 \
    --warmup_epochs 40 \
    --epochs 801 \
    --save_ckpt_freq 2 \
    --log_dir ${OUTPUT_DIR} \
    --output_dir ${OUTPUT_DIR}
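For reference, the token budget implied by these flags can be worked out by hand. The sketch below is an illustration under stated assumptions: it assumes a ViT-B/16 encoder with tubelet size 2 (the VideoMAE convention; the thread does not state EVEREST's tubelet size), it applies `--mask_ratio` once over the whole clip, and it does not model `--motion_centric_masking_ratio`, whose exact effect is not described here. Per-frame rounding schemes such as VideoMAE's tube masking give slightly different counts. `token_budget` and `clip_span` are hypothetical helper names.

```python
# Token arithmetic for the pretraining flags above.
# ASSUMPTIONS (not confirmed in this thread): patch size 16, tubelet size 2,
# and mask_ratio applied to all tokens of the clip at once.

def token_budget(num_frames=16, img_size=224, patch_size=16,
                 tubelet_size=2, mask_ratio=0.9):
    """Return (total_tokens, masked_tokens, visible_tokens) for one clip."""
    patches_per_frame = (img_size // patch_size) ** 2   # 14 * 14 = 196
    temporal_slots = num_frames // tubelet_size         # 16 // 2 = 8
    total = temporal_slots * patches_per_frame          # 8 * 196 = 1568
    masked = int(mask_ratio * total)                    # tokens hidden from the encoder
    visible = total - masked                            # tokens the encoder processes
    return total, masked, visible


def clip_span(num_frames=16, sampling_rate=4):
    """Raw frames covered by one clip, assuming one frame is kept every `sampling_rate` frames."""
    return num_frames * sampling_rate


if __name__ == "__main__":
    print(token_budget())   # (1568, 1411, 157)
    print(clip_span())      # 64
```

Under these assumptions, each 16-frame clip spans about 64 raw frames (roughly 2 s of 30 fps video), and the encoder sees only about a tenth of the 1568 tokens, which is where masked video autoencoders save most of their pretraining compute.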

sunilhoho commented 3 months ago

We do not have a Kinetics-400 ViT-B checkpoint trained for more than 200 epochs.

I reviewed your script and hyperparameters, and they look correct. They match our original configuration except for the number of training epochs and the number of GPUs per node. Feel free to reach out if you encounter any issues or have further questions.
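The number of GPUs per node should not change the optimization itself, because data-parallel training only depends on the total number of processes. A minimal sketch, assuming `--batch_size` is per GPU and `--lr` is a base rate scaled linearly by `total_batch_size / 256`, as in the VideoMAE/MAE code whose flags this script mirrors (neither point is confirmed for EVEREST in this thread). `effective_batch` and `scaled_lr` are hypothetical helpers, and the single 8-GPU node is only an example layout.

```python
# Effective batch size and learning rate for two GPU layouts with 8 GPUs in total.
# ASSUMPTIONS: --batch_size is per GPU; --lr is scaled linearly by
# total_batch_size / 256 (VideoMAE/MAE convention, not confirmed for EVEREST).

def effective_batch(batch_size_per_gpu: int, nodes: int, gpus_per_node: int) -> int:
    """Total samples per optimizer step across all data-parallel processes."""
    return batch_size_per_gpu * nodes * gpus_per_node


def scaled_lr(base_lr: float, total_batch_size: int, reference_batch: int = 256) -> float:
    """Linear scaling rule: the actual learning rate grows with the global batch size."""
    return base_lr * total_batch_size / reference_batch


if __name__ == "__main__":
    two_by_four = effective_batch(128, nodes=2, gpus_per_node=4)
    one_by_eight = effective_batch(128, nodes=1, gpus_per_node=8)
    print(two_by_four, one_by_eight)        # 1024 1024
    print(scaled_lr(3e-4, two_by_four))     # about 1.2e-3
```

Under these assumptions, both layouts produce the same global batch (1024) and the same scaled learning rate; the difference is mainly the extra inter-node communication cost of gradient synchronization.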