Ubuntu 18.04.3 LTS
CPU: AMD EPYC 7543 32-Core Processor
GPU: 6 × NVIDIA A40 (PCIe), 48 GB memory each
Python: 3.8
PyTorch: 1.9.0+cu111
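Assuming a standard pip environment (the original setup does not state the install method), the main dependencies can be installed along these lines:
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install open_clip_torch pandas scikit-learn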
Download all data from the data source below:
Run Get_Data.ipynb to create a CSV file that indexes the images for each dataset.
Run Data_preprocessing.ipynb to filter out classes with fewer than 3 images and resize all images to 224×224 (see the sketch after this list).
Run Data_Merge.ipynb to merge all the CSVs and perform sampling and resampling; this produces final_data_224_sample_balance.csv.
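As a rough illustration of the filtering/resize step, here is a minimal sketch; the input CSV name and the 'path'/'label' column names are assumptions, not the notebook's actual code:
from pathlib import Path
import pandas as pd
from PIL import Image

# Hypothetical input: one CSV per dataset with image paths and class labels.
df = pd.read_csv('some_dataset.csv')

# Keep only classes that have at least 3 images.
counts = df['label'].value_counts()
df = df[df['label'].isin(counts[counts >= 3].index)].reset_index(drop=True)

# Resize every remaining image to 224x224 and save a copy.
out_dir = Path('resized_224')
out_dir.mkdir(exist_ok=True)
for p in df['path']:
    Image.open(p).convert('RGB').resize((224, 224)).save(out_dir / Path(p).name)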
Stratified K-Fold:
import pandas as pd
from sklearn.model_selection import StratifiedKFold

df = pd.read_csv('autodl-tmp/final_data_224_sample_balance.csv')
df['fold'] = -1

# 20-way stratified split on the class labels; each row is tagged with the
# fold in which it serves as a validation sample.
split = list(StratifiedKFold(n_splits=20, shuffle=True, random_state=999).split(df, df['new_labels']))
for fold, (train_idx, valid_idx) in enumerate(split):
    df.loc[valid_idx, 'fold'] = fold

df.to_csv('autodl-tmp/final_data_224_sample_balance_fold.csv', index=False)
df.head(5)
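The fold column can then be used to build per-fold train/validation splits, e.g. for --fold 1 used in the training command below (a minimal sketch):
fold = 1
train_df = df[df['fold'] != fold].reset_index(drop=True)
valid_df = df[df['fold'] == fold].reset_index(drop=True)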
Pre-trained ViT-H-14 from open_clip
Get the visual module:
import open_clip
import torch

# Load the full CLIP ViT-H-14 checkpoint pretrained on LAION-2B, then
# keep only the vision tower and save its weights.
model, _, preprocess = open_clip.create_model_and_transforms('ViT-H-14', pretrained='laion2b_s32b_b79k', cache_dir='./pretrained_models')
model_visual = model.visual
torch.save(model_visual.state_dict(), './pretrained_models/ViT_H_14_2B_vision_model.pt')
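To reload the saved vision tower later, one option is to rebuild the model with open_clip and load the weights into its visual module (a sketch, assuming the same file path as above):
import open_clip
import torch

# Recreate an uninitialized ViT-H-14 and restore only the vision tower.
model, _, _ = open_clip.create_model_and_transforms('ViT-H-14')
model.visual.load_state_dict(torch.load('./pretrained_models/ViT_H_14_2B_vision_model.pt'))
model.visual.eval()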
All configurations for ViT-H-14-Visual can be found in ./GUIE/config_clip_224.py
Training:
!CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 \
python -m torch.distributed.launch --nproc_per_node=6 \
./GUIE/train.py \
--csv-dir ./final_data_224_sample_balance_fold.csv \
--config-name 'vit_224' \
--image-size 224 \
--batch-size 32 \
--num-workers 10 \
--init-lr 1e-4 \
--n-epochs 10 \
--cpkt_epoch 10 \
--n_batch_log 300 \
--warm_up_epochs 1 \
--fold 1
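Note that with --nproc_per_node=6, --batch-size is typically interpreted per process, so the effective global batch size is 6 × 32 = 192.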
Email: 3579628328@qq.com