
Google Universal Image Embedding Challenge 2022

2nd Place Solution

- Paper

- Competition on Kaggle

- ECCV 2022 Instance-Level Recognition workshop

- Kaggle profile

- Inference notebook

HARDWARE & SOFTWARE

- OS: Ubuntu 18.04.3 LTS

- CPU: AMD EPYC 7543 (32 cores)

- GPU: 6 × NVIDIA A40 (PCIe), 48 GB memory each

- Python: 3.8

- PyTorch: 1.9.0+cu111

Data Preparation

  1. Download all data from the data sources below:

    Aliproducts

    Art_MET

    DeepFashion(Consumer-to-shop)

    DeepFashion2(hard-triplets)

    Fashion200K

    ICCV 2021 LargeFineFoodAI

    Food Recognition 2022

    JD_Products_10K

    Landmark2021

    Grocery Store

    rp2k

    Shopee

    Stanford_Cars

    Stanford_Products

  2. Run Get_Data.ipynb to create a CSV file that lists the images for each dataset.

  3. Run Data_preprocessing.ipynb to filter out classes with fewer than 3 images and resize all images to 224×224 (see the first sketch after this list).

  4. Run Data_Merge.ipynb to merge all the CSVs and perform sampling/resampling (see the second sketch after this list); this produces final_data_224_sample_balance.csv.

  5. Split the data with a stratified K-fold:

    import pandas as pd
    from sklearn.model_selection import StratifiedKFold

    # Assign every row to one of 20 stratified folds on the class labels
    df = pd.read_csv('autodl-tmp/final_data_224_sample_balance.csv')
    df['fold'] = -1
    split = list(StratifiedKFold(n_splits=20, shuffle=True, random_state=999).split(df, df['new_labels']))
    for fold, (train_idx, valid_idx) in enumerate(split):
        df.loc[valid_idx, 'fold'] = fold  # mark which fold each row validates in
    df.to_csv('autodl-tmp/final_data_224_sample_balance_fold.csv', index=False)
    df.head(5)
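
Step 3's filtering and resizing might look like the sketch below, assuming each per-dataset CSV from step 2 has image_path and label columns (those column names and file paths are assumptions; Data_preprocessing.ipynb defines the real logic):

    import pandas as pd
    from PIL import Image

    df = pd.read_csv('dataset.csv')  # hypothetical per-dataset CSV from step 2

    # Drop classes with fewer than 3 images
    counts = df['label'].value_counts()
    df = df[df['label'].isin(counts[counts >= 3].index)].reset_index(drop=True)

    # Resize every remaining image to 224x224 in place
    for path in df['image_path']:
        Image.open(path).convert('RGB').resize((224, 224)).save(path)

    df.to_csv('dataset_filtered.csv', index=False)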
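
Step 4's merge and class-balanced sampling could be sketched as follows (the per-class cap and file paths are illustrative; Data_Merge.ipynb defines the real logic):

    import glob
    import pandas as pd

    # Merge the filtered per-dataset CSVs (hypothetical path pattern)
    dfs = [pd.read_csv(p) for p in glob.glob('csvs/*_filtered.csv')]
    df = pd.concat(dfs, ignore_index=True)

    # Cap each class at an illustrative 100 images to balance the distribution
    df = df.groupby('label', group_keys=False).apply(
        lambda g: g.sample(min(len(g), 100), random_state=999))

    df.to_csv('final_data_224_sample_balance.csv', index=False)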

Model Preparation

  1. Use the pre-trained ViT-H-14 from open_clip.

  2. Get the visual module:

    import open_clip
    import torch

    # Load the LAION-2B pre-trained CLIP model and keep only its vision tower
    model, _, preprocess = open_clip.create_model_and_transforms('ViT-H-14', pretrained='laion2b_s32b_b79k', cache_dir='./pretrained_models')
    model_visual = model.visual
    torch.save(model_visual.state_dict(), './pretrained_models/ViT_H_14_2B_vision_model.pt')
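
To reuse the saved backbone, rebuild the architecture and load the weights back into the visual tower; a usage sketch assuming the paths above:

    import open_clip
    import torch

    # Rebuild the ViT-H-14 architecture (randomly initialized), then restore
    # the saved visual weights
    model, _, preprocess = open_clip.create_model_and_transforms('ViT-H-14')
    state = torch.load('./pretrained_models/ViT_H_14_2B_vision_model.pt', map_location='cpu')
    model.visual.load_state_dict(state)
    model_visual = model.visual.eval()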

Training

  1. All configurations for ViT-H-14-Visual can be found in ./GUIE/config_clip_224.py

  2. Training:

    !CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 \
    python -m torch.distributed.launch --nproc_per_node=6 \
    ./GUIE/train.py \
    --csv-dir ./final_data_224_sample_balance_fold.csv \
    --config-name 'vit_224' \
    --image-size 224 \
    --batch-size 32 \
    --num-workers 10 \
    --init-lr 1e-4 \
    --n-epochs 10 \
    --cpkt_epoch 10 \
    --n_batch_log 300 \
    --warm_up_epochs 1 \
    --fold 1
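
The --init-lr and --warm_up_epochs flags suggest a linear learning-rate warmup. The actual schedule is defined in ./GUIE/config_clip_224.py and train.py; a generic warmup sketch with a stand-in model might look like:

    import torch
    import torch.nn as nn

    model = nn.Linear(1024, 64)  # stand-in for the ViT-H-14 visual backbone
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # matches --init-lr
    warm_up_epochs = 1  # matches --warm_up_epochs

    def lr_lambda(epoch):
        # Ramp linearly to the initial LR during warmup, then hold it;
        # the repo's real schedule may decay afterwards.
        return min(1.0, (epoch + 1) / warm_up_epochs)

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)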

Contact

Email: 3579628328@qq.com