microsoft / GLIP

Grounded Language-Image Pre-training
MIT License

where is the fine-tune config file ? #5

Closed · Edwardmark closed this issue 2 years ago

Edwardmark commented 2 years ago

Hi, could you please tell me where the fine-tune config file is? I want to do prompt tuning. Thanks.

liunian-harold-li commented 2 years ago

Hi, the commands for running prompt tuning are at https://github.com/microsoft/GLIP#prompt-tuning.
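
For reference, a rough sketch of what a prompt-tuning run looks like, assuming the same dataset and a subset of the options from the full fine-tuning command given further down in this thread; the SOLVER.TUNING_HIGHLEVEL_OVERRIDE value for prompt tuning (language_prompt_v2 below) is taken from that README section and should be double-checked there:

python -m torch.distributed.launch --nproc_per_node=4 tools/finetune.py \
        --config-file configs/pretrain/glip_Swin_T_O365_GoldG.yaml \
        --ft-tasks configs/odinw/Aquarium_Aquarium_Combined.v2-raw-1024.coco.yaml \
        --skip-test \
        --custom_shot_and_epoch_and_general_copy 3_200_4 \
        --evaluate_only_best_on_test --push_both_val_and_test \
        MODEL.WEIGHT MODEL/glip_tiny_model_o365_goldg_cc_sbu.pth \
        SOLVER.USE_AMP True TEST.DURING_TRAINING True \
        TEST.IMS_PER_BATCH 4 SOLVER.IMS_PER_BATCH 4 \
        SOLVER.TUNING_HIGHLEVEL_OVERRIDE language_prompt_v2   # "full" here instead gives full-model fine-tuning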

Edwardmark commented 2 years ago

> Hi, the commands for running prompt tuning are at https://github.com/microsoft/GLIP#prompt-tuning.

[screenshot] Which file is the config for the ft-task? Thanks.

Edwardmark commented 2 years ago

@liunian-harold-li Could you use the Aquarium_Aquarium_Combined.v2-raw-1024.coco.yaml data as an example to show how to do prompt / full fine-tuning, and give the full command with all configs and options? It would make things clearer. Thanks.

liunian-harold-li commented 2 years ago

Sorry for the late reply. {configs} is configs/odinw/Aquarium_Aquarium_Combined.v2-raw-1024.coco.yaml. An explanation of this config is given at https://github.com/microsoft/GLIP#odinw--custom-dataset-evaluation. The fine-tuning config is the same as the evaluation config. Sorry for the confusion.
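
If it helps, the per-dataset configs can be listed and inspected directly from a checkout of the repo; this is just a plain-shell sketch, nothing GLIP-specific:

ls configs/odinw/                                                      # all ODinW per-dataset evaluation / fine-tuning configs
cat configs/odinw/Aquarium_Aquarium_Combined.v2-raw-1024.coco.yaml     # the Aquarium config used as the ft-task config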

Edwardmark commented 2 years ago

@liunian-harold-li Thanks. So is the following command right?

python -m torch.distributed.launch --nproc_per_node=4 tools/finetune.py \
        --config-file configs/pretrain/glip_Swin_L.yaml \
        --ft-tasks configs/odinw/Aquarium_Aquarium_Combined.v2-raw-1024.coco.yaml ....

liunian-harold-li commented 2 years ago

Sorry for the confusion. --config-file should be set to the pre-trained model's config file. An example command:

python -m torch.distributed.launch --nproc_per_node=4 tools/finetune.py \
        --config-file configs/pretrain/glip_Swin_T_O365_GoldG.yaml \
        --ft-tasks configs/odinw/Aquarium_Aquarium_Combined.v2-raw-1024.coco.yaml \
        --skip-test \
        --custom_shot_and_epoch_and_general_copy 3_200_4 \
        --evaluate_only_best_on_test --push_both_val_and_test \
        MODEL.WEIGHT MODEL/glip_tiny_model_o365_goldg_cc_sbu.pth \
        SOLVER.USE_AMP True \
        TEST.DURING_TRAINING True \
        TEST.IMS_PER_BATCH 4 \
        SOLVER.IMS_PER_BATCH 4 \
        SOLVER.WEIGHT_DECAY 0.05 \
        TEST.EVAL_TASK detection \
        DATASETS.TRAIN_DATASETNAME_SUFFIX _grounding \
        MODEL.BACKBONE.FREEZE_CONV_BODY_AT 2 \
        MODEL.DYHEAD.USE_CHECKPOINT True \
        SOLVER.FIND_UNUSED_PARAMETERS False \
        SOLVER.TEST_WITH_INFERENCE True \
        SOLVER.USE_AUTOSTEP True \
        DATASETS.USE_OVERRIDE_CATEGORY True \
        SOLVER.SEED 10 \
        DATASETS.SHUFFLE_SEED 3 \
        DATASETS.USE_CAPTION_PROMPT True \
        DATASETS.DISABLE_SHUFFLE True \
        SOLVER.STEP_PATIENCE 3 \
        SOLVER.CHECKPOINT_PER_EPOCH 1.0 \
        SOLVER.AUTO_TERMINATE_PATIENCE 8 \
        SOLVER.MODEL_EMA 0.0 \
        SOLVER.TUNING_HIGHLEVEL_OVERRIDE full

On Thu, 5 May 2022 at 23:29, Edwardmark wrote:

> @liunian-harold-li Thanks. So I should set the ft-task config {configs} to "configs/odinw/Aquarium_Aquarium_Combined.v2-raw-1024.coco.yaml", the same as --config-file {config_file}? Is that right?

Edwardmark commented 2 years ago

@liunian-harold-li Thank you very much, your kind reply is really helpful.

Edwardmark commented 2 years ago

@liunian-harold-li Hi, I tried to run finetune.py with the following command:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
        --nproc_per_node=4 tools/finetune.py \
        --config-file configs/pretrain/glip_Swin_L.yaml \
        --ft-tasks configs/odinw/Aquarium_Aquarium_Combined.v2-raw-1024.coco.yaml \
        --skip-test \
        --custom_shot_and_epoch_and_general_copy 0_200_1 \
        --evaluate_only_best_on_test --push_both_val_and_test \
        MODEL.WEIGHT MODEL/glip_large_model.pth \
        SOLVER.USE_AMP True \
        TEST.DURING_TRAINING True \
        TEST.IMS_PER_BATCH 4 \
        SOLVER.IMS_PER_BATCH 4 \
        TEST.EVAL_TASK detection \
        DATASETS.TRAIN_DATASETNAME_SUFFIX _grounding \
        MODEL.BACKBONE.FREEZE_CONV_BODY_AT 2 \
        MODEL.DYHEAD.USE_CHECKPOINT True \
        SOLVER.FIND_UNUSED_PARAMETERS False \
        SOLVER.TEST_WITH_INFERENCE True \
        SOLVER.USE_AUTOSTEP True \
        DATASETS.USE_OVERRIDE_CATEGORY True \
        SOLVER.SEED 10 \
        DATASETS.SHUFFLE_SEED 3 \
        DATASETS.USE_CAPTION_PROMPT True \
        DATASETS.DISABLE_SHUFFLE True \
        SOLVER.STEP_PATIENCE 2 \
        SOLVER.CHECKPOINT_PER_EPOCH 1.0 \
        SOLVER.AUTO_TERMINATE_PATIENCE 4 \
        SOLVER.MODEL_EMA 0.0 \
        SOLVER.WEIGHT_DECAY 0.05 \
        SOLVER.TUNING_HIGHLEVEL_OVERRIDE full

It shows the following error: [screenshot]

Then I debugged by running the command under pdb on a single node:

CUDA_VISIBLE_DEVICES=0 python -m pdb tools/finetune.py \
        --config-file configs/pretrain/glip_Swin_L.yaml \
        --ft-tasks configs/odinw/Aquarium_Aquarium_Combined.v2-raw-1024.coco.yaml \
        --skip-test \
        --custom_shot_and_epoch_and_general_copy 0_200_1 \
        --evaluate_only_best_on_test --push_both_val_and_test \
        MODEL.WEIGHT MODEL/glip_large_model.pth \
        SOLVER.USE_AMP True \
        TEST.DURING_TRAINING True \
        TEST.IMS_PER_BATCH 1 \
        SOLVER.IMS_PER_BATCH 1 \
        TEST.EVAL_TASK detection \
        DATASETS.TRAIN_DATASETNAME_SUFFIX _grounding \
        MODEL.BACKBONE.FREEZE_CONV_BODY_AT 2 \
        MODEL.DYHEAD.USE_CHECKPOINT True \
        SOLVER.FIND_UNUSED_PARAMETERS False \
        SOLVER.TEST_WITH_INFERENCE True \
        SOLVER.USE_AUTOSTEP True \
        DATASETS.USE_OVERRIDE_CATEGORY True \
        SOLVER.SEED 10 \
        DATASETS.SHUFFLE_SEED 3 \
        DATASETS.USE_CAPTION_PROMPT True \
        DATASETS.DISABLE_SHUFFLE True \
        SOLVER.STEP_PATIENCE 2 \
        SOLVER.CHECKPOINT_PER_EPOCH 1.0 \
        SOLVER.AUTO_TERMINATE_PATIENCE 4 \
        SOLVER.MODEL_EMA 0.0 \
        SOLVER.WEIGHT_DECAY 0.05 \
        SOLVER.TUNING_HIGHLEVEL_OVERRIDE full \
        DATALOADER.DISTRIBUTE_CHUNK_AMONG_NODE False

and I found that the error occurs at https://github.com/microsoft/GLIP/blob/main/maskrcnn_benchmark/data/build.py#L443, because the dataset object does not have the attribute datasets, as follows: [screenshot]

I am wondering if the code has a bug, but I cannot figure it out.

When I add the following option to avoid this attribute error, it runs successfully:

DATALOADER.DISTRIBUTE_CHUNK_AMONG_NODE False

Questions:

  1. What does "DATALOADER.DISTRIBUTE_CHUNK_AMONG_NODE" mean, and how should it be set?
  2. When using the default config, the dataset assertion error appears. Could you please check whether this bug is reproducible and how to fix it? Thanks.

liunian-harold-li commented 2 years ago

Sorry for the confusion. Indeed, DATALOADER.DISTRIBUTE_CHUNK_AMONG_NODE should be set to false.

It was intended to be used when we have a lot of pre-training data (dozens of terabytes) and one training node cannot load all the data. We implemented this to distribute the data to different nodes to avoid running out of hard disk space.

I have removed the option in the GLIP-L config.
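
For anyone on an older checkout where the GLIP-L config still enables the option, a small sketch of how to check and work around it without editing the yaml (the command-line override is the same one used above):

grep -n "DISTRIBUTE_CHUNK_AMONG_NODE" configs/pretrain/glip_Swin_L.yaml   # check whether the local config still sets it
# if it does, append the override to the fine-tuning command instead of editing the file:
#   ... DATALOADER.DISTRIBUTE_CHUNK_AMONG_NODE False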

Edwardmark commented 2 years ago

@liunian-harold-li Thank you very much.

imaginistLi commented 1 year ago

Excuse me, I want to ask another question: where can I download the PREDEFINED_TEXT file 'odinw/pothole/category_description.json' mentioned in the configs?
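
A quick way to see where that file is expected, assuming a checkout of the repo (this thread does not confirm the actual download source):

grep -rn "category_description.json" configs/ *.md 2>/dev/null    # configs and docs that reference the file
grep -rn "PREDEFINED_TEXT" configs/odinw/ | head                  # which ODinW configs expect such a file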