shenyunhang / APE

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
https://arxiv.org/abs/2312.02153
Apache License 2.0

Inference issues #36

Open kavithar0608 opened 2 months ago

kavithar0608 commented 2 months ago

I ran demo_lazy.py as shown below (on Google Colab):

!cd /content/APE && python3 demo/demo_lazy.py \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py \
--input image1.jpg \
--output /content/APE/APE_Output/ \
--confidence-threshold 0.1 \
--text-prompt 'person' \
--with-box \
--with-mask \
--with-sseg \
--opts \
train.init_checkpoint=/content/APE_Models/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl_20230829_162438/model_final.pth \
model.model_language.cache_dir="" \
model.model_vision.select_box_nums_for_evaluation=500 \
model.model_vision.text_feature_bank_reset=True 
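
For reference, the key=value pairs after --opts appear to be detectron2 LazyConfig overrides (the config files are .py files and use dotted keys like train.init_checkpoint). Below is a minimal sketch, assuming detectron2 and APE are installed and it is run from the repo root, of how such overrides get applied to the config; it is illustrative only, not the exact code of demo_lazy.py:

from detectron2.config import LazyConfig

# Load the lazy (.py) config used in the command above.
cfg = LazyConfig.load(
    "configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/"
    "ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py"
)

# Apply the same dotted-key overrides that were passed after --opts.
cfg = LazyConfig.apply_overrides(
    cfg,
    [
        "train.init_checkpoint=/content/APE_Models/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl_20230829_162438/model_final.pth",
        "model.model_vision.select_box_nums_for_evaluation=500",
        "model.model_vision.text_feature_bank_reset=True",
    ],
)

print(cfg.train.init_checkpoint)  # confirm the override took effect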

But the execution abruptly stops at some point (a ^C character is generated automatically in the output there), and I don't see any output image generated either.

How can I resolve this issue?

The output of the execution is given below:

[04/15 17:52:57 detectron2]: Arguments: Namespace(config_file='configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py', webcam=False, video_input=None, input=['image1.jpg'], output='/content/APE/APE_Output/', confidence_threshold=0.1, opts=['train.init_checkpoint=/content/APE_Models/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl_20230829_162438/model_final.pth', 'model.model_language.cache_dir=', 'model.model_vision.select_box_nums_for_evaluation=500', 'model.model_vision.text_feature_bank_reset=True'], text_prompt='person', with_box=True, with_mask=True, with_sseg=True)
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:260: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
apex.normalization.FusedLayerNorm not found, will use pytorch implementations
Please 'pip install xformers'
apex.normalization.FusedLayerNorm not found, will use pytorch implementations
======== shape of rope freq torch.Size([1024, 64]) ========
======== shape of rope freq torch.Size([4096, 64]) ========
[04/15 17:53:06 ape.data.detection_utils]: Using builtin metadata 'image_count' for dataset '['lvis_v1_train+coco_panoptic_separated']'
[04/15 17:53:06 ape.modeling.ape_deta.deformable_criterion]: fed_loss_cls_weights: torch.Size([1203]) num_classes: 1256
[04/15 17:53:06 ape.modeling.ape_deta.deformable_criterion]: pad fed_loss_cls_weights with type cat and value 0
[04/15 17:53:06 ape.modeling.ape_deta.deformable_criterion]: pad fed_loss_classes with tensor([1203, 1204, 1205, 1206, 1207, 1208, 1209, 1210, 1211, 1212, 1213, 1214,
        1215, 1216, 1217, 1218, 1219, 1220, 1221, 1222, 1223, 1224, 1225, 1226,
        1227, 1228, 1229, 1230, 1231, 1232, 1233, 1234, 1235, 1236, 1237, 1238,
        1239, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1247, 1248, 1249, 1250,
        1251, 1252, 1253, 1254, 1255])
[04/15 17:53:06 ape.modeling.ape_deta.deformable_criterion]: fed_loss_cls_weights: tensor([ 1.0000,  1.0000,  3.1623,  7.3485, 43.8520, 25.0998,  5.5678,  8.3066,
         2.6458,  3.3166,  1.0000,  5.4772,  7.0711,  6.7082,  5.2915, 10.6771,
        13.8924,  4.5826,  9.5394,  5.5678, 38.3275, 43.8634,  9.3274,  8.7750,
         3.3166,  6.8557,  4.5826,  6.8557,  8.3666, 42.8719,  4.3589, 23.0434,
         3.3166, 46.6798, 10.6301,  5.0990,  2.2361,  7.4833,  8.5440,  5.6569,
        11.3137, 24.9600,  3.4641,  7.2111,  3.3166, 41.0731,  9.0000,  0.0000,
         0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
         0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
         0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
         0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
         0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
         0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
         0.0000,  0.0000,  0.0000,  0.0000])
[04/15 17:53:06 ape.modeling.ape_deta.deformable_criterion]: fed_loss_cls_weights: torch.Size([1256]) num_classes: 1256
[04/15 17:53:06 ape.data.detection_utils]: Using builtin metadata 'image_count' for dataset '['openimages_v6_train_bbox_nogroup']'
[04/15 17:53:06 ape.modeling.ape_deta.deformable_criterion]: fed_loss_cls_weights: torch.Size([601]) num_classes: 601
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_id: 0
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_name: lvis_v1_train+coco_panoptic_separated
......
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: stuff_classes: None
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_entity: thing
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_id: 4
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_name: sa1b_6m
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: thing_classes: ['object']
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: stuff_classes: None
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_entity: thing
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_id: 5
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_name: refcoco-mixed_group-by-image
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: thing_classes: ['object']
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: stuff_classes: None
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_entity: thing
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_id: 6
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_name: gqa_region_train
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: thing_classes: None
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: stuff_classes: None
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_entity: thing
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_id: 7
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_name: phrasecut_train
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: thing_classes: None
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: stuff_classes: None
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_entity: thing
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_id: 8
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_name: flickr30k_separateGT_train
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: thing_classes: None
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: stuff_classes: None
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_entity: thing
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_id: 9
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_name: refcoco-mixed
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: thing_classes: ['object']
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: stuff_classes: None
[04/15 17:53:06 ape.modeling.ape_deta.deformable_detr]: dataset_entity: thing
^C
shenyunhang commented 2 months ago

I think it may be out of memory.
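
A quick way to check that hypothesis on Colab is to watch available RAM while the model loads and to look for OOM-killer entries in the kernel log. A minimal sketch, assuming a standard Linux/Colab runtime with psutil available (these commands are generic, not specific to APE):

import subprocess

import psutil

# Report how much RAM the runtime has left; the free Colab tier offers roughly
# 12 GB, which can be exhausted while loading a ViT-L checkpoint plus a language model.
vm = psutil.virtual_memory()
print(f"total RAM:     {vm.total / 1e9:.1f} GB")
print(f"available RAM: {vm.available / 1e9:.1f} GB")

# If the process was killed by the kernel, dmesg usually records it
# (output format and permissions vary by environment).
log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
oom = [line for line in log.splitlines()
       if "oom" in line.lower() or "out of memory" in line.lower()]
print("\n".join(oom[-10:]) if oom else "no OOM messages found in dmesg")

If available RAM drops to near zero right before the process dies, switching to a high-RAM runtime (or a machine with more memory) is the usual fix.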