nickgkan / butd_detr

Code for the ECCV22 paper "Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds"

Inference #27

Closed nsaadati closed 1 year ago

nsaadati commented 1 year ago

Thank you for your well-written paper. I've been trying to test your model on our dataset, but I'm having trouble figuring out how to do it. Could you please help me with this? Also, when I tried to add the "--eval" argument, it still started training. Can you assist me in resolving this?

ayushjain1144 commented 1 year ago

Can you share the exact command you ran? --eval should have worked.

TORCH_DISTRIBUTED_DEBUG=INFO CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --master_port $RANDOM \
    train_dist_mod.py --num_decoder_layers 6 \
    --use_color \
    --weight_decay 0.0005 \
    --data_root DATA_ROOT \
    --val_freq 5 --batch_size 24 --save_freq 5 --print_freq 1000 \
    --lr_backbone=1e-3 --lr=1e-4 \
    --dataset sr3d --test_dataset sr3d \
    --detect_intermediate --joint_det \
    --use_soft_token_loss --use_contrastive_align \
    --log_dir ./logs/bdetr \
    --lr_decay_epochs 25 26 \
    --pp_checkpoint PATH/TO/gf_detector_l6o256.pth \
    --butd --self_attend --augment_det --eval

(By any chance, did you add --eval on a new line and forget the trailing "\"?)

nsaadati commented 1 year ago

This is the command line that I run: sh butd_detr/scripts/train_test_det.sh --eval --checkpoint_path=butd_detr/sr3d_butd_det_52.1_27.pth

ayushjain1144 commented 1 year ago

I think you just need to add these flags inside the .sh script

nsaadati commented 1 year ago

Yeah, I realized this later, thank you. Also, how can I do inference for just one image and get the result as an image?

ayushjain1144 commented 1 year ago

You can use the bdetr2d branch for that.

nsaadati commented 1 year ago

Sorry, I still don't understand. How can I see the results?

ayushjain1144 commented 1 year ago

Sorry, I thought you wanted to run our model on an RGB image instead of a pointcloud.

To run it on one pointcloud and get the results, you would need to set up this functionality yourself (you can reference votenet for how to do that).
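For illustration, here is a rough sketch (not the repo's actual inference path) of how a single scan could be loaded and subsampled before being fed to the model, in the style of the votenet demo; the file name, array layout, and point budget are placeholders:

import numpy as np

# Hypothetical single-scan preprocessing; assumes the scan is stored as an
# (N, 6) array of xyz + rgb, which is an assumption, not the repo's format.
cloud = np.load("my_scan_xyzrgb.npy")
num_points = 50000
choice = np.random.choice(len(cloud), num_points,
                          replace=len(cloud) < num_points)
cloud = cloud[choice]
xyz = cloud[:, :3].astype(np.float32)
rgb = cloud[:, 3:6].astype(np.float32) / 255.0  # normalize colors to [0, 1]
# From here, pack xyz/rgb the way the dataset's __getitem__ does and run the
# model forward, mirroring the evaluation loop in train_dist_mod.py.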

nsaadati commented 1 year ago

Sorry for the confusion. I mean that I have some images and some sentences on which I want to test the model, but I still don't know how to do it.

nsaadati commented 1 year ago

Do you have any example for 2D images, like the one that you sent for 3D? This one?

ayushjain1144 commented 1 year ago

I think you can use this: https://github.com/nickgkan/butd_detr/blob/bdetr2d/main.py#L513-L517 which calls https://github.com/nickgkan/butd_detr/blob/bdetr2d/visualize_image.py

nsaadati commented 1 year ago

thanks

nsaadati commented 1 year ago

Hi, should I just run the main file? Also, shouldn't we give it a sentence too? Why does it just take an image?

ayushjain1144 commented 1 year ago

You're right, the sentence is currently hardcoded here, but you could modify the function arguments to take the sentence too.

(PS: We haven't used/tested this script for a while so you might need to make some other changes too)
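For illustration only, a minimal sketch of what that change could look like; the exact signature of visualize_results in visualize_image.py may differ, so treat the added caption argument as an assumption:

# Hypothetical refactor: thread the utterance through instead of hardcoding it.
def visualize_results(model, img, caption="a woman"):
    # tokenize `caption` and build the text inputs here, exactly where the
    # hardcoded string was used before, then run the model and draw the
    # predicted boxes as the script already does
    ...

# caller side (e.g. in main.py):
# visualize_results(model, img, caption="a woman in a red coat")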

nsaadati commented 1 year ago

I wrote this code for inference, but I'm not sure what I'm doing wrong. When I load the model and try to make predictions for labels, I'm getting no output, which is causing the code to break at line 71 in visualize_image.py (assert len(scores) == len(boxes) == len(labels) == len(masks)). The lengths of masks, boxes, and scores are all 300, but the length of labels is 0.

import argparse
import torch
from PIL import Image
# (get_args_parser, build_bdetr_model, and visualize_results come from the bdetr2d branch)

parser = argparse.ArgumentParser(
    'Deformable DETR training and evaluation script', parents=[get_args_parser()])
args = parser.parse_args()
device = torch.device(args.device)
model, criterion, weight_dict = build_bdetr_model(args)
model.to(device)
img_path = args.img_path
img = Image.open(img_path)
visualize_results(model, img)

ayushjain1144 commented 1 year ago

Hi,

I can look into this more; could you share with me the image and caption you are using? The labels are specifically generated through this step, which checks the span predicted by our model. Labels can be empty when a query does not predict a positive span for any word (which is highly unlikely). Could you maybe check what exactly happens at this step when you run with your image and caption, to find out why exactly the labels were empty?

Also, it's weird that you still get 300 boxes and scores, because here we threshold them by confidence, and it's again highly unlikely that all 300 queries are confident.
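To narrow it down, here is a rough debugging sketch you could adapt; it assumes an MDETR-style output dict with a pred_logits tensor of soft-token logits and treats the last token bin as "no object", which may not match the repo's exact internals:

def inspect_predictions(outputs, conf_thresh=0.7):
    # outputs["pred_logits"]: (batch, num_queries, num_tokens) soft-token logits
    # as a torch tensor (key name and "last bin = no object" are assumptions)
    probs = outputs["pred_logits"].softmax(-1)
    pos_prob = 1.0 - probs[..., -1]          # prob. a query matches some word
    confident = pos_prob > conf_thresh
    print("confident queries:", confident.sum().item(), "/", probs.shape[1])
    best_token = probs[..., :-1].argmax(-1)  # most likely word per query
    print("best tokens of confident queries:", best_token[confident].tolist())

Printing something like this right before the assert in visualize_image.py should show whether the labels are empty because no query is confident or because the span-to-word mapping drops everything.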

Also, are you able to run inference with our model and reproduce the results, to ensure that your conda environment, checkpoint loading, etc. are set up correctly?

Let me know when you have more information and I would be happy to try reproducing this issue further.

nsaadati commented 1 year ago

I'm using one of the training images with the caption 'a woman' and with this pretrained model (pretrain_2d.pth). Maybe the problem is with the model? [image attached]

ayushjain1144 commented 1 year ago

I see. Are you able to reproduce any of the numbers, to be sure everything is set up correctly?

nsaadati commented 1 year ago

which number?

ayushjain1144 commented 1 year ago

Okay, I think I know the problem: that visualize script is not supplying "bottom-up" boxes to the model, while the checkpoint is trained with them. So you could either look at engine.py and modify visualize_image.py to also supply "bottom-up" boxes, or wait for a week and I can push a fix. You can obtain bottom-up boxes by using the ones we already provide (check the README). Also, I can provide you with a checkpoint that does not use the box stream; that checkpoint would work with your current script and input (plus removing the --butd flag).

By numbers, I meant results on either of the datasets (refcoco/flickr). But most likely the problem is in the script, not your environment.
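If you want to try the first route yourself, here is a rough sketch of the idea; the .npy field names below are assumptions, so check how engine.py actually unpacks the precomputed detections:

import numpy as np
import torch

# Hypothetical sketch: load one precomputed detection file (from the boxes we
# already provide, see the README) and pack it as extra inputs for the box stream.
det = np.load("path/to/precomputed_dets/IMAGE_ID.npy", allow_pickle=True).item()
butd_boxes = torch.as_tensor(det["boxes"], dtype=torch.float32)    # (N, 4)
butd_classes = torch.as_tensor(det["classes"], dtype=torch.long)   # (N,)
butd_scores = torch.as_tensor(det["scores"], dtype=torch.float32)  # (N,)
# pass these alongside the image and text in visualize_image.py, the same way
# the evaluation loop in engine.py feeds them to the model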

nsaadati commented 1 year ago

Okay, thanks for your help. I think I can wait for a week, and I would appreciate it if you could let me know whether you were able to fix it.


ayushjain1144 commented 1 year ago

Hi, just a clarification: do you need to test on images contained in COCO or Visual Genome, or on some outside ones as well?

The bottom-up stream of butd-detr needs detected objects from an object detector. In our work, we use the Faster R-CNN model that was used in prior works, but that codebase is itself huge, so people just precompute detections using that code and load the npy files in the new code. So, if you want to use images from COCO/Visual Genome, it's easy, but if not, it's not that straightforward. In that case, I would suggest just using the top-down version of our model, and I can push a fix for that; there you would only need to supply an image + text.

Let me know your use case and we can figure out the best possible solution.
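Just to illustrate the precompute-then-load workflow for an outside image, here is a stand-in sketch using torchvision's Faster R-CNN; note this is not the detector we used (that is the bottom-up-attention detector from prior work), so the boxes will differ from the paper's setup:

import numpy as np
import torch
import torchvision
from PIL import Image

# Stand-in detector sketch: compute boxes for one image and save them as .npy,
# mirroring the precompute-then-load workflow (file names are placeholders).
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).eval()
img = Image.open("my_image.jpg").convert("RGB")
tensor = torchvision.transforms.functional.to_tensor(img)
with torch.no_grad():
    pred = detector([tensor])[0]
keep = pred["scores"] > 0.5
np.save("my_image_dets.npy", {
    "boxes": pred["boxes"][keep].cpu().numpy(),
    "classes": pred["labels"][keep].cpu().numpy(),
    "scores": pred["scores"][keep].cpu().numpy(),
})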

nsaadati commented 1 year ago

I think this also works; the only thing I want is just to be able to visualize and see how the model works.


ayushjain1144 commented 1 year ago

Hi,

I pushed some changes to make things easier. After pulling, could you try running this script?

You can adjust the path of the checkpoint here (pretrain_2d is fine), and the image path and text utterance here.

Below is the output I get on your image and prompt; let me know if you cannot reproduce it.

[Screenshot: model output on the shared image and prompt]

Note: I didn't have to change the checkpoint or any other major code really (the checkpoint trained with boxes works well without them too), so I am not sure why it didn't work for you. Maybe try running python test.py inside models/ops to make sure the deformable attention is installed correctly.

nsaadati commented 1 year ago

Thanks for your response. I ran it and I'm getting this error now. I did not download the extra data because it is a very huge file (146G) and I don't have enough space. Is there any way I can run it without this, or can you please share just the part of the file that we need to run the code?

  File "/home/exouser/newbt/butd_detr-bdetr2d/models/backbone.py", line 132, in __init__
    super().__init__(backbone, train_backbone, return_interm_layers,
  File "/home/exouser/newbt/butd_detr-bdetr2d/models/backbone.py", line 94, in __init__
    np.load(embeddings_path, allow_pickle=True))
  File "/home/exouser/.conda/envs/bdetr2d/lib/python3.8/site-packages/numpy/lib/npyio.py", line 405, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/data/beauty_detr/extra_data/class_embeddings.npy'
Traceback (most recent call last):
  File "./tools/launch.py", line 192, in

ayushjain1144 commented 1 year ago

Sure, here it is: https://drive.google.com/file/d/18Y9HIMB6u8NvVMIVs7g70r3Hw8J8bO7t/view?usp=sharing
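Once downloaded, a quick sanity check is to confirm the file loads from the path backbone.py expects (adjust the path below, or the embeddings_path in the code, to wherever you place the file):

import numpy as np

# path taken from the error message above; point it at your local copy instead
emb = np.load("/data/beauty_detr/extra_data/class_embeddings.npy", allow_pickle=True)
print(type(emb), getattr(emb, "shape", None))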