snumprlab / cl-alfred

Official Implementation of CL-ALFRED (ICLR'24)
https://bhkim94.github.io/projects/CL-ALFRED/
GNU General Public License v3.0

Hello. Could you provide the generated files of the proposed 5 views (the image features and videos are not required)? #4

Closed: wqshmzh closed this issue 4 days ago.

wqshmzh commented 1 month ago

I am happy to find that your work proves that continual learning can be applied in embodied environments. I am very interested in digging deeper into this field. However, when I generate the 5 views using the provided script "augment_trajectories.py" in a multi-threaded way, some threads suddenly get stuck and cannot continue. Could you please upload the generated images, including the RGB images, instance masks, and depth images of the 5 views, as well as the "augmented_traj_data.py"? I find that https://huggingface.co/datasets/byeonghwikim/abp_dataset only contains the annotations. Thank you very much :-)

bhkim94 commented 1 month ago

Hi @wqshmzh,

Thank you for having an interest in our work!

I find that https://huggingface.co/datasets/byeonghwikim/abp_dataset only contains the annotations.

Yes, the current version of the dataset contains only the annotations and the 5-view image "features" (not the raw images).

Could you please upload the generated images, including the RGB images, instance masks, and depth images of the 5 views, as well as the "augmented_traj_data.py"?

Sure! We'll upload the images, masks, and depths as well.

wqshmzh commented 1 month ago

It would be nicer if you could upload the generated images, masks, and depths in a zipped file, just like the ALFRED benchmark does; otherwise the download speed would be limited. The saved feature files are not specifically needed, since we want to save some disk space. Alternatively, you could zip a separate file that contains only the image features for those who also want them. Thank you again!

bhkim94 commented 1 month ago

Thank you for the suggestion. We'll modularize the current dataset to avoid downloading unnecessary files and include the 5-view images, depths, and masks.

wqshmzh commented 1 month ago

Hello. Could you please provide the exact parameter settings for running ./gen/scripts/augment_trajectories.py to locally generate the RGB, depth, and mask images of the 5 views? The "README.md" doesn't mention how to conduct the generation process. I have tried generating the images for nearly half of the tasks, but I find that the number of generated images differs from the number of images defined in the "ann_X.json" file inside each task folder.

bhkim94 commented 1 month ago

Hi @wqshmzh,

Could you please provide the exact parameter settings for running ./gen/scripts/augment_trajectories.py to locally generate the RGB, depth, and mask images of the 5 views? The "README.md" doesn't mention how to conduct the generation process.

The 5-view images were extracted a few years ago, so I'm not sure of the exact parameters. I vaguely remember leaving all the hyperparameters at their defaults.

I have tried generating the images for nearly half of the tasks, but I find that the number of generated images differs from the number of images defined in the "ann_X.json" file inside each task folder.

Yes, that's expected. As far as I know, the number of images defined in ann_X.json includes frames observed during rotation actions, which we do not use for training. Passing smooth_nav will generate those rotation frames, and the total number of images (including the rotation frames) will then match the number defined in ann_X.json.
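
For reference, a minimal sketch of the invocation, treating the exact flag names as an assumption carried over from the original ALFRED gen scripts (where smooth_nav is a store_true flag):

```
# Hypothetical invocations; flag names assumed from the original ALFRED repo.
python gen/scripts/augment_trajectories.py                # defaults, no rotation frames
python gen/scripts/augment_trajectories.py --smooth_nav   # also emits rotation frames
```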

Hope this helps!

wqshmzh commented 1 month ago

Great. My locally generated images were produced with "args.smooth_nav" set to False, which matches your setting. Now I can continue my generation. :-) I also find that the number of loaded image features matches the length of low_level_action for each task.
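
As an aside, a quick sanity check along these lines might look like the sketch below, assuming the ALFRED-style traj_data.json layout ("plan" -> "low_actions") and a feature tensor whose first dimension is time; the file names are illustrative:

```python
# Hypothetical sanity check: compare the number of stored feature frames
# against the number of low-level actions in the trajectory annotation.
import json
import torch

feats = torch.load("feat_conv_panoramic.pt", map_location="cpu")
with open("traj_data.json") as f:
    traj = json.load(f)

print("feature frames:", feats.shape[0])
print("low-level actions:", len(traj["plan"]["low_actions"]))
```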

bhkim94 commented 1 month ago

Great to hear that! Good luck with your project!

JACK-Chen-2019 commented 2 weeks ago

Hello, after generating images using the above method, I noticed that the number of generated images is 10 more than the number of actions in the JSON file. However, this matches the number of images in the dataset you provided. I used these images to generate feat_conv_panoramic.pt with a local ResNet-18 model and encountered something quite puzzling: I compared the features I generated with the features you provided, one by one, using cosine similarity. Out of 7080 examples, only 700 were similar; the rest differed, with a cosine distance greater than 0.1. Could there be some special processing step here that I'm missing? I got very poor training results using these generated features. Additionally, could you please explain how 'feat_conv_colorSwap{}_panoramic.pt' and 'feat_conv_onlyAutoAug{}_panoramic.pt' are generated? What is the difference between using them and not using them?
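
(For context, a per-frame comparison like the one described could be sketched as follows, assuming both .pt files hold tensors of the same shape with time as the first dimension; "feat_conv_panoramic_official.pt" is an illustrative name, not a file from the dataset:)

```python
# Hypothetical per-frame cosine-similarity comparison of two feature files.
import torch
import torch.nn.functional as F

mine = torch.load("feat_conv_panoramic.pt", map_location="cpu").flatten(1)
theirs = torch.load("feat_conv_panoramic_official.pt", map_location="cpu").flatten(1)

sim = F.cosine_similarity(mine, theirs, dim=1)  # one score per frame
print("mean similarity:", sim.mean().item())
print("fraction above 0.9:", (sim > 0.9).float().mean().item())
```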

bhkim94 commented 1 week ago

Hi @JACK-Chen-2019,

after generating images using the above method, I noticed that the number of generated images is 10 more than the number of actions in the JSON file. However, this matches the number of images in the dataset you provided.

This is expected: the extra 10 frames come from the image generation code of the original ALFRED repo.

I used these images to generate feat_conv_panoramic.pt with a local ResNet-18 model and encountered something quite puzzling: I compared the features I generated with the features you provided, one by one, using cosine similarity. Out of 7080 examples, only 700 were similar; the rest differed, with a cosine distance greater than 0.1. Could there be some special processing step here that I'm missing? I got very poor training results using these generated features.

This is weird. Which code did you use to extract the features? Can you try this feature-extraction code (https://github.com/snumprlab/abp/blob/master/models/utils/extract_resnet.py)? We recently released the code for ABP, the model used in CL-ALFRED, and it should give you reasonable features.
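
For orientation, a minimal ResNet-18 conv-feature extraction might look like the sketch below; this is not the official extract_resnet.py, and the exact resizing and normalization there may differ, which alone can produce the low similarities described above:

```python
# Hypothetical ResNet-18 feature extraction; preprocessing details are
# assumptions, so defer to the official extract_resnet.py for exact values.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

resnet = models.resnet18(pretrained=True)
backbone = torch.nn.Sequential(*list(resnet.children())[:-2]).eval()  # drop avgpool + fc

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("frame.png").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    feat = backbone(img)  # conv feature map of shape [1, 512, 7, 7]
```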

Additionally, could you please explain how 'feat_conv_colorSwap{}_panoramic.pt' and 'feat_conv_onlyAutoAug{}_panoramic.pt' are generated?

For generation, you can check the ABP link above.

What is the difference between using them and not using them?

These augmentation techniques considerably improve model performance, so if you don't use them, you may observe noticeable performance drops across all models trained with them. However, our benchmark is designed to evaluate continual learning methods rather than embodied agents themselves, so using it without augmentation should not be problematic in theory.
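
(To illustrate the two augmentation names only; the actual recipes are in the ABP repository, so the following is a hypothetical sketch using a channel swap and torchvision's off-the-shelf AutoAugment:)

```python
# Hypothetical illustration of the two augmentations by name; not the ABP code.
import torchvision.transforms as T
from PIL import Image

def color_swap(img: Image.Image) -> Image.Image:
    # Swap color channels, e.g. RGB -> BGR.
    r, g, b = img.split()
    return Image.merge("RGB", (b, g, r))

auto_aug = T.AutoAugment(T.AutoAugmentPolicy.IMAGENET)

img = Image.open("frame.png").convert("RGB")
swapped = color_swap(img)   # candidate input for feat_conv_colorSwap{}_panoramic.pt
augmented = auto_aug(img)   # candidate input for feat_conv_onlyAutoAug{}_panoramic.pt
```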

Hope this helps!

JACK-Chen-2019 commented 1 week ago

Thanks for your reply! I solved the problem by using extract_resnet.py.

bhkim94 commented 4 days ago

Hi @wqshmzh,

I am happy to find that your work proves that continual learning can be applied in embodied environments. I am very interested in digging deeper into this field. However, when I generate the 5 views using the provided script "augment_trajectories.py" in a multi-threaded way, some threads suddenly get stuck and cannot continue. Could you please upload the generated images, including the RGB images, instance masks, and depth images of the 5 views, as well as the "augmented_traj_data.py"? I find that https://huggingface.co/datasets/byeonghwikim/abp_dataset only contains the annotations. Thank you very much :-)

We uploaded raw RGB images (including corresponding depths and object masks) here.

bhkim94 commented 4 days ago

Closing this issue due to inactivity.