jialuli-luka / PanoGen

Code and Data for Paper: PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation

Pretrain arguments missing #6

Closed · honghd16 closed this issue 11 months ago

honghd16 commented 11 months ago

Hi Jialu,

Thanks for your great and inspiring work! I was trying to run the pretraining code but found that "aug_prob" and "ft_type" are neither defined nor provided in VLN-DUET/pretrain_src. I also tried to find them in the paper but failed. Could I know how to set these two arguments to run the pretraining with both the original environments and the generated environments?

Cheers

honghd16 commented 11 months ago

And the "multi" in the map_nav_src is also not defined. https://github.com/jialuli-luka/PanoGen/blob/461c2fbb48b0bba8ad8f75cb6847645d79b9022e/VLN-DUET/map_nav_src/utils/data.py#L38

jialuli-luka commented 11 months ago

Hi, sorry for the confusion. The default values in pretrain_src/data/dataset.py are used: aug_prob (0.3) and ft_type ("ori"). "multi" is not used during fine-tuning and can be safely deleted.
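
If you prefer passing the two flags explicitly, something like this should work (a minimal sketch, not verbatim repo code; adapt it to wherever pretrain_src builds its argument parser):

```python
# Minimal sketch (not the repo's exact code): register the two missing flags
# with the defaults used in pretrain_src/data/dataset.py.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--aug_prob", type=float, default=0.3,
                    help="probability of sampling a PanoGen-generated panorama")
parser.add_argument("--ft_type", type=str, default="ori",
                    help='feature type; "ori" uses the original env features')

args = parser.parse_args([])  # empty args -> defaults
assert args.aug_prob == 0.3 and args.ft_type == "ori"
```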

honghd16 commented 11 months ago

Thank you so much for the quick reply! Another issue is about the provided data. I am focusing on the VLN pretraining and fine-tuning, so I skipped steps 1-4 and directly used the data provided in the Dropbox. However, several questions come up:

  1. I didn't find R2R_train_inpaint_data_enc.json. I know it holds the corresponding instructions for the generated environments. Do I need to run mPLUG/scripts/vln_inference.sh to get this file?
  2. The img_ft_file for the original environments is vit-16.hdf5, while EnvEdit provides a .tsv file. Should I change this to the .tsv file, or do I need to extract the features myself with CLIP? https://github.com/jialuli-luka/PanoGen/blob/461c2fbb48b0bba8ad8f75cb6847645d79b9022e/VLN-DUET/pretrain_src/config/r2r_pretrain.json#L38C20-L38C20
  3. The provided environments include two zip files: views_img_sd.zip, which contains 36 novel images generated by Stable Diffusion from captions, and views_img_sd_inpaint.zip, which is generated by outpainting. I found that 12.jpg is missing in views_img_sd_inpaint. Does that mean these outpainted images are all generated based on the 12.jpg in views_img_sd?
  4. As self.multi is not needed, should I set it to False or True in this code: https://github.com/jialuli-luka/PanoGen/blob/461c2fbb48b0bba8ad8f75cb6847645d79b9022e/VLN-DUET/map_nav_src/utils/data.py#L47

Sorry for so many questions.

jialuli-luka commented 11 months ago

Hi,

  1. I just uploaded the enc file here: https://www.dropbox.com/scl/fo/8x09y38hv7m0pvj0xi4h8/h?rlkey=ka9assm9qznfkz19ar640zdfe&dl=0
  2. You can use the vit-16.hdf5 file (it should contain the same features as the .tsv file). I just uploaded the .hdf5 features to the Dropbox: https://www.dropbox.com/scl/fo/8x09y38hv7m0pvj0xi4h8/h?rlkey=ka9assm9qznfkz19ar640zdfe&dl=0 (see the loading sketch after this list).
  3. Yes, they are all generated based on the 12.jpg in views_img_sd.
  4. You should set it to False.
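
A quick way to sanity-check the .hdf5 features once downloaded (a hedged sketch; the key naming and per-view shape follow the usual DUET-style convention, so verify them against your own file):

```python
# Hedged sketch: inspect the vit-16.hdf5 feature file with h5py.
# The "<scan_id>_<viewpoint_id>" key naming and the (36, dim) shape are
# the common DUET-style convention, not verified against this exact file.
import h5py

with h5py.File("vit-16.hdf5", "r") as f:
    key = next(iter(f.keys()))   # e.g. "<scan_id>_<viewpoint_id>"
    fts = f[key][...]            # per-view features, typically (36, dim)
    print(key, fts.shape, fts.dtype)
```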
honghd16 commented 11 months ago

Many thanks! That will help me a lot!

charchit7 commented 9 months ago

@honghd16 could you please tell me the folder structure where you placed all the downloads from the Dropbox? That would really help.

Thanks :)

honghd16 commented 9 months ago

> @honghd16 could you please tell me the folder structure where you placed all the downloads from the Dropbox? That would really help.
>
> Thanks :)

I think there are two parts: one for generating panoramas (Steps 1-4), and the other for DUET training (Step 5). I skipped the first four steps and directly used the generated results provided by the author. For the navigation training, I organized the files following the DUET structure; you can refer to the DUET repo: https://github.com/cshizhe/VLN-DUET. Some adjustments may be needed, which you can figure out from the paths referenced in the code and from the runtime errors, haha. A rough sketch of the layout is below.
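
This is only my guess at the layout, assuming the conventions from the VLN-DUET README; double-check every path against the config files and the errors you get:

```
datasets/
└── R2R/
    ├── annotations/     # R2R_*.json, pretrain annotations, R2R_train_inpaint_data_enc.json
    ├── connectivity/    # Matterport3D connectivity graphs
    └── features/        # vit-16.hdf5 and the PanoGen-generated env features
```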

charchit7 commented 9 months ago

Thanks @honghd16 :) I am looking to generate coherent 360° panoramic images from the prompts. Could you please tell me how to do it? Which inference scripts should I run?

honghd16 commented 9 months ago

> Thanks @honghd16 :) I am looking to generate coherent 360° panoramic images from the prompts. Could you please tell me how to do it? Which inference scripts should I run?

I didn't generate them myself, but I think the README illustrates this process well. Maybe you can ask the author for more details.
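
That said, the rough idea (per the files discussed above: views_img_sd generated from captions, then views_img_sd_inpaint produced by outpainting) is to generate one view with Stable Diffusion and recursively outpaint its neighbors. Below is a hedged illustration of one outpainting step using the Hugging Face diffusers inpainting pipeline; it is not the repo's actual script, and the model id, 512x512 window, and overlap are my own assumptions:

```python
# NOT the repo's script -- a hedged sketch of one recursive-outpainting step
# with the diffusers inpainting pipeline. Model id, 512x512 window, and the
# 128 px overlap are assumptions for illustration only.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def extend_right(img: Image.Image, prompt: str, step: int = 384,
                 overlap: int = 128) -> Image.Image:
    """Widen a 512-px-tall panorama by `step` px, conditioning on `overlap`
    px of existing content so the new region stays coherent with the old."""
    canvas = Image.new("RGB", (img.width + step, img.height))
    canvas.paste(img, (0, 0))
    window = canvas.crop((canvas.width - 512, 0, canvas.width, 512))
    mask = Image.new("L", (512, 512), 255)   # white = region to generate
    mask.paste(0, (0, 0, overlap, 512))      # black = keep existing context
    filled = pipe(prompt=prompt, image=window, mask_image=mask).images[0]
    canvas.paste(filled, (canvas.width - 512, 0))
    return canvas
```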

charchit7 commented 9 months ago

> > Thanks @honghd16 :) I am looking to generate coherent 360° panoramic images from the prompts. Could you please tell me how to do it? Which inference scripts should I run?
>
> I didn't generate them myself, but I think the README illustrates this process well. Maybe you can ask the author for more details.

Thanks!