pixeli99 / W-CODA2024-Track2

This repository is dedicated to Track 2 of the W-CODA 2024 Workshop, "Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving," held at ECCV 2024.

Instructions on how to generate the long videos #1

Open SecretMG opened 1 month ago

SecretMG commented 1 month ago

Could you provide instructions for generating the long videos required for the competition (3 any-length videos for each of the 3 scenes in the eval-long set, 9 videos in total)? A reference would let participants confirm that their submissions are correct.

flymin commented 1 month ago

If you check the "eval-long" file, it contains only 3 annotation sequences for you to generate. We require participants to generate each sequence under three different weather/time conditions (3*3=9 videos for submission). You can find example file names in the provided submission sample.
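The 3*3 layout described above can be enumerated as a simple job list. The following is only a minimal sketch: the sequence identifiers and condition names are hypothetical placeholders, not values taken from the eval-long file or the submission sample, which remain the authoritative reference for naming.

```python
from itertools import product

# Hypothetical placeholders: replace with the 3 sequence identifiers
# found in the eval-long annotation file.
sequences = ["sequence_1", "sequence_2", "sequence_3"]

# Hypothetical placeholders: any three distinct weather/time conditions.
conditions = ["condition_a", "condition_b", "condition_c"]

# One job per (sequence, condition) pair: 3 * 3 = 9 videos in total.
jobs = list(product(sequences, conditions))

for index, (sequence, condition) in enumerate(jobs, start=1):
    print(f"video {index}: sequence={sequence}, condition={condition}")
```

Keeping the list explicit makes it easy to check that no sequence/condition pair is missing before packaging a submission.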

Note that there is no "standard" way to extend the 16-frame MagicDrive baseline for longer video generation. Participants can use any technique to extend the generation, as long as the results match the control signals (i.e., object bboxes, road map, and camera poses).
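Since no standard extension method is prescribed, the sketch below shows just one possible starting point, not the organizers' approach: splitting a long annotation sequence into overlapping 16-frame windows so that each window can be generated separately. The function name and the overlap value are assumptions made for illustration, and how the overlapping frames are conditioned or blended is left open.

```python
def sliding_windows(num_frames, window=16, overlap=4):
    """Return (start, end) frame index pairs covering num_frames.

    Windows have length `window` and consecutive windows share
    `overlap` frames. The last window is shifted back so that it
    ends exactly at num_frames. `end` is exclusive.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    if num_frames <= 0:
        return []
    if num_frames <= window:
        # Sequence is short enough for a single generation pass.
        return [(0, num_frames)]

    stride = window - overlap
    starts = list(range(0, num_frames - window + 1, stride))
    # Add a final window if the regular stride leaves frames uncovered.
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)
    return [(start, start + window) for start in starts]


# Example: a 40-frame sequence split into 16-frame windows.
windows = sliding_windows(40, window=16, overlap=4)
print(windows)
```

Each window still has to follow the control signals (object bboxes, road map, camera poses) of the frames it covers.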

SecretMG commented 1 month ago

Thank you for your assistance. However, I tried simply changing the 'ann_file' arg to 'eval_long.pkl', but the results in the 'frames' folder appear identical to those obtained with 'eval.pkl'. I followed the instructions provided here for the image generation process.

flymin commented 1 month ago

I understand your issue now.

It may need some modifications to the original code. I will update the video branch soon.

flymin commented 1 month ago

Hi, with the commit above, you can generate with eval_long through the following command:

python workshop/test_submit.py \
    resume_from_checkpoint=${pretrained_weights} \
    task_id=track2_long ++runner.validation_index=all \
    ++dataset.data.val.ann_file=data/nuscenes_mmdet3d-12Hz/nuscenes_interp_12Hz_infos_track2_eval_long.pkl \
    show_box=false ++dataset.data.val.start_on_firstframe=true

Note that the last parameter, ++dataset.data.val.start_on_firstframe=true, is the key. Please do not leave it out.

This command should produce three 16-frame generations from eval_long. You have to manually change the text prompts for the other weather/time conditions and run the command three times in total to obtain all 9 videos.
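The three passes described above can be prepared with a small helper that assembles the command and prints it once per pass. This is only a sketch: the function name and the checkpoint path are hypothetical placeholders, and the text prompts still have to be edited by hand between passes because the thread does not name an override for them.

```python
import shlex

ANN_FILE = (
    "data/nuscenes_mmdet3d-12Hz/"
    "nuscenes_interp_12Hz_infos_track2_eval_long.pkl"
)


def build_command(pretrained_weights):
    """Assemble the eval_long command from the thread as an argv list."""
    return [
        "python",
        "workshop/test_submit.py",
        f"resume_from_checkpoint={pretrained_weights}",
        "task_id=track2_long",
        "++runner.validation_index=all",
        f"++dataset.data.val.ann_file={ANN_FILE}",
        "show_box=false",
        # Key parameter according to the thread; do not leave it out.
        "++dataset.data.val.start_on_firstframe=true",
    ]


# Hypothetical placeholder: point this at your own checkpoint.
command = build_command("path/to/pretrained_weights")

# Three passes, one per weather/time condition (3 sequences each).
for run in range(1, 4):
    print(f"# Pass {run}: edit the text prompts first, then run:")
    print(shlex.join(command))
```

Printing instead of executing keeps the manual prompt change as an explicit step; the printed line can be pasted into a shell once the prompts are updated.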