tusharsangam / TransVisDrone

MIT License

TransVisDrone

Arxiv Preprint: https://arxiv.org/abs/2210.08423

Major Update: Paper accepted to the 2023 IEEE International Conference on Robotics and Automation (ICRA) 🎉

Project Page link

Pretrained models available at Drive

Code and visualizations coming soon! We are cleaning up the training & validation runs and will upload pretrained models soon.

Processing NPS dataset

Download the annotations from the Dogfight GitHub repository and understand the annotation format. Download the videos from the original NPS site. Extract all frames into a folder called AllFrames, starting from index 0 and without skipping any frames, then plot the annotations over the videos to verify that they map to the correct frames.
The following applies to NPS only: my original frame extraction & annotations for NPS start from index 0, but to follow tph-yolov5's visdrone style I converted the 0-based indices to 1-based indices in Step 1. This is not required for FL-drones & AOT dataset processing.

Step1 :

To account for that, I created the conversion script nps_to_visdrone.py, which symlinks the data with index offsetting.
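The idea of symlinking with an index offset can be sketched as follows. This is a minimal illustration and not the contents of nps_to_visdrone.py: the function name symlink_with_offset and the assumption that each source file is named by its bare frame index (0.png, 1.png, ...) are hypothetical.

```python
import os
from pathlib import Path


def symlink_with_offset(src_dir, dst_dir, offset=1, suffix=".png"):
    """Symlink every frame in src_dir into dst_dir with its index shifted by `offset`.

    Assumption (hypothetical): each source file stem is a bare frame index,
    e.g. "0.png", "1.png". The real script may use a different naming scheme.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    created = []
    # Sort numerically so that 10.png comes after 9.png.
    for src in sorted(src_dir.glob(f"*{suffix}"), key=lambda p: int(p.stem)):
        dst = dst_dir / f"{int(src.stem) + offset}{suffix}"
        # Skip links that already exist so the function can be re-run safely.
        if not (dst.is_symlink() or dst.exists()):
            os.symlink(src.resolve(), dst)
        created.append(dst.name)
    return created
```

Symlinking avoids duplicating the extracted frames on disk; only the file names change from 0-based to 1-based.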

Step2 :

Convert visdrone to yolov5 style. Please change the root paths accordingly. The train, val & test splits follow the Dogfight paper. However, unlike Dogfight, we do not use only every 4th frame; we use all the frames in training & testing.
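For orientation, the core of a visdrone-to-yolov5 conversion is a change of box parameterization. The sketch below assumes the standard VisDrone detection line layout (bbox_left, bbox_top, bbox_width, bbox_height, followed by further fields) and a single drone class mapped to id 0; the function name visdrone_line_to_yolo is hypothetical and this is not the repository's conversion script.

```python
def visdrone_line_to_yolo(line, img_w, img_h, class_id=0):
    """Convert one VisDrone-style annotation line to a yolov5 label line.

    Assumptions: the first four comma-separated fields are
    bbox_left, bbox_top, bbox_width, bbox_height in pixels, and every
    box belongs to a single class (class_id=0).
    """
    fields = line.strip().split(",")
    left, top, w, h = (float(v) for v in fields[:4])
    # yolov5 expects the box centre and size, normalized by the image size.
    x_c = (left + w / 2) / img_w
    y_c = (top + h / 2) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w / img_w:.6f} {h / img_h:.6f}"
```

For example, a 20x10 box with top-left corner (100, 50) in a 200x100 image becomes a box centred at (0.55, 0.55) with normalized size (0.1, 0.1).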

Step3:

After following Step 1 & Step 2, the data folder should look like /home/tu666280/NPS-Data-Uncompressed/AllFrames/{train, test, val} for frames and /home/tu666280/NPSvisdroneStyle/{train/labels, val/labels, test/labels} for annotations. You can keep the root folder the same as long as the frame & annotation indices match. Create a new folder like /home/tu666280/NPS/Videos/{train, val, test} and symlink your original videos into it, or create video_length_dict.pkl containing a Python dictionary of the form {int(video_id): int(num_frames)}, where the video ids are integers such as 1, 2.

Finally, arrange the data in the format expected by NPS.yaml: train & val folders containing frames named Clip{id}{frameid.zfill(5)}.png and annotations named Clip{id}_{frame_id.zfill(5)}.txt in yolov5 format, plus a Videos folder containing either symlinks to the videos or video_length_dict.pkl, a pickled Python dictionary stored as {"int(video_id)": "int(num_frames)"}.
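If the original videos are not available for symlinking, video_length_dict.pkl can be built by counting extracted frames. The sketch below is one possible way to do this and rests on a stated assumption: frames are named Clip_{id}_{frame}.png with underscore separators. Adjust the pattern to your actual file names. The function name build_video_length_dict is hypothetical.

```python
import pickle
from collections import Counter
from pathlib import Path


def build_video_length_dict(frames_dir, out_path):
    """Count frames per video id and pickle the result as {video_id: num_frames}.

    Assumption: frame files are named "Clip_{id}_{frame}.png"; both the
    keys and the values of the resulting dictionary are Python ints.
    """
    counts = Counter()
    for frame in Path(frames_dir).glob("Clip_*_*.png"):
        video_id = int(frame.stem.split("_")[1])
        counts[video_id] += 1
    lengths = dict(counts)
    with open(out_path, "wb") as f:
        pickle.dump(lengths, f)
    return lengths
```

Calling it once per split, for example on the train frames folder with an output path inside the matching Videos folder, yields one dictionary per split.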

Processing FL-drones dataset

Download the FL-drones dataset annotations as described above. The FL-drones dataset is not publicly available and must be obtained with permission from its authors; I obtained it because our research collaborator had obtained prior permission. For each video, half of the frames are used for training and the rest for validation or testing. There is no separate test section for this dataset: validation serves as testing, and we do not do cross-validation style training.
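The per-video half split can be illustrated with a small helper. The text above does not say which half goes to training, so the sketch assumes the first half of each video is used for training and the second half for validation; the function name split_half is hypothetical.

```python
def split_half(num_frames):
    """Return (train_indices, val_indices) for one video.

    Assumption: the first half of the frames is used for training and
    the remaining frames for validation (which also serves as testing).
    """
    mid = num_frames // 2
    return list(range(mid)), list(range(mid, num_frames))
```

With an odd frame count the extra frame lands in the validation half under this convention.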

Step1 :

Convert fl-drones annotations to Visdrone style.

Step2 :

If the annotations are not in yolov5 format, run the convert to yolov5 script.

Step 3:

The Videos folder will be generated by Step 1.

Finally, arrange the data in the format expected by FLDrone.yaml: train & val folders containing frames named Clip{id}{frameid.zfill(5)}.png and annotations named Clip{id}_{frame_id.zfill(5)}.txt in yolov5 format, plus a Videos folder containing either symlinks to the videos or video_length_dict.pkl, a pickled Python dictionary stored as {"int(video_id)": "int(num_frames)"}.

Processing AOT

Coming soon.

Training NPS, FL-drones & AOT

Please follow the parameters set in submit-train.slurm & submit-test.slurm. In training, ampere in the SBATCH directives refers to a 42 GB NVIDIA Ampere GPU; AOT training requires 2 of those. The training commands for all datasets are included but commented out in submit-train.slurm. Uncomment the one you need and modify only the --data flag so that it points to the updated {NPS, FLDrone, AOT}.yaml with your respective data paths.

Running pre-trained checkpoints

Please follow runs/train/NPS//weights, runs/FL//weights, runs/AOT//weights to download the weights files. The best runs are saved in runs/val/

Evaluate AOT results

AOT results cannot be evaluated using COCO criteria alone; AOT has its own grading criteria. To run the evaluation, save all the predictions and run evaluate_aot.py with the given arguments. To speed up the evaluation, I split the AOT test data into chunks, get predictions in parallel using an SBATCH array, and then fuse all the predictions into one.
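The chunk-and-fuse pattern described above can be sketched in a few lines. This is an illustration of the general idea, not the repository's evaluation code: the helper names chunk_for_task and fuse_predictions are hypothetical, and predictions are assumed to be plain Python lists.

```python
def chunk_for_task(items, num_chunks, task_id):
    """Return the contiguous slice of `items` handled by array task `task_id` (0-based).

    Chunks differ in size by at most one item, and together they cover
    `items` exactly once, in order.
    """
    size, rem = divmod(len(items), num_chunks)
    start = task_id * size + min(task_id, rem)
    end = start + size + (1 if task_id < rem else 0)
    return items[start:end]


def fuse_predictions(per_chunk_predictions):
    """Concatenate the per-chunk prediction lists into a single list."""
    fused = []
    for preds in per_chunk_predictions:
        fused.extend(preds)
    return fused
```

In a Slurm array job, task_id could be read from the SLURM_ARRAY_TASK_ID environment variable, with each task saving its predictions to its own file before a final step fuses them.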

Citation

If you find our work useful in your research, please consider citing:

@INPROCEEDINGS{10161433,
  author={Sangam, Tushar and Dave, Ishan Rajendrakumar and Sultani, Waqas and Shah, Mubarak},
  booktitle={2023 IEEE International Conference on Robotics and Automation (ICRA)}, 
  title={TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos}, 
  year={2023},
  volume={},
  number={},
  pages={6006-6013},
  keywords={Performance evaluation;Visualization;Image edge detection;Robot vision systems;Transformers;Throughput;Real-time systems},
  doi={10.1109/ICRA48891.2023.10161433}}

Contact

If you have any questions, please feel free to contact us:

Tushar Sangam: tusharsangam5@gmail.com

Ishan Dave: ishandave@knights.ucf.edu
