microsoft / CameraTraps

PyTorch Wildlife: a Collaborative Deep Learning Framework for Conservation.
https://cameratraps.readthedocs.io/en/latest/
MIT License

make_classification_dataset with custom train/test split #153

Closed (davidwhealey closed 4 years ago)

davidwhealey commented 4 years ago

Thanks for making this great repo!

I love that make_classification_dataset.py allows you to crop images using the megadetector and create the tfrecords at the same time. One thing that has not been clear to me is how to specify a specific train/test split rather than having the script automatically split across locations.

Is there a way to do this?

yangsiyu007 commented 4 years ago

Hi there! Unfortunately that's not possible right now (we're looking to change this in the summer). As a workaround, you can run the script on your training set images with the split ratio set so that there are zero test images (to create the training tfrecords), and then run it again on the test set with the split ratio set so that there are zero training images (to create the test tfrecords).

davidwhealey commented 4 years ago

I was getting a nonempty test.json even with a zero training image fraction, which I did not investigate further. What I ended up doing was creating custom train.json and test.json files in the cropped dataset and then running data_management/tfrecords/create_tfrecords_from_coco.py. That appears to have worked.
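
For anyone landing here later, the custom-split step could look roughly like the sketch below. It is not code from this repo. It assumes the cropped dataset's JSON is COCO-style, with top-level `images`, `annotations`, and `categories` keys and an `image_id` field on each annotation; the files written by make_classification_dataset.py may differ, so check the keys first. The names `split_coco`, `write_split`, and the example data are hypothetical.

```python
import copy
import json


def split_coco(coco, test_image_ids):
    """Split a COCO-style dict into (train, test) dicts by image id.

    Assumes 'images' entries have an 'id' and 'annotations' entries
    have an 'image_id' (standard COCO layout, not verified against
    the files this repo produces).
    """
    test_ids = set(test_image_ids)
    # Copy every other top-level key (e.g. 'categories', 'info') into both splits.
    shared = {k: v for k, v in coco.items() if k not in ("images", "annotations")}
    train = copy.deepcopy(shared)
    test = copy.deepcopy(shared)
    train["images"] = [im for im in coco["images"] if im["id"] not in test_ids]
    test["images"] = [im for im in coco["images"] if im["id"] in test_ids]
    train["annotations"] = [a for a in coco["annotations"] if a["image_id"] not in test_ids]
    test["annotations"] = [a for a in coco["annotations"] if a["image_id"] in test_ids]
    return train, test


def write_split(train, test, train_path="train.json", test_path="test.json"):
    """Write the two splits to disk (paths are placeholders)."""
    with open(train_path, "w") as f:
        json.dump(train, f)
    with open(test_path, "w") as f:
        json.dump(test, f)


# Tiny in-memory example with made-up ids and labels.
coco = {
    "images": [{"id": 1, "file_name": "a.jpg"},
               {"id": 2, "file_name": "b.jpg"},
               {"id": 3, "file_name": "c.jpg"}],
    "annotations": [{"id": 10, "image_id": 1, "category_id": 0},
                    {"id": 11, "image_id": 2, "category_id": 1},
                    {"id": 12, "image_id": 3, "category_id": 0}],
    "categories": [{"id": 0, "name": "deer"}, {"id": 1, "name": "fox"}],
}
train, test = split_coco(coco, test_image_ids=[2])
```

Calling `write_split(train, test)` would then produce the two files to pass to create_tfrecords_from_coco.py. Choosing the test images by id keeps the split fully under your control, for example to hold out whole locations or specific deployments.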