pyronear / pyro-vision

Computer vision library for wildfire detection 🌲 Deep learning models in PyTorch & ONNX for inference on edge devices (e.g. Raspberry Pi)
https://pyronear.org/pyro-vision/
Apache License 2.0

[datasets] Module design suggestion #94

Closed: frgfm closed this issue 2 years ago

frgfm commented 3 years ago

Here is a suggestion for the pyrovision.datasets module's organization:

This suggestion is based on the organization of https://github.com/pytorch/vision/tree/master/torchvision/datasets, which is quite effective. The documentation should also include a detailed explanation of how to use it.
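For reference, a torchvision dataset is a map-style object: `__len__` gives the number of samples, and `__getitem__` returns an `(image, target)` pair after applying optional `transform` / `target_transform` callables. A minimal pure-Python sketch of that contract is below. The class name `WildfireFrames`, the `annotations.csv` file with `path,label` columns, and the `loader` argument are hypothetical; a real implementation would more likely subclass `torchvision.datasets.VisionDataset` and load images with PIL.

```python
import csv
from pathlib import Path
from typing import Any, Callable, List, Optional, Tuple, Union


class WildfireFrames:
    """Hypothetical map-style dataset mirroring torchvision's conventions.

    Assumes `root/annotations.csv` holds processed annotations as
    `path,label` rows, with paths relative to `root`.
    """

    def __init__(
        self,
        root: Union[str, Path],
        loader: Callable[[Path], Any],
        transform: Optional[Callable[[Any], Any]] = None,
        target_transform: Optional[Callable[[int], Any]] = None,
    ) -> None:
        self.root = Path(root)
        self.loader = loader  # e.g. a PIL.Image.open wrapper for real images
        self.transform = transform
        self.target_transform = target_transform
        # Parse the (assumed) processed annotation file once, at construction
        with open(self.root / "annotations.csv", newline="") as f:
            self.samples: List[Tuple[str, int]] = [
                (row["path"], int(row["label"])) for row in csv.DictReader(f)
            ]

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> Tuple[Any, Any]:
        rel_path, label = self.samples[idx]
        img = self.loader(self.root / rel_path)
        if self.transform is not None:
            img = self.transform(img)
        target: Any = label
        if self.target_transform is not None:
            target = self.target_transform(label)
        return img, target
```

Keeping the loader injectable makes the class trivial to test without real images, while still plugging into a `torch.utils.data.DataLoader` unchanged.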

If a dataset is made available by Pyronear, it should be accessible via a link along with a hash for integrity checking.
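Such a check could look like the following minimal sketch. It assumes the hash is published as a SHA-256 hex digest next to the download link. The helper names are hypothetical; torchvision ships comparable MD5-based helpers (`download_url` and `check_integrity` in `torchvision.datasets.utils`).

```python
import hashlib
from pathlib import Path
from urllib.request import urlretrieve


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_integrity(path: Path, expected_sha256: str) -> bool:
    """True if the file exists and matches the published (assumed SHA-256) hash."""
    path = Path(path)
    return path.is_file() and file_sha256(path) == expected_sha256.lower()


def download_and_verify(url: str, dest: Path, expected_sha256: str) -> Path:
    """Download `url` to `dest` unless a verified copy is already there."""
    dest = Path(dest)
    if check_integrity(dest, expected_sha256):
        return dest  # cached copy is intact, skip the download
    dest.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(dest))
    if not check_integrity(dest, expected_sha256):
        dest.unlink()  # do not leave a corrupted archive behind
        raise RuntimeError(f"Integrity check failed for {dest}")
    return dest
```

Verifying before downloading also lets repeated dataset instantiations reuse an intact local copy instead of fetching it again.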

TekayaNidham commented 3 years ago

Good idea, that would work

frgfm commented 3 years ago

Hey @MateoLostanlen @blenzi @x0s @Akilditu,

While refactoring the datasets module, I ended up wondering whether some things should be removed because they are misaligned with the repo's purpose. Generally speaking, the purpose of the datasets module in this repo is to make each dataset's source accessible (using processed annotations if need be) and to offer a torchvision dataset. I believe some wildfire-related features are not aligned with this (considering that, for now, the wildfire source is not even publicly accessible). Some details below:

openfire

video_utils

utils

wildfire

In this module, I believe we need to make some decisions up front so that users won't have to make them later, including: deciding on metadata-based criteria to discard invalid samples (frames), with all the remaining frames making up the dataset (perhaps imbalanced, but still). Next, we need to select the sampling strategy (none, origin_proportion, positive_ratio), and finally the train/val/test split (this last part can be done in the same fashion as with sklearn). In the short term:

I do believe it would bring much more value to upload a clean subset of the dataset, similar to @MateoLostanlen's, so that we have a stable, user-friendly and shareable dataset. What do you think?
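The wildfire steps described above (discard invalid frames using metadata, optionally resample, then split into train/val/test) could be sketched in pure Python as below. Several details are assumptions for illustration. The metadata keys (`label`, `valid`, `id`) and the function names are hypothetical. `positive_ratio` is read as "downsample negatives until positives make up a given fraction". `origin_proportion` is not sketched, since its exact semantics are not spelled out here. The split follows the spirit of sklearn's `train_test_split` with a fixed `random_state`, not its exact rounding rules.

```python
import random
from typing import Any, Callable, Dict, List, Tuple

Frame = Dict[str, Any]  # one frame's metadata, e.g. {"id": 3, "label": 1, "valid": True}


def filter_frames(frames: List[Frame], is_valid: Callable[[Frame], bool]) -> List[Frame]:
    """Drop frames whose metadata fails the (hypothetical) validity criterion."""
    return [f for f in frames if is_valid(f)]


def sample_positive_ratio(frames: List[Frame], ratio: float, seed: int = 0) -> List[Frame]:
    """Keep all positives; subsample negatives so positives make up ~`ratio`.

    This is an assumed interpretation of the `positive_ratio` option.
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    rng = random.Random(seed)
    pos = [f for f in frames if f["label"] == 1]
    neg = [f for f in frames if f["label"] != 1]
    n_neg = min(len(neg), round(len(pos) * (1 - ratio) / ratio))
    return pos + rng.sample(neg, n_neg)


def split_frames(
    frames: List[Frame], val_size: float = 0.1, test_size: float = 0.1, seed: int = 0
) -> Tuple[List[Frame], List[Frame], List[Frame]]:
    """Shuffle with a fixed seed, then cut into train/val/test (sklearn-like)."""
    shuffled = list(frames)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_test = int(round(n * test_size))
    n_val = int(round(n * val_size))
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```

Fixing the seeds keeps the published splits reproducible, which matters for a shareable dataset. Consecutive frames of the same video are nearly identical, though. Splitting by video (or by origin) rather than by individual frame would likely give a more honest validation score. In that case, `split_frames` would shuffle groups instead of frames.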

frgfm commented 3 years ago

Any feedback, @MateoLostanlen @blenzi @fe51 @Akilditu? :)

MateoLostanlen commented 3 years ago

@Akilditu has started a dataset repo; I guess some of this stuff should move there.

frgfm commented 2 years ago

Closed by #136 & #138