tfaehse / DashcamCleaner

Censor identifiable information in videos, in particular dashcam recordings in Germany.
GNU Affero General Public License v3.0
130 stars, 27 forks

generate_training_data.py broken dependency requirements #81

Closed by Rom3dius 9 months ago

Rom3dius commented 11 months ago

Hi,

generate_training_data.py requires anonymizer, which in turn requires tensorflow-gpu and a myriad of other packages, as well as an older Python version (3.6). This conflicts with DashcamCleaner's own requirements, so it is no longer possible to run generate_training_data.py in the same environment.

Is the project moving towards a more manual approach to creating training data? Or is the script still in use, and setting up the environment for it just isn't documented?

tfaehse commented 9 months ago

You're 100% correct with your suspicions. Effectively, I generated training data using that script by installing Anonymizer's dependencies and whatever else I needed, but since then I have mostly added manually labelled data.

By now, I would just use a foundation model (EVA02 is fantastic if your desired classes match their pretrained models; GroundingDINO, for example, accepts text prompts). There are even (very nice!) projects like autodistill that automate this process - you provide unlabelled data, define your classes (and their mappings to prompts), choose which model to use, and the rest happens automatically.
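To make the "classes and mappings to prompts" idea concrete, here is a minimal, library-free sketch of the bookkeeping such a pipeline does. It is not autodistill's API and not the script from this repository: the prompts, the class names `face` and `plate`, the `Detection` type, and the `to_yolo_lines` helper are all hypothetical, and the detections are assumed to come from whatever text-prompted model you pick. It only shows how prompt-level detections could be mapped to class ids and written as YOLO-style label lines (`class cx cy w h`, normalized to the image size).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping from text prompt to training class (names are assumptions).
PROMPT_TO_CLASS: Dict[str, str] = {
    "human face": "face",
    "license plate": "plate",
}

# Stable class ids derived from the sorted class names: face -> 0, plate -> 1.
CLASS_IDS: Dict[str, int] = {
    name: idx for idx, name in enumerate(sorted(set(PROMPT_TO_CLASS.values())))
}


@dataclass
class Detection:
    """One box returned by a text-prompted detector (hypothetical structure)."""
    prompt: str
    box: Tuple[float, float, float, float]  # x_min, y_min, x_max, y_max in pixels
    score: float


def to_yolo_lines(
    detections: Iterable[Detection],
    img_w: int,
    img_h: int,
    min_score: float = 0.5,
) -> List[str]:
    """Convert detections to YOLO-style label lines, dropping weak or unmapped ones."""
    lines = []
    for det in detections:
        if det.score < min_score or det.prompt not in PROMPT_TO_CLASS:
            continue
        x0, y0, x1, y1 = det.box
        # Box center and size, normalized to the image dimensions.
        cx = (x0 + x1) / 2 / img_w
        cy = (y0 + y1) / 2 / img_h
        w = (x1 - x0) / img_w
        h = (y1 - y0) / img_h
        class_id = CLASS_IDS[PROMPT_TO_CLASS[det.prompt]]
        lines.append(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```

In practice you would write one such label file per image and still spot-check the results by hand, since a confidence threshold like `min_score` only filters out the detections the model itself is unsure about.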