This repository shows how to train a YOLOv8 deep learning model on the Pyronear dataset. It uses DVC for data versioning, MLflow for model versioning and performance tracking, and cloud storage for the data.
To install necessary libraries, run:
pip install -r requirements.txt
Download the dataset using the following command:
gdown --fuzzy "https://drive.google.com/file/d/12gGuFd3aQmtPXP-cbBRjsciWLtpFNBB-/view?usp=sharing"
Unzip and organize the dataset:
mkdir datasets
unzip DS-18d12de1.zip -d datasets/
Update the dataset path in data_configuration.yaml.
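After unzipping, it can help to confirm that the images landed where the configuration expects them. The following is a minimal sketch, not part of the repository: the function name `count_images` and the set of image extensions are assumptions, and the folder layout inside the archive is not documented here, so the function simply reports a count for every sub-directory it finds.

```python
from pathlib import Path

# Assumed image formats; adjust if the archive uses something else.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Return {sub-directory (relative to root): number of image files}."""
    counts = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            key = str(path.parent.relative_to(root))
            counts[key] = counts.get(key, 0) + 1
    return counts


# Example call once the archive has been unzipped into datasets/:
#   for folder, n in sorted(count_images("datasets").items()):
#       print(f"{folder}: {n} images")
```

The per-folder totals can then be compared against the training and validation sizes stated for the dataset.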
The dataset comprises 596 training images and 148 validation images featuring forest landscapes with smoke. Each image (640x480 pixels) is annotated with a bounding box in a corresponding txt file, marking the smoke areas.
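To make the annotation files concrete, here is a small sketch that converts label lines into pixel coordinates. It assumes the txt files follow the usual YOLO convention (`class x_center y_center width height`, all normalized to the 0-1 range), which the description above does not state explicitly; the function name `parse_yolo_labels` is illustrative only.

```python
IMG_W, IMG_H = 640, 480  # image size stated in the dataset description


def parse_yolo_labels(text, img_w=IMG_W, img_h=IMG_H):
    """Convert YOLO-style label lines to pixel boxes.

    Each returned tuple is (class_id, x_min, y_min, x_max, y_max).
    Assumes normalized 'class x_center y_center width height' lines.
    """
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        class_id = int(parts[0])
        xc, yc, w, h = (float(v) for v in parts[1:])
        x_min = (xc - w / 2) * img_w
        y_min = (yc - h / 2) * img_h
        x_max = (xc + w / 2) * img_w
        y_max = (yc + h / 2) * img_h
        boxes.append((class_id, x_min, y_min, x_max, y_max))
    return boxes


# Example: a box centered in the image, a quarter of its width and half its height.
#   parse_yolo_labels("0 0.5 0.5 0.25 0.5")
```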
DVC is installed along with the other dependencies in requirements.txt, so no separate install step is needed.
Initialize DVC in your workspace:
dvc init
Set up remote storage (e.g., AWS S3, Google Cloud Storage):
dvc remote add -d remote_storage path/to/your/dvc_remote
Track data and configuration files using DVC:
dvc add <file_or_directory>
git add <file_or_directory>.dvc .gitignore
⚠️ Requires GPU.
MLflow is used for experiment tracking and model management. Tracked values include the number of epochs, accuracy, and loss.
Start the MLflow UI:
mlflow ui
(Optional) Specify a custom port:
mlflow ui --port <port_number>
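For readers new to MLflow, the sketch below shows one way per-epoch values could be sent to the tracker so that they appear in the UI. It is an illustration under assumptions, not the code in train_yolo.py: the helper names `log_epoch_metrics` and `log_to_mlflow` and the shape of `history` (one dict of metric values per epoch) are hypothetical. Only `mlflow.start_run`, `mlflow.log_param`, and `mlflow.log_metric` are real MLflow calls.

```python
def log_epoch_metrics(history, log_metric):
    """Log every metric of every epoch through the given callable.

    history: list of dicts, one per epoch, e.g. [{"loss": 1.2}, {"loss": 0.9}]
    log_metric: callable with the shape of mlflow.log_metric(key, value, step=...)
    """
    for epoch, metrics in enumerate(history, start=1):
        for name, value in metrics.items():
            log_metric(name, float(value), step=epoch)


def log_to_mlflow(history):
    """Record the epoch count as a parameter and the per-epoch metrics in one run."""
    import mlflow  # imported lazily so the helper above works without MLflow

    with mlflow.start_run():
        mlflow.log_param("epochs", len(history))
        log_epoch_metrics(history, mlflow.log_metric)
```

Passing the logging function in as an argument keeps the loop easy to test without a running tracking server.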
Execute the training script with specified data and model configurations:
python3 train_yolo.py --data_config data_configuration.yaml --model_config model_configuration.yaml
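As a rough picture of how a script might accept the two flags used above, here is a minimal argparse sketch. The flag names come from the command line shown; everything else (the function name `parse_args`, the help texts, making both flags required) is an assumption, and the real train_yolo.py may be organized differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the two configuration paths passed on the command line."""
    parser = argparse.ArgumentParser(
        description="Illustrative CLI sketch for a YOLOv8 training script"
    )
    parser.add_argument("--data_config", required=True,
                        help="path to the dataset YAML file")
    parser.add_argument("--model_config", required=True,
                        help="path to the model YAML file")
    return parser.parse_args(argv)


# Example:
#   args = parse_args(["--data_config", "data_configuration.yaml",
#                      "--model_config", "model_configuration.yaml"])
#   args.data_config holds "data_configuration.yaml"
```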
Add AWS credentials to the training script:
"s3", aws_access_key_id="your_access_key_id", aws_secret_access_key="your_secret_access_key"
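If you would rather not write the keys into the script itself, one option is to read them from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables and pass them as keyword arguments. This is a sketch under assumptions: the helper name `s3_credentials_from_env` is hypothetical, and the commented usage assumes the training script builds its S3 client with `boto3.client`, which the fragment above suggests but does not confirm.

```python
import os


def s3_credentials_from_env(env=None):
    """Build the credential keyword arguments from environment variables."""
    env = os.environ if env is None else env
    try:
        return {
            "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
            "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        }
    except KeyError as missing:
        raise RuntimeError(
            f"Missing environment variable: {missing.args[0]}"
        ) from None


# Hypothetical usage, assuming the script creates its client with boto3:
#   import boto3
#   s3 = boto3.client("s3", **s3_credentials_from_env())
```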
You've successfully set up and run the Pyronear machine learning pipeline for wildfire detection.