hab-spc / hab-ml

Harmful algae bloom CNN detection model, developed in PyTorch

Train 20 Class HAB-Model #39

Closed ktl014 closed 5 years ago

ktl014 commented 5 years ago

Story:

As a data annotator, biologist, or Product Owner, I want to annotate a model's predictions instead of annotating the entire dataset from scratch, to lighten the annotation workload. This story therefore covers the first step: training the model.

Definition of Done:

[x] Model evaluation report (should include the training/val set sizes and image distribution, the model's loss curve, a confusion matrix, a final accuracy, and a benchmark table comparing all the parameter settings used)
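
For the confusion matrix and final accuracy items, a minimal sketch in plain Python might look like the following. The function names and the toy 3-class labels are assumptions for illustration and are not taken from the hab-ml code; the real report would use the 20 HAB classes and the model's actual predictions.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count predictions per class pair: rows are true classes, columns are predicted."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy labels for illustration only (hypothetical, 3 classes instead of 20).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
acc = accuracy(cm)
```

Keeping the raw counts rather than normalized rates makes it easy to also report the per-class image distribution from the row sums.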

Story closure acceptance will be determined by @ktl014

ktl014 commented 5 years ago

Evaluation Report

stats_summary.txt should contain the training/val set sizes and the performance metrics.

└── HAB20_Model
    ├── loss.png
    ├── confusion.png
    └── stats_summary.txt
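
One way to produce stats_summary.txt inside that folder is sketched below. The helper name, the `key: value` line format, and the numbers are all assumptions made for illustration; the issue does not specify a file format or report any of these values.

```python
import tempfile
from pathlib import Path


def write_stats_summary(out_dir, train_size, val_size, metrics):
    """Write set sizes and metrics to <out_dir>/stats_summary.txt (hypothetical format)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = [f"train_size: {train_size}", f"val_size: {val_size}"]
    lines += [f"{name}: {value:.4f}" for name, value in metrics.items()]
    path = out_dir / "stats_summary.txt"
    path.write_text("\n".join(lines) + "\n")
    return path


# Placeholder numbers for illustration only, not results from this issue.
with tempfile.TemporaryDirectory() as tmp:
    summary = write_stats_summary(
        Path(tmp) / "HAB20_Model", 1000, 200, {"accuracy": 0.9, "f1": 0.85}
    )
    text = summary.read_text()
```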

Table example

| Model | batch_size | max_seq_length | learning_rate | Epochs | training_time | Accuracy (test) | F1 (test) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| char+BiLSTM+CRF | 6 | - | 1E-03 | 15 | 57 min | 97.52% | 77.80% |
| BERT-Large | 6 | 128 | 2E-05 | 4 | 40 min | 98.31% | 81.23% |
| BERT-Large | 6 | 128 | 5E-05 | 4 | 39 min | 98.34% | 80.97% |
| BERT-Large | 2 | 256 | 2E-05 | 4 | 108 min | 98.20% | 80.33% |
| BERT-Large | 3 | 128 | 2E-05 | 4 | 58 min | 98.30% | 81.91% |
| BERT-Large | 3 | 256 | 2E-05 | 4 | 82 min | 98.27% | 81.58% |
| BERT-Large | 6 | 128 | 2E-05 | 3 | 55 min | 98.26% | 81.52% |
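
A benchmark table in this layout could be generated from per-run records rather than typed by hand. The sketch below is one possible approach; the `to_markdown_table` helper is hypothetical, and the two rows are copied from the example table above purely to show the output shape.

```python
def to_markdown_table(rows, columns):
    """Render a list of dicts as a Markdown pipe table; missing values become '-'."""
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = [
        "| " + " | ".join(str(row.get(col, "-")) for col in columns) + " |"
        for row in rows
    ]
    return "\n".join([header, divider] + body)


columns = ["Model", "batch_size", "max_seq_length", "learning_rate",
           "Epochs", "training_time", "Accuracy (test)", "F1 (test)"]

# Rows taken from the example table; the first run has no max_seq_length.
runs = [
    {"Model": "char+BiLSTM+CRF", "batch_size": 6, "learning_rate": "1E-03",
     "Epochs": 15, "training_time": "57 min",
     "Accuracy (test)": "97.52%", "F1 (test)": "77.80%"},
    {"Model": "BERT-Large", "batch_size": 6, "max_seq_length": 128,
     "learning_rate": "2E-05", "Epochs": 4, "training_time": "40 min",
     "Accuracy (test)": "98.31%", "F1 (test)": "81.23%"},
]

table = to_markdown_table(runs, columns)
```

For the HAB model the columns would presumably differ (for example, no max_seq_length), so the column list is passed in rather than hard-coded.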