Hello,
I am currently using the modAL library to develop active learning strategies. My project involves multi-class classification implemented with PyTorch, utilizing DinoV2 as the backbone model. I have encountered an issue with the macro average F1 score in my active learning loop.
When I used the macro average F1 score, the results were unsatisfactory: the active learning loop remained stuck at a single F1 value across iterations. However, when I switched to the micro average F1 score, the results improved significantly.
I am confident in the quality of my dataset, since it performs well with other multi-class classification strategies evaluated with the macro F1 score. Therefore, I suspect the issue lies in how modAL handles the macro average F1 score.
I would like to know if there are any known inconsistencies or issues with using macro averaging in modAL when working on multi-class classification.
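For reference, here is a minimal sketch of the symptom I am seeing, using scikit-learn's `f1_score` (which my evaluation step wraps; the `y_true`/`y_pred` arrays below are illustrative, not my actual data). If the model collapses to predicting a single class on an imbalanced pool, the macro F1 gets pinned at a low, nearly constant value while the micro F1 (equal to accuracy in the multi-class single-label case) still moves:

```python
import numpy as np
from sklearn.metrics import f1_score

# Degenerate case: the classifier predicts only the majority class.
y_true = np.array([0, 0, 0, 1, 2, 0, 0, 1])
y_pred = np.zeros(8, dtype=int)  # every prediction is class 0

# Macro averages per-class F1 equally; classes 1 and 2 contribute 0.
macro = f1_score(y_true, y_pred, average="macro", zero_division=0)
# Micro pools all TP/FP/FN, so it reduces to accuracy here.
micro = f1_score(y_true, y_pred, average="micro", zero_division=0)
print(f"macro={macro:.3f}  micro={micro:.3f}")  # macro ≈ 0.256, micro = 0.625
```

This is consistent with what I observe in the loop, which is why the stuck macro score made me wonder whether modAL is passing something unexpected to the scorer.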
Thank you for your guidance in advance.