mlbio-epfl / turtle

[ICML 2024] Let Go of Your Labels with Unsupervised Transfer
https://brbiclab.epfl.ch/projects/turtle/

Training on data without labels #1

Closed M-Fannilla closed 5 months ago

M-Fannilla commented 5 months ago

Is it possible to train turtle model on dataset without labels at all?

agadetsky commented 5 months ago

@M-Fannilla thank you for your question.

TURTLE is a fully unsupervised method, so it does not require any labels for training. For evaluation purposes, run_turtle.py also loads ground-truth labels to compute clustering accuracy every N iterations. If you want to use the current codebase to run TURTLE on your own dataset where no ground-truth labels are available, you can either comment out the corresponding loading and evaluation lines in run_turtle.py, or provide some dummy labels (see the sketch below) and simply ignore the logged clustering accuracy.
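
As a minimal sketch of the second option, assuming the labels are consumed as a NumPy array (the file name and dataset size here are placeholders, not part of the codebase):

```python
import numpy as np

# Create one dummy label per image so the evaluation code has something
# to consume. The clustering accuracy computed against these placeholders
# is meaningless and should simply be ignored.
num_samples = 10_000  # replace with the number of images in your dataset
dummy_labels = np.zeros(num_samples, dtype=np.int64)

# Hypothetical output path; point your data-loading code at this file.
np.save("my_dataset_dummy_labels.npy", dummy_labels)
```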

Best, Artyom

M-Fannilla commented 5 months ago

That's what I thought, big thanks for the clarification @agadetsky. I have a large dataset of unlabelled images of an 'explicit nature'. The goal is to train a multilabel classifier to identify explicit labels. It's really refreshing to see an approach like `turtle` tackling unsupervised learning.

agadetsky commented 5 months ago

@M-Fannilla We are happy to hear that you find TURTLE a great fit for your real-world, large-scale dataset! I am closing the issue as resolved.

Let us know if you need any further help or clarifications.

Best, Artyom