Closed sc-56 closed 4 years ago
mapping_assignment_dataloader: data used to find the mapping from output cluster to class.
mapping_test_dataloader: data that is actually evaluated to yield the test performance metric, using the mapping found from mapping_assignment_dataloader.
For fully unsupervised clustering, training and test sets are allowed to be the same, so mapping_assignment_dataloader == mapping_test_dataloader.
For semi-supervised overclustering, which makes material use of labels to find the cluster-to-class mapping, the test set has to be unseen (as in supervised evaluations), so mapping_assignment_dataloader != mapping_test_dataloader (mapping_assignment_dataloader counts as training data but mapping_test_dataloader does not).
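To make the two roles concrete, here is a minimal sketch, not code from the repository. It assumes the model's cluster predictions and the ground-truth labels are already available as plain lists, and it uses a simple majority vote as the cluster-to-class rule for brevity; the actual evaluation code may use a different rule. The names find_mapping and accuracy and all the data values are hypothetical.

```python
from collections import Counter, defaultdict


def find_mapping(cluster_ids, labels):
    """Map each cluster to the class that occurs most often inside it."""
    votes = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        votes[cluster][label] += 1
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()}


def accuracy(cluster_ids, labels, mapping):
    """Fraction of samples whose mapped cluster equals the true class."""
    correct = sum(mapping.get(c) == y for c, y in zip(cluster_ids, labels))
    return correct / len(labels)


# Hypothetical predictions and labels standing in for the two dataloaders.
assign_clusters = [0, 0, 1, 1, 2, 2, 2]
assign_labels = [5, 5, 7, 7, 5, 5, 7]
test_clusters = [0, 1, 1, 2]
test_labels = [5, 7, 5, 5]

# The mapping is always found on the assignment data.
mapping = find_mapping(assign_clusters, assign_labels)

# Fully unsupervised: the same data plays both roles.
acc_unsup = accuracy(assign_clusters, assign_labels, mapping)

# Semi-supervised overclustering: the mapping is applied to held-out test data.
acc_semi = accuracy(test_clusters, test_labels, mapping)
```

In the second case the labels in the assignment data influence the reported number only through the mapping, which is why that data counts as training data while the test data must stay unseen.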
See also this thread on data loading, where I added a simpler function.
Thanks for the support, and sorry for asking further.
In the 'unsupervised learning' setting, what is the difference in content between the [training dataset] and [mapping_assignment_dataset]? Is the transformation (tf2 vs. tf3) the only difference?
tf2 and tf3 are different, and sometimes the underlying data is different.
All the data partitions for unsupervised learning are listed in this function. In most cases [training dataset] and [mapping_assignment_dataset] have the same underlying data. The exception is STL10, where mapping_assignment_dataset excludes the unlabelled portion of the training data (because mapping_assignment_dataset is used to find the cluster-to-class mapping).
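As a rough illustration of the STL10 exception, the sketch below builds a mapping assignment set by dropping unlabelled samples from the training data. It is not taken from the repository: the sample list, the labelled_only helper, and the use of -1 as the marker for an unlabelled sample are all assumptions made for the example.

```python
# Assumed convention for this sketch: -1 marks a sample with no label.
UNLABELLED = -1

# Hypothetical (image_id, label) pairs standing in for the training data.
train_samples = [("img0", 3), ("img1", UNLABELLED), ("img2", 8), ("img3", UNLABELLED)]


def labelled_only(samples):
    """Keep only the samples that carry a real class label."""
    return [(image_id, label) for image_id, label in samples if label != UNLABELLED]


# Training can use everything, since it needs no labels.
training_dataset = train_samples

# Finding the cluster-to-class mapping needs labels, so unlabelled samples are excluded.
mapping_assignment_dataset = labelled_only(train_samples)
```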
Thank you so much for your comprehensive explanation.
really appreciated, and thx a lot.
Sorry for asking,
I'm really wondering what is in the "mapping/assignment dataloader" in the evaluation phase, and whether the "mapping dataloader" and "assignment dataloader" have the same content?
Thanks for your support.