rosinusserrano / pml_vqvae

Repository for the course "Project Machine Learning" during WiSe 24/25 at TU Berlin consisting of a replication of the paper "Neural Discrete Representation Learning" (van den Oord et al., 2018).

Datasets 2.0 #4

timonpalm closed 2 weeks ago

timonpalm commented 3 weeks ago

I refactored the dataset classes. Manually parsing the image folders was tedious, and it gave no information about the classes and their labels. I therefore fell back on the PyTorch class, which can read the meta.bin. Otherwise, I would have just copied their implementation for that.

Further, I implemented a function to display some basic stats of the datasets (e.g. the distribution per class). It is now also possible to create a subset of each dataset with n_samples per class, so the subsets are always uniformly distributed among the classes.
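
For illustration, here is a minimal sketch of how the per-class stats and the balanced subsetting could work. It is not the code from this PR: the function names `class_distribution` and `balanced_subset_indices` and the `seed` parameter are hypothetical, and the toy `labels` list stands in for a real dataset's targets.

```python
import random
from collections import Counter, defaultdict


def class_distribution(labels):
    """Return a mapping from class label to its number of samples."""
    return dict(Counter(labels))


def balanced_subset_indices(labels, n_samples, seed=0):
    """Pick n_samples indices per class, sampled without replacement.

    Hypothetical helper: raises if a class has fewer than n_samples items.
    """
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        if len(indices) < n_samples:
            raise ValueError(
                f"class {label!r} has only {len(indices)} samples, "
                f"need {n_samples}"
            )
        chosen.extend(rng.sample(indices, n_samples))
    return sorted(chosen)


# Toy labels standing in for a real dataset's targets (assumption).
labels = [0, 0, 0, 1, 1, 1, 1, 2, 2]
subset = balanced_subset_indices(labels, n_samples=2)

print(class_distribution(labels))                       # {0: 3, 1: 4, 2: 2}
print(class_distribution([labels[i] for i in subset]))  # {0: 2, 1: 2, 2: 2}
```

The returned indices could then be passed to something like `torch.utils.data.Subset(dataset, indices)` to obtain the actual subset, assuming the dataset exposes its labels as a list.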