Open notiho opened 1 year ago
I wonder if it makes sense to log the number of the sample in the train_set, i.e. the result of np.random.randint, instead of or next to i, so that it can be looked up more easily? If the train_set is shuffled or if the dataset is an Arrow file, then this number would not map to an image that you can look up easily.
Because the samples might be substituted on the fly if loading them fails for any reason, the indices might not actually correspond to the order of the dataset. As a side note, you can now unpack a binary dataset with contrib/extract_lines.py, but if you're defining fixed splits in the file, the indices in the extracted file names won't match the ones used during training, even in the absence of errors.
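For illustration, here is a rough sketch of what logging both numbers could look like. It is not the code from this PR: pick_samples_to_log and log_key are hypothetical names, and random.Random.randrange from the standard library stands in for np.random.randint so the snippet runs without extra dependencies. As noted above, the sampled index is only a hint, since a sample may be substituted at load time.

```python
import random


def pick_samples_to_log(dataset_len, num_images, seed=None):
    """Return (i, idx) pairs: i is the loop counter, idx the sampled dataset index.

    Hypothetical helper; randrange stands in for np.random.randint here.
    """
    rng = random.Random(seed)
    picks = []
    for i in range(num_images):
        idx = rng.randrange(dataset_len)  # index into train_set, not the loop counter
        picks.append((i, idx))
    return picks


def log_key(i, idx):
    """Build a log tag that carries both the counter and the dataset index."""
    return f"train_image_{i}_sample_{idx}"


if __name__ == "__main__":
    for i, idx in pick_samples_to_log(dataset_len=1000, num_images=4, seed=0):
        print(log_key(i, idx))
```

Keeping i in the tag preserves a stable slot per logged image, while idx gives something to look up, with the caveat that it only matches the dataset order when no substitution or fixed split is involved.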
I have now also added logging for validation images and their predictions. How do you feel about adding another parameter to control the image logging?
Just an additional comment: it would be wonderful to also log the prediction with it :) But either way, thank you for this PR!