I have further integrated #178 into the rest of the pipeline.
The validation loss is now calculated per bin: we randomly select one spectrum per inchikey and do an all-vs-all comparison between the selected spectra. The loss is computed separately for each bin, and the validation loss is the mean over all bins.
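A minimal sketch of the binned validation loss described above, using only the Python standard library. Several details are assumptions not stated in this PR: bins are ranges of the true (e.g. Tanimoto) score between two inchikeys, the per-pair loss is squared error, self-pairs are skipped, and empty bins are ignored. `predict` and `true_score` are hypothetical callables standing in for the model and the ground-truth lookup. This is an illustration, not the pipeline's actual code.

```python
import random
from itertools import combinations
from statistics import mean
from typing import Callable, Dict, Hashable, List, Sequence

# Hypothetical alias for this sketch: inchikey -> its validation spectra (any hashable ids).
SpectraByInchikey = Dict[str, Sequence[Hashable]]


def sample_one_per_inchikey(spectra_by_inchikey: SpectraByInchikey,
                            rng: random.Random) -> Dict[str, Hashable]:
    """Randomly pick a single validation spectrum for every inchikey."""
    return {ik: rng.choice(list(specs))
            for ik, specs in sorted(spectra_by_inchikey.items())}


def bin_index(value: float, bin_edges: Sequence[float]) -> int:
    """Index of the bin [edge_i, edge_i+1) holding value; the last bin also includes its upper edge."""
    for i in range(len(bin_edges) - 1):
        if bin_edges[i] <= value < bin_edges[i + 1]:
            return i
    if value == bin_edges[-1]:
        return len(bin_edges) - 2
    raise ValueError(f"score {value} lies outside the bin edges")


def binned_validation_loss(spectra_by_inchikey: SpectraByInchikey,
                           predict: Callable[[Hashable, Hashable], float],
                           true_score: Callable[[str, str], float],
                           bin_edges: Sequence[float],
                           seed: int = 0) -> float:
    """Mean over bins of the per-bin mean squared error (squared error is an assumed loss)."""
    rng = random.Random(seed)
    selected = sample_one_per_inchikey(spectra_by_inchikey, rng)
    errors_per_bin: List[List[float]] = [[] for _ in range(len(bin_edges) - 1)]
    # All-vs-all over the selected spectra: every unordered pair of distinct inchikeys.
    for ik_a, ik_b in combinations(selected, 2):
        target = true_score(ik_a, ik_b)  # hypothetical ground-truth lookup
        error = (predict(selected[ik_a], selected[ik_b]) - target) ** 2
        errors_per_bin[bin_index(target, bin_edges)].append(error)
    # Average within each non-empty bin first, then across bins, so every bin weighs the same.
    return mean(mean(errors) for errors in errors_per_bin if errors)
```

Averaging within bins first means a bin with few pairs (often the high-similarity range) counts as much as a crowded one, whereas a plain mean over all pairs would be dominated by whichever bin holds the most pairs.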
This means we do not use all of the available validation spectra for each inchikey, but this is intentional: it speeds up the run without much loss of generalization. Sampling multiple spectra per inchikey would be easy to add if needed.
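If sampling more than one spectrum per inchikey is added later, the selection step could look roughly like the sketch below. `sample_k_per_inchikey` and its `k` parameter are hypothetical names, not part of this PR; with `k=1` it reduces to the current one-spectrum-per-inchikey behaviour.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple


def sample_k_per_inchikey(spectra_by_inchikey: Dict[str, Sequence[Hashable]],
                          k: int,
                          rng: random.Random) -> List[Tuple[str, Hashable]]:
    """Draw up to k distinct spectra per inchikey (all of them if fewer than k exist)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    sampled: List[Tuple[str, Hashable]] = []
    for ik, specs in sorted(spectra_by_inchikey.items()):
        # rng.sample draws without replacement, so no spectrum is picked twice.
        for spectrum in rng.sample(list(specs), min(k, len(specs))):
            sampled.append((ik, spectrum))
    return sampled
```

The trade-off is run time: an all-vs-all comparison over the sampled spectra grows quadratically, so taking `k` spectra per inchikey produces roughly `k²` times as many pairs.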
Still to do:
- Remove old DataGenerator functionality for val_data_generator (fixed set, etc.)
- Remove other old functions that are no longer needed (if any)