Closed RasmusOrsoe closed 2 months ago
@Aske-Rosted thanks for taking a look. It looks like I mistakenly merged another branch into this one, which caused the checks to fail. I think your comments on the toggles between "test", "train" and "no-noise" are fair, and the toggles are admittedly quite specific to what I intend to use them for. I'll close this PR and make a new one in the future.
This PR adds extensions of `ERDAHostedDataset` that allow us to build and share public benchmarking datasets, and secret ones! It also introduces functionality to `ParquetDataset` that removes chunk ids from `selection` that do not exist.

Below is an example of the syntax of `SecretDataset`, a way for us to share datasets using ERDA sharelinks:

The idea here is that we can distribute datasets "secretly" to colleagues, and once the data is ready to be made public, it can be made available through `PublicBenchmarkDataset` by subclassing, providing a similar syntax:
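
Separately, the `ParquetDataset` change described above (dropping chunk ids from `selection` that do not exist) can be illustrated with a small standalone sketch. This is not the code from this PR and does not use the GraphNeT API: the function name `drop_missing_chunk_ids` and the idea of passing the available chunk ids as a set are assumptions made purely for illustration.

```python
from typing import Iterable, List, Set


def drop_missing_chunk_ids(selection: Iterable[int], available: Set[int]) -> List[int]:
    """Return the ids in `selection` that are present in `available`.

    Hypothetical helper, not part of GraphNeT. The order of `selection`
    is preserved, and ids without a matching chunk are silently dropped.
    """
    return [chunk_id for chunk_id in selection if chunk_id in available]


# Hypothetical example: chunks 0, 1 and 3 exist, chunk 2 does not.
available_chunks = {0, 1, 3}
requested = [0, 1, 2, 3]
kept = drop_missing_chunk_ids(requested, available_chunks)
print(kept)
```

Filtering up front like this means a selection written for one copy of a dataset can be reused on a copy that is missing some chunks, at the cost of not raising an error for the missing ids.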