Open hlky opened 3 months ago
Having looked into this further, it seems the core of the issue is with two different formats in the same repo. When the `parquet` config is first, the `WebDataset`s are loaded as `parquet`; if the `WebDataset` configs are first, the `parquet` is loaded as `WebDataset`.

A workaround in my case would be to just turn the `parquet` into a `WebDataset`, although I'd still need the Dataset Viewer config limit increasing. In other cases using the same format may not be possible.
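For anyone wanting to try the same workaround, here is a minimal sketch of turning metadata rows into a WebDataset-style tar shard using only the standard library. It relies on the WebDataset convention that files sharing a basename inside a tar form one sample. The function name, the `key_field` parameter, and the example rows are hypothetical; rows could come from something like `pandas.read_parquet(...).to_dict("records")`.

```python
import io
import json
import tarfile


def rows_to_webdataset_shard(rows, shard_path, key_field="id"):
    """Write each row as `<key>.json` inside an uncompressed tar archive.

    `key_field` is a hypothetical column holding a unique sample key.
    Keys should not contain dots, since the part after the first dot
    is treated as the file extension.
    """
    with tarfile.open(shard_path, "w") as tar:
        for row in rows:
            payload = json.dumps(row).encode("utf-8")
            info = tarfile.TarInfo(name=f"{row[key_field]}.json")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


# Hypothetical rows standing in for the parquet metadata table.
example_rows = [
    {"id": "000000", "name": "dataset-a"},
    {"id": "000001", "name": "dataset-b"},
]
```

This only covers JSON-serialisable metadata columns; binary columns would need their own member files per sample.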
Relevant code:
Following documentation I had defined different configs for Dataception, a dataset of datasets:

The intent was for metadata to be browsable via Dataset Viewer, in addition to each individual dataset, and to allow datasets to be loaded by specifying the config/name to `load_dataset`.

While testing `load_dataset` I encountered the following error:

The correct file is downloaded, but the incorrect builder type is detected: `parquet`, due to other content in the repository. It would appear that the config needs to be taken into account.

Note that I have removed the additional configs from the repository because of this issue, and there is a limit of 3000 configs anyway, so the Dataset Viewer doesn't work as I intended. I'll add them back in if it assists with testing.
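To illustrate what "taking the config into account" could mean, here is a toy sketch contrasting the two behaviours. It is not the `datasets` library's actual code path: the extension table, function names, and file layout are all hypothetical, and the "first config wins" function only models the symptom reported here.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical extension-to-builder table, for illustration only.
EXTENSION_TO_BUILDER = {".parquet": "parquet", ".tar": "webdataset"}


def infer_builder(files):
    """Return the most common builder among `files`, or None if unknown."""
    counts = Counter(
        EXTENSION_TO_BUILDER[PurePosixPath(f).suffix]
        for f in files
        if PurePosixPath(f).suffix in EXTENSION_TO_BUILDER
    )
    return counts.most_common(1)[0][0] if counts else None


def infer_builder_first_config_wins(configs):
    """Model of the reported symptom: one builder applied to every config."""
    first_files = next(iter(configs.values()), [])
    builder = infer_builder(first_files)
    return {name: builder for name in configs}


def infer_builder_per_config(configs):
    """Desired behaviour: each config is inspected on its own files."""
    return {name: infer_builder(files) for name, files in configs.items()}


# Hypothetical repo layout: one parquet metadata config plus WebDataset configs.
configs = {
    "metadata": ["metadata.parquet"],
    "dataset_a": ["dataset_a/0000.tar", "dataset_a/0001.tar"],
    "dataset_b": ["dataset_b/0000.tar"],
}
```

With the `metadata` config listed first, the first function labels every config `parquet`, while the per-config version keeps `webdataset` for the tar-based configs.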