Closed shahrukhx01 closed 10 months ago
FYI: Unfortunately, the S3 bucket is not publicly accessible.
Hi, unfortunately we can't share the processed data used to train CLARA, but you can use the LAION audio dataset scripts to download and process the data as needed. Most of the datasets are publicly available.
The CLARA codebase uses TorchData/webdataset format. In the future, I'd like to release a pipeline to perform augmentation and data processing.
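For anyone preparing their own data in the meantime, here is a rough sketch of what a webdataset-style shard looks like: a plain tar archive in which files sharing a basename form one sample. It uses only the Python standard library to show the layout. The `flac`/`json` extensions, the `text` metadata field, and the helper names `write_shard` and `read_shard` are assumptions for illustration, not the layout CLARA actually expects, so check the dataset scripts before relying on it.

```python
import io
import json
import tarfile


def write_shard(samples, shard_path):
    """Write (key, audio_bytes, metadata) tuples into one tar shard.

    Each sample becomes two tar members, <key>.flac and <key>.json.
    The extensions are illustrative assumptions, not CLARA's layout.
    """
    with tarfile.open(shard_path, "w") as tar:
        for key, audio_bytes, meta in samples:
            payloads = (
                ("flac", audio_bytes),
                ("json", json.dumps(meta).encode("utf-8")),
            )
            for ext, payload in payloads:
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def read_shard(shard_path):
    """Group tar members back into samples keyed by basename."""
    samples = {}
    with tarfile.open(shard_path, "r") as tar:
        for member in tar:
            # Split at the first dot: the prefix is the sample key,
            # the remainder is the field extension.
            key, ext = member.name.split(".", 1)
            samples.setdefault(key, {})[ext] = tar.extractfile(member).read()
    return samples
```

Keys should not contain dots, since the key is taken to be everything before the first dot in the file name. For actual training you would read the shards with the webdataset or TorchData loaders rather than `read_shard`, which is only there to make the grouping visible.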
@knoriy First of all, thanks for the great contribution. I am trying to reproduce the results using the medium model checkpoint on the underlying datasets from the paper. However, I am unsure how the datasets can be accessed. Could you please point me to how I can prepare/download the datasets? Thank you!