The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Cannot unpack tail data: the download took a long time, but the dataset cannot be unpacked without the quality signals, and their link is broken. #94
Closed
RuslanKovalyov closed 5 months ago
An error occurred while processing the 'common_crawl' configuration: Couldn't find file at https://data.together.xyz/redpajama-data-v2/v1.0.0/quality_signals/2023-14/0000/en_tail.signals.json.gz
Opening the link in a browser returns: Error 404, "This object could not be viewed".
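A quick way to confirm which quality-signal files are missing before starting a long download is to probe the URLs first. The following is only a minimal sketch: the URL pattern is inferred from the single failing link above, the helper names `quality_signals_url` and `url_exists` are hypothetical, and it assumes the server answers HEAD requests, which has not been verified here.

```python
import urllib.request

# Base path copied from the failing URL in the error message above.
BASE_URL = "https://data.together.xyz/redpajama-data-v2/v1.0.0"


def quality_signals_url(snapshot: str, shard: str, lang: str, partition: str) -> str:
    """Build a quality-signals URL (hypothetical helper).

    The layout is an assumption inferred from the one URL in the error
    message, not a documented scheme.
    """
    return (
        f"{BASE_URL}/quality_signals/{snapshot}/{shard}/"
        f"{lang}_{partition}.signals.json.gz"
    )


def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to `url` gets a 2xx response.

    Assumes the server supports HEAD. HTTP errors such as 404, as well as
    network failures and timeouts, all yield False, so a False result does
    not by itself prove the file is gone.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # urllib.error.HTTPError and URLError are both OSError subclasses.
        return False


# Example (performs a network request, so it is left commented out):
# url = quality_signals_url("2023-14", "0000", "en", "tail")
# print(url, "exists" if url_exists(url) else "is missing or unreachable")
```

Running the check over the shards you plan to fetch would show whether the 404 is limited to this one file or affects the whole `tail` partition.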