rom1504 / cc2dataset

Easily convert Common Crawl to a dataset of captions and documents: image/text, audio/text, video/text, ...
MIT License

fix test #32

Closed: rom1504 closed this issue 1 year ago

rom1504 commented 1 year ago

https://github.com/rom1504/cc2dataset/actions/runs/3625943650/jobs/6114490241

Seems like calling CC from the test doesn't work too well; probably just put a shard somewhere else.

rom1504 commented 1 year ago

seems to work actually

rom1504 commented 1 year ago

done