Open zhentingqi opened 9 months ago
Hi! Can anyone please tell me how to run the full mining pipeline with cc_net on just a very small portion of Common Crawl (CC)? For example, I only want around 100M of cleaned data from the newest crawl, 2023-50. Thanks!