Closed hicotton02 closed 7 months ago
@hicotton02 you should now be able to run artifacts prep from local data
(on the local-data branch) using:
python3 app/src/prep_artifacts.py \
--artifacts-dir /path/to/artifacts \
--cc_input /path/to/cc/listings.txt \
--cc_input_base_uri file:///path/to/cc/data/root \
--lang LANG \
--dsir_num_samples DSIR_SAMPLES \
--classifiers_num_samples CLASSIFIERS_SAMPLES \
--max_samples_per_book 1000 \
--max_paragraphs_per_book_sample 250
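If it helps, the value for --cc_input_base_uri can be derived from a local directory rather than typed by hand. This is only a minimal sketch using the standard library; the helper name to_file_uri is hypothetical and not part of the repo, and it assumes the script accepts an ordinary absolute file:// URI as shown in the command above.

```python
from pathlib import Path


def to_file_uri(local_dir: str) -> str:
    """Return an absolute file:// URI for a local directory.

    Hypothetical helper (not part of the repo): it only formats the
    path, it does not check that the directory exists.
    """
    # expanduser() handles "~", resolve() makes the path absolute,
    # as_uri() adds the file:// scheme and percent-encodes as needed.
    return Path(local_dir).expanduser().resolve().as_uri()


if __name__ == "__main__":
    # Placeholder path taken from the command above; replace with the real data root.
    print(to_file_uri("/path/to/cc/data/root"))
```

The printed string can then be passed as the --cc_input_base_uri argument.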
@mauriceweber Opening an issue as requested to enable artifact prep on local ccnet data instead of an S3 bucket.