daegonYu closed this issue 3 weeks ago
Hello. The command given in the "Filtering Data For Contrastive Pretraining" section of https://github.com/nomic-ai/contrastors/tree/main/scripts/text is:
torchrun --nproc-per-node=<num_gpus> --dataset=< path_to_dataset_files_or_directory> --output_dir=<path_where_to_save_filtered_dataset> --query_key=<query_key_of_jsonl_file> --document_key=<document_of_key_jsonl_file>
Which Python file is this command supposed to execute?
Ah, thanks for the heads-up. It should be index_filtering.py. I've updated the README.
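For anyone landing here later, a minimal sketch of the corrected invocation. It assumes the script name goes right after the torchrun launcher options (the usual torchrun convention) and that the command is run from the scripts/text directory. The variable values are hypothetical placeholders, not taken from the README; substitute your own.

```shell
# Hypothetical example values (assumptions, not from the README).
NUM_GPUS=1
DATASET="data/pairs.jsonl"
OUTPUT_DIR="data/filtered"
QUERY_KEY="query"
DOCUMENT_KEY="document"

# Assemble the command with index_filtering.py placed after the launcher option.
CMD="torchrun --nproc-per-node=${NUM_GPUS} index_filtering.py --dataset=${DATASET} --output_dir=${OUTPUT_DIR} --query_key=${QUERY_KEY} --document_key=${DOCUMENT_KEY}"

# Print the command for inspection; uncomment the eval line to actually run it.
echo "${CMD}"
# eval "${CMD}"
```

Printing the command first makes it easy to check the flags before launching a multi-GPU job.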