Open zhentingqi opened 4 months ago
Hi! I am trying to download the crawl split 2023-50. Running `python -m cc_net --dump 2023-50` raises the following error:
```
python -m cc_net --dump 2023-50
Will run cc_net.mine.main with the following config: Config(config_name='base', dump='2023-50', output_dir=PosixPath('data'), mined_dir='mined', execution='auto', num_shards=1600, min_shard=-1, num_segments_per_shard=-1, metadata=None, min_len=300, hash_in_mem=50, lang_whitelist=[], lang_blacklist=[], lang_threshold=0.5, keep_bucket=[], lm_dir=PosixPath('data/lm_sp'), cutoff=PosixPath('/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/data/cutoff.csv'), lm_languages=None, mine_num_processes=16, target_size='4G', cleanup_after_regroup=False, task_parallelism=-1, pipeline=['dedup', 'lid', 'keep_lang', 'sp', 'lm', 'pp_bucket', 'drop', 'split_by_lang'], experiments=[], cache_dir=None)
Submitting _hashes_shard in a job array (1600 jobs)
Traceback (most recent call last):
  File "/n/sw/Mambaforge-23.3.1-1/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/n/sw/Mambaforge-23.3.1-1/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/__main__.py", line 18, in <module>
    main()
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/__main__.py", line 14, in main
    func_argparse.parse_and_call(cc_net.mine.get_main_parser())
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/func_argparse/__init__.py", line 72, in parse_and_call
    return command(**parsed_args)
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/mine.py", line 638, in main
    all_files = mine(conf)
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/mine.py", line 340, in mine
    hashes_groups = list(jsonql.grouper(hashes(conf), conf.hash_in_mem))
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/mine.py", line 265, in hashes
    ex(_hashes_shard, repeat(conf), *_transpose(missing_outputs))
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/execution.py", line 106, in map_array_and_wait
    jobs = ex.map_array(function, *args)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/core/core.py", line 771, in map_array
    return self._internal_process_submissions(submissions)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/auto/auto.py", line 218, in _internal_process_submissions
    return self._executor._internal_process_submissions(delayed_submissions)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/slurm/slurm.py", line 332, in _internal_process_submissions
    array_ex.update_parameters(**self.parameters)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/core/core.py", line 810, in update_parameters
    self._internal_update_parameters(**kwargs)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/slurm/slurm.py", line 306, in _internal_update_parameters
    raise ValueError(
ValueError: Unavailable parameter(s): ['slurm_time']
Valid parameters are:
  - account (default: None)
  - additional_parameters (default: None)
  - array_parallelism (default: 256)
  - comment (default: None)
  - constraint (default: None)
  - cpus_per_gpu (default: None)
  - cpus_per_task (default: None)
  - dependency (default: None)
  - exclude (default: None)
  - exclusive (default: None)
  - gpus_per_node (default: None)
  - gpus_per_task (default: None)
  - gres (default: None)
  - job_name (default: 'submitit')
  - mail_type (default: None)
  - mail_user (default: None)
  - mem (default: None)
  - mem_per_cpu (default: None)
  - mem_per_gpu (default: None)
  - nodelist (default: None)
  - nodes (default: 1)
  - ntasks_per_node (default: None)
  - num_gpus (default: None)
  - partition (default: None)
  - qos (default: None)
  - setup (default: None)
  - signal_delay_s (default: 90)
  - srun_args (default: None)
  - stderr_to_stdout (default: False)
  - time (default: 5)
  - use_srun (default: True)
  - wckey (default: 'submitit')
```
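If it helps narrow things down: the failing `update_parameters` call receives the prefixed name `slurm_time`, but the error's list of valid parameters only contains the bare name `time`. My guess (unverified) is that the prefixed form is the one `AutoExecutor` understands, while the plain `SlurmExecutor` expects the bare Slurm names, so the stored parameters end up with the wrong naming convention. A minimal sketch of that translation; the helper below is purely illustrative and not part of cc_net or submitit:

```python
# Illustrative sketch of the naming mismatch behind the ValueError.
# "slurm_time" is the prefixed form; the plain executor only accepts
# bare names like "time" (per the list printed in the error above).

# A small subset of the valid parameter names from the error message:
VALID_SLURM_PARAMS = {"time", "partition", "mem", "cpus_per_task"}

def to_slurm_names(params: dict) -> dict:
    """Strip a leading "slurm_" prefix from each key, leaving other keys as-is."""
    return {
        (k[len("slurm_"):] if k.startswith("slurm_") else k): v
        for k, v in params.items()
    }

stored = {"slurm_time": 60}            # the prefixed form that triggers the error
print("slurm_time" in VALID_SLURM_PARAMS)  # False -- hence the ValueError
print(to_slurm_names(stored))              # {'time': 60} -- the accepted bare name
```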
Can someone please help me solve the problem? Thanks!