Closed AndresFRJ98 closed 4 years ago
That amount of memory consumption is expected. You can reduce it by processing fewer tasks at a time (`n_jobs` in `Parallel()`). There's one "task" for each example in the dataset, so depending on which split you're processing, it could be ~90k or ~8k.
Hello,
I am trying to get the initial repo to work by following the steps provided. In the preprocessing step, I run:
python main.py --mode prepro --data_file hotpot_train_v1.1.json --para_limit 2250 --data_split train
The pre-processing begins. However, memory consumption on my machine becomes enormous (up to 10 GB), so I decided to terminate it. How many 'tasks' (as displayed while running) does the preprocessing have to go through? Is this amount of memory consumption normal, or is something wrong with my setup/environment?
Thanks