Closed supritashankar closed 5 years ago
Any ideas @kimiyoung?
One possibility is that you're running out of memory at a certain step. Could you try reducing the number of parallel workers (in the `Parallel` call) and/or monitoring your RAM usage when this happens?
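For the monitoring part, here is a minimal sketch (assuming a Linux machine, which is what the `top`/`KiB Mem` output later in this thread suggests) that reads `MemAvailable` from `/proc/meminfo`, so you can log it before and during the run instead of watching `top` by hand:

```python
# Hedged sketch: report available memory on Linux by parsing /proc/meminfo.
# The file path and field name are standard on Linux; this won't work on macOS.
def mem_available_kib(path="/proc/meminfo"):
    """Return the MemAvailable value in KiB, or None if not found."""
    with open(path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                # Line looks like: "MemAvailable:  1234 kB"
                return int(line.split()[1])
    return None

# Example usage: log it periodically while preprocessing runs, e.g.
#   print("available KiB:", mem_available_kib())
```

Logging this once per batch of articles would show whether memory declines steadily (a leak/accumulation) or drops sharply at one specific article.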
Hi Peng,
Thank you for your reply!
You are right! When I check `top`, I see a sharp fall in available memory.
```
top - 11:40:25 up 2 days, 18:35, 4 users, load average: 7.09, 7.56, 7.13
Tasks: 32 total, 3 running, 29 sleeping, 0 stopped, 0 zombie
%Cpu(s): 7.6 us, 2.6 sy, 0.0 ni, 89.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 33554432 total, 40 free, 33520996 used, 33396 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 40 avail Mem
```
I tried running it with a parallelism of 2:

```python
outputs = Parallel(n_jobs=2, verbose=10)(delayed(_process_article)(article, config) for article in data)
```

but it still fails!
Surprisingly, when I run it on my Mac, it is much slower but it does not get killed and gets through the whole preprocessing. My Mac only has 8 cores, whereas the GPU machine looks like this:
```
CPU(s):              80
Thread(s) per core:  2
Core(s) per socket:  20
Socket(s):           2
```
In that case I would probably try something larger than 2 but smaller than the number of jobs you tried originally (the run that failed) -- that should give a good balance between speed and memory usage!
Closing for now since we have identified the issue, feel free to reopen/open a new issue if something else in preprocessing fails for you!
When I run the preprocessing step, the job always fails at this step (after processing 64,000 questions).
Has anybody else faced this issue?