Open koute opened 2 years ago
@koute thanks for the PR! We decided to make prediction single-threaded for exactly the reason you identified: to avoid buffering the whole CSV. I agree that it is a good idea to provide the option to parallelize predictions, but I think we should put it behind a command line argument, `--parallel`, that explains the tradeoff. Would you be willing to update this PR with that change?
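The flag suggested above could be wired up with a minimal check like this. Only the `--parallel` name comes from the comment; the function and messages are a hypothetical sketch, not the project's actual CLI code:

```rust
/// Hypothetical helper: returns true if the user passed `--parallel`.
/// A real CLI would use its argument parser and document the tradeoff
/// (parallel = faster, but more rows buffered in memory) in the help text.
fn parallel_requested(args: &[String]) -> bool {
    args.iter().any(|arg| arg == "--parallel")
}
```

At startup the program would collect `std::env::args()` into a `Vec<String>` and branch on `parallel_requested(&args)` to pick the single-threaded or parallel prediction path.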
Okay, it now no longer reads the whole CSV into memory but still runs in parallel. (:
Speed-wise it's the same. The old version, which preloaded the CSV:

```
real 0m11.739s
user 10m29.132s
sys  0m14.350s
```

The new version, without preloading the CSV:

```
real 0m11.425s
user 9m53.702s
sys  0m51.725s
```
Is this acceptable or do you still want me to add an argument to disable parallelization?
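The "parallel without buffering the whole CSV" approach described above can be sketched with a bounded channel: the reader blocks once a fixed number of rows are in flight, so memory stays constant regardless of file size. Everything here is a hypothetical illustration (including the `predict` stub), not the actual PR code, and it uses only the standard library:

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for the real model's predict function.
fn predict(row: &str) -> String {
    format!("prediction for {}", row)
}

fn parallel_predict(rows: Vec<String>, num_workers: usize) -> Vec<String> {
    let num_workers = num_workers.max(1);
    // Bounded channel: `send` blocks once the buffer is full, so only a
    // handful of rows are ever buffered, never the whole CSV.
    let (row_tx, row_rx) = mpsc::sync_channel::<(usize, String)>(num_workers * 2);
    let row_rx = Arc::new(Mutex::new(row_rx));
    let (out_tx, out_rx) = mpsc::channel::<(usize, String)>();

    // Worker threads pull rows, predict, and send back (index, prediction).
    let mut workers = Vec::new();
    for _ in 0..num_workers {
        let row_rx = Arc::clone(&row_rx);
        let out_tx = out_tx.clone();
        workers.push(thread::spawn(move || loop {
            // The lock is only held while waiting for the next row;
            // prediction itself runs concurrently across workers.
            let msg = row_rx.lock().unwrap().recv();
            match msg {
                Ok((i, row)) => out_tx.send((i, predict(&row))).unwrap(),
                Err(_) => break, // reader finished and closed the channel
            }
        }));
    }
    drop(out_tx); // only the workers' clones keep the output channel open

    // Reader thread feeds rows in; in the real CLI this would iterate
    // over the CSV reader instead of an in-memory Vec.
    let reader = thread::spawn(move || {
        for (i, row) in rows.into_iter().enumerate() {
            row_tx.send((i, row)).unwrap();
        }
    });

    // Drain results, then restore the original input order.
    let mut results: Vec<(usize, String)> = out_rx.iter().collect();
    reader.join().unwrap();
    for w in workers {
        w.join().unwrap();
    }
    results.sort_by_key(|&(i, _)| i);
    results.into_iter().map(|(_, p)| p).collect()
}
```

Calling `parallel_predict(rows, num_cpus)` yields predictions in input order; tagging each row with its index and sorting at the end keeps the output stable even though workers finish out of order.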
Predicting from the CLI is painfully slow since it runs only on a single thread, so here's a quick fix.
Caveat: this essentially preloads the whole CSV into memory. It is still a net improvement, though, since single-threaded prediction becomes a problem far earlier than memory consumption does: even a CSV under 100MB is already unusably slow to predict on one core.
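The preload-everything approach described in this initial version amounts to reading all rows up front and splitting them across threads. A minimal sketch, assuming a hypothetical `predict` stub (this is an illustration of the tradeoff, not the PR's actual code):

```rust
use std::thread;

// Hypothetical stand-in for the real model's predict function.
fn predict(row: &str) -> String {
    format!("prediction for {}", row)
}

// Split the preloaded rows into one chunk per thread and predict each
// chunk concurrently. Simple and fast, but memory use grows with the
// size of the CSV, since every row is held in `rows` at once.
fn predict_preloaded(rows: Vec<String>, num_threads: usize) -> Vec<String> {
    let num_threads = num_threads.max(1);
    let chunk_size = ((rows.len() + num_threads - 1) / num_threads).max(1);
    let handles: Vec<_> = rows
        .chunks(chunk_size)
        .map(|chunk| {
            let chunk: Vec<String> = chunk.to_vec();
            thread::spawn(move || chunk.iter().map(|r| predict(r)).collect::<Vec<_>>())
        })
        .collect();
    // Joining the handles in spawn order preserves the input order.
    handles
        .into_iter()
        .flat_map(|h| h.join().unwrap())
        .collect()
}
```

Because chunks are contiguous and joined in order, no index bookkeeping is needed here, unlike the streaming variant; the cost is that `rows` must fit in memory.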