Open koute opened 2 years ago
@koute thanks for the PR! We decided to make prediction single-threaded for exactly the reason you identified: to avoid buffering the whole CSV. I agree that it is a good idea to provide the option to parallelize predictions, but I think we should put it behind a command line argument, `--parallel`, that explains the tradeoff. Would you be willing to update this PR with that change?
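The flag suggested above could be wired up with a minimal check like this. Only the `--parallel` name comes from the comment; the function and messages are a hypothetical sketch, not the project's actual CLI code:

```rust
/// Hypothetical helper: returns true if the user passed `--parallel`.
/// A real CLI would use its argument parser and document the tradeoff
/// (parallel = faster, but more rows buffered in memory) in the help text.
fn parallel_requested(args: &[String]) -> bool {
    args.iter().any(|arg| arg == "--parallel")
}
```

At startup the program would collect `std::env::args()` into a `Vec<String>` and branch on `parallel_requested(&args)` to pick the single-threaded or parallel prediction path.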
Okay, it now no longer reads the whole CSV into memory but still runs in parallel. (:
Speed-wise it's the same. The old version, which preloaded the CSV:

```
real 0m11.739s
user 10m29.132s
sys  0m14.350s
```

The new version, without preloading the CSV:

```
real 0m11.425s
user 9m53.702s
sys  0m51.725s
```
Is this acceptable or do you still want me to add an argument to disable parallelization?
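The "parallel without buffering the whole CSV" approach described above can be sketched with a bounded channel: the reader blocks once a fixed number of rows are in flight, so memory stays constant regardless of file size. Everything here is a hypothetical illustration (including the `predict` stub), not the actual PR code, and it uses only the standard library:

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for the real model's predict function.
fn predict(row: &str) -> String {
    format!("prediction for {}", row)
}

fn parallel_predict(rows: Vec<String>, num_workers: usize) -> Vec<String> {
    let num_workers = num_workers.max(1);
    // Bounded channel: `send` blocks once the buffer is full, so only a
    // handful of rows are ever buffered, never the whole CSV.
    let (row_tx, row_rx) = mpsc::sync_channel::<(usize, String)>(num_workers * 2);
    let row_rx = Arc::new(Mutex::new(row_rx));
    let (out_tx, out_rx) = mpsc::channel::<(usize, String)>();

    // Worker threads pull rows, predict, and send back (index, prediction).
    let mut workers = Vec::new();
    for _ in 0..num_workers {
        let row_rx = Arc::clone(&row_rx);
        let out_tx = out_tx.clone();
        workers.push(thread::spawn(move || loop {
            // The lock is only held while waiting for the next row;
            // prediction itself runs concurrently across workers.
            let msg = row_rx.lock().unwrap().recv();
            match msg {
                Ok((i, row)) => out_tx.send((i, predict(&row))).unwrap(),
                Err(_) => break, // reader finished and closed the channel
            }
        }));
    }
    drop(out_tx); // only the workers' clones keep the output channel open

    // Reader thread feeds rows in; in the real CLI this would iterate
    // over the CSV reader instead of an in-memory Vec.
    let reader = thread::spawn(move || {
        for (i, row) in rows.into_iter().enumerate() {
            row_tx.send((i, row)).unwrap();
        }
    });

    // Drain results, then restore the original input order.
    let mut results: Vec<(usize, String)> = out_rx.iter().collect();
    reader.join().unwrap();
    for w in workers {
        w.join().unwrap();
    }
    results.sort_by_key(|&(i, _)| i);
    results.into_iter().map(|(_, p)| p).collect()
}
```

Calling `parallel_predict(rows, num_cpus)` yields predictions in input order; tagging each row with its index and sorting at the end keeps the output stable even though workers finish out of order.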
Predicting from the CLI is painfully slow since it runs only on a single thread, so here's a quick fix.
Caveat: this essentially preloads the whole CSV into memory. It is still a net improvement, though, since single-threaded prediction becomes a problem far earlier than memory consumption does: even a CSV under 100MB is already unusably slow to predict on one core.
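The preload-everything approach described in this initial version amounts to reading all rows up front and splitting them across threads. A minimal sketch, assuming a hypothetical `predict` stub (this is an illustration of the tradeoff, not the PR's actual code):

```rust
use std::thread;

// Hypothetical stand-in for the real model's predict function.
fn predict(row: &str) -> String {
    format!("prediction for {}", row)
}

// Split the preloaded rows into one chunk per thread and predict each
// chunk concurrently. Simple and fast, but memory use grows with the
// size of the CSV, since every row is held in `rows` at once.
fn predict_preloaded(rows: Vec<String>, num_threads: usize) -> Vec<String> {
    let num_threads = num_threads.max(1);
    let chunk_size = ((rows.len() + num_threads - 1) / num_threads).max(1);
    let handles: Vec<_> = rows
        .chunks(chunk_size)
        .map(|chunk| {
            let chunk: Vec<String> = chunk.to_vec();
            thread::spawn(move || chunk.iter().map(|r| predict(r)).collect::<Vec<_>>())
        })
        .collect();
    // Joining the handles in spawn order preserves the input order.
    handles
        .into_iter()
        .flat_map(|h| h.join().unwrap())
        .collect()
}
```

Because chunks are contiguous and joined in order, no index bookkeeping is needed here, unlike the streaming variant; the cost is that `rows` must fit in memory.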