Open OneRaynyDay opened 2 years ago
Would this negatively impact performance?
This would. It would mean that we must put our thread pool behind a mutex/rwlock and lock it on every access. Given that we parallelize on so many levels, this would really hurt performance, and it is not feasible in a way we'd like to see.
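To make the cost concrete, here is a toy Python analogy of a pool that can be resized at runtime. It is only a sketch and not how Polars is implemented: `ResizablePool` and its methods are made-up names, and it uses the standard library's `ThreadPoolExecutor` as a stand-in. The point is that once the pool can be swapped out, every submission has to take the lock, even when nobody ever resizes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ResizablePool:
    """Toy pool whose worker count can change at runtime (hypothetical)."""

    def __init__(self, num_threads):
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=num_threads)
        self.num_threads = num_threads

    def resize(self, num_threads):
        # Swap in a new executor under the lock, then let the old one drain.
        with self._lock:
            old = self._pool
            self._pool = ThreadPoolExecutor(max_workers=num_threads)
            self.num_threads = num_threads
        old.shutdown(wait=True)

    def submit(self, fn, *args):
        # Every submission pays for the lock, even if resize() is never called.
        with self._lock:
            return self._pool.submit(fn, *args)

    def shutdown(self):
        with self._lock:
            self._pool.shutdown(wait=True)


pool = ResizablePool(num_threads=4)
squares = [pool.submit(pow, n, 2) for n in range(5)]
pool.resize(2)
cubes = [pool.submit(pow, n, 3) for n in range(5)]
square_results = [f.result() for f in squares]
cube_results = [f.result() for f in cubes]
pool.shutdown()
```

With a fixed-size pool created once at startup, `submit` would not need the lock at all, which is the trade-off described above.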
Problem Description
Each query has its own performance characteristics, and it's hard to prescribe a single thread count for all jobs. Some jobs work wonders with the maximum thread count while others OOM, since a larger thread pool correlates with increased memory consumption. It would be great if we could tune this on a per-query basis, maybe something like:
Or something like that. Since Python has the GIL, I don't think there would be any situation where multiple queries are running at once and request different numbers of threads. In this case I think we can just set `POLARS_MAX_THREADS` to `num_threads` during the execution of `collect()` and read that value dynamically. Would this negatively impact performance?
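For illustration, the mechanics of "set the variable only while the query runs" could look roughly like the sketch below. The `max_threads` context manager is a hypothetical helper, not a Polars API, and the sketch only shows setting and restoring the environment variable. It assumes the library would re-read `POLARS_MAX_THREADS` on each `collect()`, which is exactly the open question here.

```python
import os
from contextlib import contextmanager


@contextmanager
def max_threads(num_threads):
    """Temporarily set POLARS_MAX_THREADS, restoring the old value afterwards."""
    name = "POLARS_MAX_THREADS"
    previous = os.environ.get(name)
    os.environ[name] = str(num_threads)
    try:
        yield
    finally:
        # Put the environment back exactly as it was before the block.
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


before = os.environ.get("POLARS_MAX_THREADS")
with max_threads(4):
    # A collect() call would go here; this sketch only inspects the variable.
    inside = os.environ["POLARS_MAX_THREADS"]
after = os.environ.get("POLARS_MAX_THREADS")
```

Setting the variable is the easy part. It only helps if the thread pool is rebuilt or resized in response to the new value.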