Open jamessmith123456 opened 1 year ago
(1): Pandarallel basically doubles the amount of needed memory, as stated in the documentation:
pandarallel gets around this limitation by using all cores of your computer. But, in return, pandarallel needs twice the memory that a standard pandas operation would normally use.
(2): No, the original data will be copied only once, whatever the parallelism.
(3): There is no coordination relationship between CPU and memory (cf (2))
Hi @nalepae, if the amount of data is quite large, how can we speed up the preparation step before apply()? If I have 100 GB of data read into memory, I have to wait a long time before the apply starts.
Pandaral·lel is looking for a maintainer! If you are interested, please open a GitHub issue.
@SysuJayce, what do you mean by "boosting the preparation"?
If you are memory-bound, I would suggest breaking up your dataframe into smaller shards and applying your function to each shard.
Do you have any other problems? If not, I would like to close this issue.
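The sharding suggestion above can be sketched roughly as follows. This is an illustrative helper (not part of pandarallel's API); it uses plain `apply` so it runs anywhere, but the same pattern works with `parallel_apply` after calling `pandarallel.initialize()`, keeping only one shard's worth of data duplicated at a time:

```python
import numpy as np
import pandas as pd


def process_in_shards(df, func, n_shards=20):
    """Apply `func` row-wise to `df` one shard at a time.

    Splitting the dataframe means each (parallel) pass only needs to
    duplicate one shard in memory instead of the whole 100 GB frame.
    """
    results = []
    for shard in np.array_split(df, n_shards):
        # With pandarallel this line would be: shard.parallel_apply(func, axis=1)
        results.append(shard.apply(func, axis=1))
    return pd.concat(results)


# Small usage example
df = pd.DataFrame({"x": range(10)})
out = process_in_shards(df, lambda row: row["x"] * 2, n_shards=3)
```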
(1) It seems that memory issues cannot be solved when there is a large amount of data. (2) If the parallelism is 20, will the original data be copied 20 times? (3) How can I balance memory against CPU to set the optimal parameters, please?