When running a multi-worker query on files, we distribute the files among the workers. A worker may not receive a file (e.g. if there aren't enough .csv files), in which case we have to send it an empty DataFrame. The current logic for sending an empty DataFrame needs work: we should avoid generating a message and using send(...).
Possible solutions
Send the file paths to the workers that we can, then call broadcast() once if there are leftover workers that need empty DataFrames.
https://github.com/rapidsai/node/blob/main/modules/sql/src/cluster.ts#L205-L231
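A rough sketch of what the proposal could look like, in case it helps the discussion. None of the names below come from cluster.ts: `assignFiles`, `dispatch`, `sendPaths`, and `broadcastEmpty` are hypothetical, and the round-robin distribution is an assumption about how files are handed out. The point is only that idle workers are collected first and then handled with a single call.

```typescript
// Hypothetical helper types and names; not the actual rapidsai/node API.
interface Assignment {
  filesByWorker: Map<number, string[]>;  // worker id -> file paths it should read
  idleWorkers: number[];                 // worker ids that received no file
}

// Distribute files round-robin (assumed strategy) and record which workers got nothing.
function assignFiles(files: string[], workerCount: number): Assignment {
  if (workerCount <= 0) { throw new Error('workerCount must be positive'); }
  const filesByWorker = new Map<number, string[]>();
  files.forEach((file, i) => {
    const id   = i % workerCount;
    const list = filesByWorker.get(id) ?? [];
    list.push(file);
    filesByWorker.set(id, list);
  });
  const idleWorkers: number[] = [];
  for (let id = 0; id < workerCount; ++id) {
    if (!filesByWorker.has(id)) { idleWorkers.push(id); }
  }
  return {filesByWorker, idleWorkers};
}

// Send paths to every worker that has files, then make at most one
// broadcast-style call covering all workers that need an empty DataFrame.
// `sendPaths` and `broadcastEmpty` stand in for whatever the cluster exposes.
function dispatch(files: string[],
                  workerCount: number,
                  sendPaths: (workerId: number, paths: string[]) => void,
                  broadcastEmpty: (workerIds: number[]) => void): void {
  const {filesByWorker, idleWorkers} = assignFiles(files, workerCount);
  filesByWorker.forEach((paths, workerId) => sendPaths(workerId, paths));
  if (idleWorkers.length > 0) { broadcastEmpty(idleWorkers); }
}
```

With two .csv files and four workers, `assignFiles` would report workers 2 and 3 as idle, so `dispatch` would call `sendPaths` twice and `broadcastEmpty` once instead of building a separate empty-DataFrame message per worker.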