rapidsai / node

GPU-accelerated data science and visualization in node
https://rapidsai.github.io/node/
Apache License 2.0

[FIX] Handle sending of empty dataframes differently when using SQL file table creation #314

Closed matekdev closed 2 years ago

matekdev commented 3 years ago

https://github.com/rapidsai/node/blob/main/modules/sql/src/cluster.ts#L205-L231

When running a multi-worker query on files, we distribute the files among the workers. A worker may not receive a file (e.g. if there are fewer .csv files than workers), in which case we have to send it an empty DataFrame. The current logic for sending an empty DataFrame needs work: we should avoid generating a message and calling send(...).

Possible solutions

  1. Send file paths to as many workers as we can, then call broadcast() once if there are leftover workers that need empty DataFrames.
  2. The message_id can be generated from a random number.
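
A minimal sketch of the two ideas above, for discussion only. It is not the code in cluster.ts: `DispatchPlan`, `planFileDispatch` and `randomMessageId` are hypothetical names, round-robin assignment is an assumption about how the files are distributed, and the sketch assumes any non-empty string is acceptable as a message id. It only computes which workers get which paths and which workers are left over, so the caller could send the paths and then issue a single broadcast() for the leftover workers.

```typescript
// Hypothetical plan type; not part of the rapidsai/node API.
interface DispatchPlan {
  // Worker index -> file paths that worker should load.
  filesByWorker: Map<number, string[]>;
  // Workers that received no file and therefore need an empty DataFrame.
  emptyWorkers: number[];
}

// Assign paths to workers round-robin (an assumption) and record
// which workers ended up with no file at all.
function planFileDispatch(paths: string[], numWorkers: number): DispatchPlan {
  if (!Number.isInteger(numWorkers) || numWorkers <= 0) {
    throw new RangeError('numWorkers must be a positive integer');
  }
  const filesByWorker = new Map<number, string[]>();
  paths.forEach((path, i) => {
    const worker = i % numWorkers;
    const assigned = filesByWorker.get(worker) ?? [];
    assigned.push(path);
    filesByWorker.set(worker, assigned);
  });
  const emptyWorkers: number[] = [];
  for (let w = 0; w < numWorkers; ++w) {
    if (!filesByWorker.has(w)) { emptyWorkers.push(w); }
  }
  return {filesByWorker, emptyWorkers};
}

// Option 2: derive a message id from a random number.
// Assumption: uniqueness only needs to hold with high probability.
function randomMessageId(): string {
  return Math.floor(Math.random() * Number.MAX_SAFE_INTEGER).toString(16);
}

// Two files, four workers: workers 2 and 3 receive nothing.
const examplePlan = planFileDispatch(['a.csv', 'b.csv'], 4);
console.log(examplePlan.emptyWorkers);  // [ 2, 3 ]
```

With a plan like this, the per-worker sends cover only the entries in `filesByWorker`, and the empty DataFrame is handled once for everything in `emptyWorkers` rather than once per worker.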