Open jlorieau opened 3 weeks ago
Data streams
A data streaming library would allow more efficient processing of large data files, and it would allow streaming of live datasets.
Several candidate libraries exist, each with its own advantages and disadvantages:
- Pathway (see `pw.run`)
- Joblib
- smart_open: an `open` alternative with built-in smart file streaming
- Apache Beam
- PySpark data sources (files)
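
To make the large-file argument concrete, here is a minimal sketch of what streaming buys us, using only the standard library rather than any of the candidates above. The names `stream_records` and `count_records` and the `batch_size` parameter are hypothetical and only illustrate the pattern: records are read lazily and handed downstream in small batches, so memory use depends on the batch size and not on the file size.

```python
from typing import Iterator, List


def stream_records(path: str, batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield lists of at most `batch_size` lines from `path` (hypothetical helper)."""
    batch: List[str] = []
    # Iterating over the file object reads one line at a time,
    # so the whole file is never held in memory at once.
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            batch.append(line.rstrip("\n"))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def count_records(path: str, batch_size: int = 1000) -> int:
    """Example consumer: count the lines without loading the full file."""
    return sum(len(batch) for batch in stream_records(path, batch_size))
```

If an `open` alternative such as smart_open is adopted, the intent would be to swap only the `open` call and leave the consuming loop unchanged. That is an assumption to verify against the chosen library, not something this sketch tests.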