pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

LazyFrame sort in batch only #19921

Open herrmann1981 opened 6 days ago

herrmann1981 commented 6 days ago

Description

We want to analyze and process log messages from a hardware logger. Unfortunately, the logs are quite large, and because of how the logger works, some messages can come out of order with respect to their timestamps. We therefore need to sort the messages by time. But since the logs are so large, we cannot collect the whole DataFrame into memory. Instead, we would like to sort by time within each individual batch of messages in the LazyFrame. For our use case this would be sufficient, because the messages are only slightly out of order: a message from the start of the log will never appear near the end of the file.
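To make the "sort only within a batch" idea concrete, here is a minimal plain-Python sketch. It uses no Polars API. Records are assumed to be `(timestamp, message)` tuples, and both function names are hypothetical. `sort_per_batch` sorts each fixed-size batch independently. `sort_bounded_disorder` is a swapped-in alternative, a standard heap-based sort for nearly sorted input: it produces a fully sorted stream while holding only `window + 1` records at a time, assuming no record is more than `window` positions away from its sorted position.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Assumption: each log record is modelled as (timestamp, message).
Record = Tuple[float, str]


def sort_per_batch(records: Iterable[Record], batch_size: int) -> Iterator[Record]:
    """Sort only within consecutive, non-overlapping batches of `batch_size`."""
    batch: List[Record] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == batch_size:
            # sorted() is stable, so equal timestamps keep their input order.
            yield from sorted(batch, key=lambda r: r[0])
            batch = []
    if batch:
        # Flush the final, possibly shorter batch.
        yield from sorted(batch, key=lambda r: r[0])


def sort_bounded_disorder(records: Iterable[Record], window: int) -> Iterator[Record]:
    """Yield records in timestamp order, assuming no record is displaced
    by more than `window` positions from its sorted position."""
    heap: List[Tuple[float, int, str]] = []
    for i, (ts, msg) in enumerate(records):
        # The running index i breaks ties, so messages are never compared.
        heapq.heappush(heap, (ts, i, msg))
        if len(heap) > window:
            # With window + 1 buffered records, the smallest one is final.
            ts_min, _, msg_min = heapq.heappop(heap)
            yield (ts_min, msg_min)
    # Drain whatever is still buffered at the end of the input.
    while heap:
        ts_min, _, msg_min = heapq.heappop(heap)
        yield (ts_min, msg_min)
```

With `batch_size=2`, the input timestamps `1, 3, 2, 5, 4, 6` come out as `1, 3, 2, 5, 4, 6`, because every swapped pair straddles a batch boundary. `sort_bounded_disorder(data, 1)` returns `1, 2, 3, 4, 5, 6`. This illustrates the trade-off: independent per-batch sorting can leave records out of order at batch edges, while a sliding window bounded by the maximum displacement does not.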


Is this possible in some way?