onelson / jq-rs

Rust crate to provide programmatic access to `jq`.

[question] can jq-rs be made streaming? #29

Open hansbogert opened 4 years ago

hansbogert commented 4 years ago

jq allows for streaming processing, i.e., constant memory usage for querying (if I'm not mistaken).

Does jq-rs also allow that?

onelson commented 4 years ago

Great question.

This isn't currently offered, but I can look into it!

I'll have to check to see what jq does specifically for this, but I suspect we might be able to build out "streaming" from a custom iterator that leverages `jq_rs::compile()`.
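As a rough illustration of the custom-iterator idea, here is a minimal sketch that assumes the input is newline-delimited JSON (one document per line). `FilteredLines` and its `filter` parameter are hypothetical names, not part of jq-rs. The filter is a plain closure so the sketch stays standard-library only; with jq-rs it could plausibly be `|doc| prog.run(doc)`, where `prog` comes from `jq_rs::compile()`. Note this is not jq's own `--stream` mode: memory is bounded by the largest single document rather than being strictly constant.

```rust
use std::io::{self, BufRead};

/// Lazily applies `filter` to each non-empty line read from `input`.
///
/// `filter` is a stand-in (an assumption of this sketch) for a compiled
/// jq program; only one line is held in memory at a time.
pub struct FilteredLines<R, F> {
    lines: io::Lines<R>,
    filter: F,
}

impl<R: BufRead, F> FilteredLines<R, F> {
    pub fn new(input: R, filter: F) -> Self {
        FilteredLines { lines: input.lines(), filter }
    }
}

impl<R, F, E> Iterator for FilteredLines<R, F>
where
    R: BufRead,
    F: FnMut(&str) -> Result<String, E>,
    E: std::fmt::Display,
{
    // Both I/O errors and filter errors are reported as strings.
    type Item = Result<String, String>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // `?` ends the iteration once the reader is exhausted.
            let line = match self.lines.next()? {
                Ok(line) => line,
                Err(e) => return Some(Err(e.to_string())),
            };
            if line.trim().is_empty() {
                continue; // skip blank lines between documents
            }
            return Some((self.filter)(&line).map_err(|e| e.to_string()));
        }
    }
}
```

Because the iterator pulls one line per call to `next()`, a caller can process an arbitrarily long input with a `for` loop without collecting it first.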

I can't speak to how the memory profile will look with the current implementation (using `jq_rs::compile()`) since I got it working in an improvised fashion. The intent is that memory should be flat, with the fluctuations being from the allocations for the inputs and outputs from running the jq program, but I haven't yet profiled to ensure this is the case. Valgrind said the implementation was leak free, but that's as far as I went to audit how it behaves.

This isn't really what you were asking, but as far as "streaming" in the async/await sense goes - that's another story.

A tricky thing is that I don't really have any guarantees from jq about thread-safety so async/concurrent patterns with jq-rs usually mean putting your jq program in a thread alone, then passing messages in/out with channels (or some similar tactic) if you want to combine it with other async apis.
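The "program in a thread alone, messages over channels" pattern could look roughly like the sketch below. It uses only `std::thread` and `std::sync::mpsc`. `FilterWorker` and `make_filter` are hypothetical names, and the filter closure is a stand-in for a compiled jq program. The filter is constructed on the worker thread itself, so it never has to cross a thread boundary.

```rust
use std::sync::mpsc;
use std::thread;

/// One request: the input text plus a channel for sending the result back.
type Request = (String, mpsc::Sender<Result<String, String>>);

/// Handle to a worker thread that exclusively owns the filter.
pub struct FilterWorker {
    requests: mpsc::Sender<Request>,
}

impl FilterWorker {
    /// `make_filter` runs on the worker thread, so the filter it builds
    /// does not need to be `Send`.
    pub fn spawn<M, F>(make_filter: M) -> Self
    where
        M: FnOnce() -> F + Send + 'static,
        F: FnMut(&str) -> Result<String, String> + 'static,
    {
        let (requests, inbox) = mpsc::channel::<Request>();
        thread::spawn(move || {
            let mut filter = make_filter();
            // The loop ends once every sender for `inbox` has been dropped.
            for (input, reply) in inbox {
                // Ignore send errors: the caller may have stopped waiting.
                let _ = reply.send(filter(&input));
            }
        });
        FilterWorker { requests }
    }

    /// Sends one input to the worker and blocks until its output arrives.
    pub fn run(&self, input: &str) -> Result<String, String> {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.requests
            .send((input.to_string(), reply_tx))
            .map_err(|_| "worker thread has stopped".to_string())?;
        reply_rx
            .recv()
            .map_err(|_| "worker dropped the reply channel".to_string())?
    }
}
```

In an async program, the blocking `run` call would typically be moved behind whatever blocking-task facility the chosen runtime provides; that part is left out here because it depends on the runtime.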

hansbogert commented 4 years ago

Another elaborate answer, thank you.

Though I understood your answer up to:

> This isn't really what you were asking, but as far as "streaming" in the async/await sense goes - that's another story.

Streaming does not inherently mean async/await, right? Or do you mean that if you want streaming with jq, you have to communicate the stream items from libjq -> jq-rs as you described in your last paragraph?

onelson commented 4 years ago

Correct. The `futures` crate offers a `Stream` trait for consuming values with async/await, but support for the incremental consumption of input in jq is not really related to this. I was disambiguating for myself ;)

I went back to look at what the streaming support in jq actually offers (I haven't used it before). Essentially, it's just reading the input line by line (I think?) and returning values with ... I'm not actually sure I understand how this works just yet. It looks like each value emitted comes with some sort of key to indicate the portion of the input that generated it, but I found it confusing.
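For reference, jq's `--stream` mode emits `[path, leaf]` events for each scalar (or empty container), plus a `[path]` closing event after the last child of each non-empty array or object. The sketch below reproduces that event format for an in-memory value, to make the "key" attached to each emitted value easier to see. The `Json` type and `to_stream` function are hypothetical illustration code, not part of jq or jq-rs, and string escaping is simplified by using Rust's `Debug` formatting, which matches JSON only for plain ASCII text.

```rust
/// Minimal JSON value, just enough to illustrate the event format.
pub enum Json {
    Null,
    Bool(bool),
    Int(i64),
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Returns the streaming events for `value`, one compact JSON text each.
pub fn to_stream(value: &Json) -> Vec<String> {
    let mut out = Vec::new();
    walk(value, &mut Vec::new(), &mut out);
    out
}

/// Renders a leaf; containers only reach this function when empty.
fn leaf(value: &Json) -> String {
    match value {
        Json::Null => "null".to_string(),
        Json::Bool(b) => b.to_string(),
        Json::Int(n) => n.to_string(),
        Json::Str(s) => format!("{:?}", s), // simplified escaping
        Json::Array(_) => "[]".to_string(),
        Json::Object(_) => "{}".to_string(),
    }
}

fn walk(value: &Json, path: &mut Vec<String>, out: &mut Vec<String>) {
    // Collect (rendered key, child) pairs; empty for scalars.
    let children: Vec<(String, &Json)> = match value {
        Json::Array(items) => items
            .iter()
            .enumerate()
            .map(|(i, v)| (i.to_string(), v))
            .collect(),
        Json::Object(fields) => fields
            .iter()
            .map(|(k, v)| (format!("{:?}", k), v))
            .collect(),
        _ => Vec::new(),
    };
    if children.is_empty() {
        // Scalars and empty containers are leaves: [path, value].
        out.push(format!("[[{}],{}]", path.join(","), leaf(value)));
        return;
    }
    let last = children.len() - 1;
    for (i, (key, child)) in children.into_iter().enumerate() {
        path.push(key);
        walk(child, path, out);
        if i == last {
            // Closing event: the path of the container's last child.
            out.push(format!("[[{}]]", path.join(",")));
        }
        path.pop();
    }
}
```

For the document `{"a":[1,2]}`, this yields `[["a",0],1]`, `[["a",1],2]`, `[["a",1]]`, `[["a"]]`: two leaf events followed by the closing events for the inner array and the outer object.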

I'll need to study this feature more, at which point I can decide whether I want to pursue exposing the implementation jq offers as-is, or whether a simpler iterator-based solution would serve just as well. I could see both being useful.