microsoft / tyger

Remote signal processing.
https://microsoft.github.io/tyger/
MIT License

Improving handling of a trickle of data into tyger buffer write #123

Closed yuliadub closed 1 month ago

yuliadub commented 3 months ago

Another attempt at handling a trickle of data, this time with a goroutine:

This is a first draft of trickle-of-data writes for tyger. Instead of setting a block size for buffer writes, the user provides a time window that defines how often writes happen (one write every X seconds). See https://github.com/microsoft/tyger/issues/93

The main thing I ran into is that Read (in both io and bufio) wants you to specify the size of the buffer to read into, rather than letting you just read whatever is currently present in the io.Reader. ReadAll does not return until it reaches EOF. I couldn’t find a way to see how much data is currently in an io.Reader without actually reading it, and that also waits for EOF before returning the size.

The approach I chose was to keep reading single bytes from the io.Reader and appending them to an in-memory byte buffer until the defined time window is over, then take what was read and write it to the tyger buffer, the same as with a block size. Repeat until EOF.

I added the simple script I used for testing just to have an example - will remove it once this PR is no longer a draft.

What this PR doesn’t do yet but should:

johnstairs commented 2 months ago

If you do something like

tyger buffer gen 1G | tyger buffer write $(tyger buffer create) --flush-interval 10s

You end up with blobs that are 64KiB, whereas they should be 4MiB.
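One way to get full-size blobs when data arrives quickly is to keep accumulating until a full block is available, and to write a partial block only when the flush interval elapses or at EOF. The sketch below shows just that accumulation step. blockAccumulator is a hypothetical type, not part of tyger, and the 64KiB chunk size is assumed to be what a single read from the pipe returns.

```go
package main

import "fmt"

// blockAccumulator (hypothetical) gathers incoming chunks and emits
// blocks of exactly blockSize bytes.
type blockAccumulator struct {
	blockSize int
	pending   []byte
}

// add appends chunk and returns every full block that is now available.
func (a *blockAccumulator) add(chunk []byte) [][]byte {
	a.pending = append(a.pending, chunk...)
	var blocks [][]byte
	for len(a.pending) >= a.blockSize {
		// Copy so the returned block does not alias the pending slice.
		block := append([]byte(nil), a.pending[:a.blockSize]...)
		blocks = append(blocks, block)
		a.pending = a.pending[a.blockSize:]
	}
	return blocks
}

// drain returns the partial block (possibly empty) and resets the
// accumulator. Intended for the flush interval or EOF.
func (a *blockAccumulator) drain() []byte {
	out := a.pending
	a.pending = nil
	return out
}

func main() {
	acc := &blockAccumulator{blockSize: 4 * 1024 * 1024} // 4MiB blocks
	chunk := make([]byte, 64*1024)                       // assumed 64KiB per read

	full := 0
	for i := 0; i < 100; i++ {
		full += len(acc.add(chunk))
	}
	// 100 chunks of 64KiB are 6.25MiB: one full 4MiB block plus 2304KiB left over.
	fmt.Println(full, len(acc.drain())) // prints "1 2359296"
}
```

With this policy a fast producer yields 4MiB blobs, and smaller blobs appear only when the flush interval expires before a block fills.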

yuliadub commented 1 month ago

Going to close this for now. Apologies, I have not had time to complete it. I am hoping to get more time to work on it once I am back from vacation in 2 weeks, if it is still relevant.