haskell-works / avro

Haskell Avro Encoding and Decoding Native Support (no RPC)
BSD 3-Clause "New" or "Revised" License

API for writing containers incrementally #169

Open alexbiehl opened 3 years ago

alexbiehl commented 3 years ago

I am constructing huge Avro containers that I have to upload in multiple chunks. To help with this use case, it would be great to have an API that lets you write containers incrementally:

packContainer
  :: (ToAvro a) 
  => Codec
  -> Schema -- ^ Writer schema
  -> ByteString -- ^ Sync bytes
  -> (Builder -- ^ Container header
       , [a] -> Builder -- ^ A function to feed a's and turn them into valid container blocks
       )

This makes it possible to consume from a streaming source and fill buffers incrementally.

I can contribute if you think it'd be worthwhile.
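
To make the proposed shape concrete, here is a rough sketch of the framing such a function would have to produce, written against `bytestring` only. Everything named here (`packContainerSketch`, `containerHeader`, `containerBlock`, `avroLong`, `avroBytes`) is hypothetical and not part of the avro library. The sketch assumes the caller supplies a per-record encoder instead of a `ToAvro` instance, passes the schema as already-rendered JSON bytes, and uses only the `null` codec (no compression). The layout follows my reading of the Avro object container file format: a header with magic bytes, a metadata map and a 16-byte sync marker, then blocks of (count, byte size, payload, sync marker).

```haskell
import           Data.Bits               (shiftL, shiftR, xor, (.&.), (.|.))
import qualified Data.ByteString         as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Char8   as BC
import qualified Data.ByteString.Lazy    as BL
import           Data.Int                (Int64)
import           Data.Word               (Word64)

-- | Avro "long": zig-zag encode, then emit as a base-128 varint.
avroLong :: Int64 -> B.Builder
avroLong n = go (fromIntegral ((n `shiftL` 1) `xor` (n `shiftR` 63)) :: Word64)
  where
    go w
      | w < 0x80  = B.word8 (fromIntegral w)
      | otherwise = B.word8 (fromIntegral (w .&. 0x7F) .|. 0x80) <> go (w `shiftR` 7)

-- | Avro "bytes" / "string": length prefix followed by the raw bytes.
avroBytes :: BS.ByteString -> B.Builder
avroBytes bs = avroLong (fromIntegral (BS.length bs)) <> B.byteString bs

-- | Container header: magic, metadata map (schema + codec), sync marker.
containerHeader :: BS.ByteString -> BS.ByteString -> BS.ByteString -> B.Builder
containerHeader schemaJson codecName sync =
     B.byteString (BS.pack [0x4F, 0x62, 0x6A, 0x01])  -- 'O' 'b' 'j' 1
  <> avroLong 2                                       -- one map block, two entries
  <> avroBytes (BC.pack "avro.schema") <> avroBytes schemaJson
  <> avroBytes (BC.pack "avro.codec")  <> avroBytes codecName
  <> avroLong 0                                       -- end of metadata map
  <> B.byteString sync

-- | One data block: object count, payload size in bytes, payload, sync marker.
-- An empty batch produces no output rather than an empty block.
containerBlock :: (a -> B.Builder) -> BS.ByteString -> [a] -> B.Builder
containerBlock _ _ [] = mempty
containerBlock encodeOne sync xs =
  let payload = B.toLazyByteString (foldMap encodeOne xs)
  in  avroLong (fromIntegral (length xs))
   <> avroLong (BL.length payload)
   <> B.lazyByteString payload
   <> B.byteString sync

-- | Hypothetical analogue of the proposed 'packContainer' (null codec only):
-- returns the header and a function that turns each batch into one block.
packContainerSketch
  :: (a -> B.Builder)   -- ^ Per-record encoder (stand-in for ToAvro)
  -> BS.ByteString      -- ^ Writer schema, rendered as JSON
  -> BS.ByteString      -- ^ 16 sync bytes
  -> (B.Builder, [a] -> B.Builder)
packContainerSketch encodeOne schemaJson sync =
  ( containerHeader schemaJson (BC.pack "null") sync
  , containerBlock encodeOne sync
  )
```

With this shape, a Kafka consumer loop could write the header once and then append one block per polled batch to whatever buffer or upload part is currently open. Every block ends with the sync marker, so a chunk boundary can fall after any block.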

AlexeyRaga commented 3 years ago

@alexbiehl cannot this one be used?

encodeContainerWithSync :: ToAvro a => Codec -> Schema -> BL.ByteString -> [[a]] -> BL.ByteString

Given that lists in Haskell are lazy, you can lazily produce a lazy bytestring and lazily upload it in chunks...

At least this was the way we've done it in the past :)

AlexeyRaga commented 3 years ago

Sorry, accidentally clicked the wrong button

alexbiehl commented 3 years ago

Unfortunately I don’t have a lazy list; my messages are coming from a Kafka topic.

AlexeyRaga commented 2 years ago

@alexbiehl Ah, we've been using lazy IO for it (both with conduit and with bare unsafeInterleaveIO) to get these lazy lists. Yeah, for sure, that'd be a valuable addition for people who don't want to mess with lazy IO!
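
For readers unfamiliar with the lazy IO workaround mentioned above, a minimal sketch of turning a polling action into a lazy list of batches with `unsafeInterleaveIO` might look like the following. `lazyBatches` and `countingSource` are hypothetical names for illustration; the polling action stands in for a Kafka consumer and is assumed to return `Nothing` once the stream is finished. The resulting `[[a]]` has the shape that `encodeContainerWithSync` expects for its batches argument.

```haskell
import Data.IORef       (IORef, atomicModifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- | Build a lazy list of batches from a polling action. Each poll runs only
-- when the corresponding list cell is demanded, not when this function returns.
lazyBatches :: IO (Maybe [a]) -> IO [[a]]
lazyBatches poll = unsafeInterleaveIO $ do
  next <- poll
  case next of
    Nothing    -> pure []
    Just batch -> (batch :) <$> lazyBatches poll

-- | Demo source (stand-in for a real consumer): yields [1], [2], ... up to the
-- limit, then Nothing. The IORef counts how many polls have actually happened.
countingSource :: Int -> IO (IORef Int, IO (Maybe [Int]))
countingSource limit = do
  polls <- newIORef 0
  let poll = do
        n <- atomicModifyIORef' polls (\k -> (k + 1, k + 1))
        pure (if n > limit then Nothing else Just [n])
  pure (polls, poll)
```

The trade-off is the usual one with lazy IO: polling happens whenever the encoder forces the list, so errors from the source surface inside pure code and resource lifetimes are harder to reason about. That is the motivation for an explicit incremental API.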