tobozo / ESP32-targz

🗜️ An Arduino library to unpack/uncompress tar, gz, and tar.gz files on ESP32 and ESP8266

Stream to stream expansion #36

Open wimmatthijs opened 3 years ago

wimmatthijs commented 3 years ago

Hi,

Pretty cool library here, kudos and thanks. Could stream-to-stream unpacking be envisioned? The main idea is to unpack an incoming HTTP stream carrying gzipped content and immediately "consume" that content. For example, with a JSON file you would consume only the data found for your specific keyword and ditch the rest. I would love to collaborate on implementing this functionality if it is possible. Let me know your thoughts; I'm here if you would like to discuss.

Wim

tobozo commented 3 years ago

hi, thanks for your feedback :+1:

Stream to stream is already supported (see tarGzExpander), but it does not handle filters based on filename and can't stream to a JSON decoder.

However, based on your initial idea I've attempted to add '--exclude' and '--include' support for tar extraction and this is what I've come up with:

void TarUnpacker::setTarExcludePattern( tarExcludePattern cb );
void TarUnpacker::setTarIncludePattern( tarIncludePattern cb );

provided you have your own custom filtering function:

bool myCustomExcludePatternMatcher( const char* filename )
{
  if( ! String( filename ).endsWith("my_keyword") ) return true; // will be excluded
  return false; // will be included
}

you can then ignore some files during unpacking:

TARGZUnpacker->setTarExcludePattern( myCustomExcludePatternMatcher );
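The matcher above builds a temporary `String` for every filename. As a minimal sketch, the same check can be written with plain C string functions so that no heap allocation happens per file. The names `hasSuffix` and `myHeapFreeExcludePatternMatcher` are hypothetical and not part of the library; the return convention (true = excluded) simply mirrors the example above.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: true when `name` ends with `suffix`.
// Works on plain C strings, so nothing is allocated on the heap.
static bool hasSuffix(const char* name, const char* suffix) {
  if (name == nullptr || suffix == nullptr) return false;
  const size_t nameLen = strlen(name);
  const size_t suffixLen = strlen(suffix);
  if (suffixLen > nameLen) return false;
  return memcmp(name + nameLen - suffixLen, suffix, suffixLen) == 0;
}

// Same contract as the matcher above: true = excluded, false = included.
bool myHeapFreeExcludePatternMatcher(const char* filename) {
  return !hasSuffix(filename, "my_keyword");
}
```

It would be registered the same way as the original matcher, through `setTarExcludePattern`.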

Triggering a callback when a file has been unpacked was already possible (e.g. to read JSON contents); see the example folder (test_tool.h) for a myTarMessageCallback implementation that reads file contents just after they've been untarred:

TARGZUnpacker->setTarMessageCallback( myCustomTriggerOnFileClose );

I've pushed the changes from this comment (untested though) on a specific branch if you want to play with that.

Now keep in mind you don't always control the order in which the files are added to or extracted from the tar archive. If the JSON contents refer to another file in the same tar archive, that file may or may not already be unpacked, depending on many factors (path, name, modification date, arbitrary order).

wimmatthijs commented 3 years ago

Hi,

I'm sorry, I'm not talking about multiple files but about the gzipped response of a web server. Most servers can send a gzipped response, which is particularly convenient when an API only returns rather big JSON responses.

So my aim is to receive the gzipped data, unpack it, and immediately afterwards scan the unpacked contents for what I need.

A byte-for-byte stream would be ideal, but I'm not sure how gzip decompression works: does it reproduce the original byte by byte or in chunks?

Wim

tobozo commented 3 years ago

oh right

A byte-to-byte stream requires the destination data to be seekable (the dictionary is replaced by the output data), so the destination can't be memory or a JSON parser; it must be a filesystem. Although this mode only uses a few bytes of RAM, it is very slow and generates a lot of I/O, so it is not recommended with HTTP unless the app is very resilient and does not mind making multiple attempts before it succeeds.

On the other hand, using a dictionary can work, but it can fragment the heap when used repeatedly.
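To illustrate why the decoder needs either a dictionary or a re-readable destination: gzip (DEFLATE) output contains back-references of the form "copy N bytes starting D bytes back", with D up to 32 KiB. The sketch below is a toy decoder, not real DEFLATE and not the library's code; the `Token` format and every name in it are made up for illustration. It keeps the most recent output in a ring buffer (the "dictionary") so the sink itself never has to be read back.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy token (hypothetical format): a literal byte, or a back-reference.
struct Token {
  bool isLiteral;
  uint8_t literal;    // used when isLiteral is true
  uint16_t distance;  // how far back in the output to start copying
  uint16_t length;    // how many bytes to copy
};

Token lit(char c) { return Token{true, static_cast<uint8_t>(c), 0, 0}; }
Token backref(uint16_t distance, uint16_t length) { return Token{false, 0, distance, length}; }

class WindowDecoder {
 public:
  WindowDecoder(size_t windowSize, std::function<void(uint8_t)> sink)
      : window_(windowSize), sink_(std::move(sink)) {}

  // Returns false when a back-reference points outside the available history.
  bool feed(const Token& t) {
    if (t.isLiteral) { emit(t.literal); return true; }
    if (t.distance == 0 || t.distance > produced_ || t.distance > window_.size()) return false;
    for (uint16_t i = 0; i < t.length; ++i) {
      // Read from the ring buffer, never from the sink.
      const size_t src = (pos_ + window_.size() - t.distance) % window_.size();
      emit(window_[src]);
    }
    return true;
  }

 private:
  void emit(uint8_t b) {
    window_[pos_] = b;                  // remember the byte for later back-references
    pos_ = (pos_ + 1) % window_.size();
    ++produced_;
    sink_(b);                           // forward-only output (e.g. a parser)
  }

  std::vector<uint8_t> window_;
  std::function<void(uint8_t)> sink_;
  size_t pos_ = 0;
  size_t produced_ = 0;
};

// Convenience wrapper: decode a token list into a string, stopping on error.
std::string decodeAll(const std::vector<Token>& tokens, size_t windowSize) {
  std::string out;
  WindowDecoder dec(windowSize, [&out](uint8_t b) { out.push_back(static_cast<char>(b)); });
  for (const Token& t : tokens) {
    if (!dec.feed(t)) break;
  }
  return out;
}
```

Without the ring buffer, each back-reference would have to seek backwards in the destination, which is why the dictionary-less mode described above needs a filesystem.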

gzStreamUpdater could be a good basis to implement that, using gzWriteCallback = &gzStreamWriteCallback; instead of the Updater methods.
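A rough sketch of what such a write callback could do with the decompressed chunks: scan them for a keyword as they arrive, keeping the match state between calls so a keyword split across two chunks is still found. The callback signature `bool(unsigned char*, size_t)` is an assumption about what the library expects, and `StreamKeywordScanner`, `myStreamWriteCallback` and the `"temperature"` key are hypothetical names for illustration only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Streaming substring search (Knuth-Morris-Pratt); state survives between chunks.
class StreamKeywordScanner {
 public:
  explicit StreamKeywordScanner(const std::string& keyword)
      : keyword_(keyword), fail_(keyword.size(), 0) {
    // Build the KMP failure table for the keyword.
    for (size_t i = 1, k = 0; i < keyword_.size(); ++i) {
      while (k > 0 && keyword_[i] != keyword_[k]) k = fail_[k - 1];
      if (keyword_[i] == keyword_[k]) ++k;
      fail_[i] = k;
    }
  }

  // Feed one chunk; returns how many matches were completed inside it.
  size_t feed(const unsigned char* buff, size_t buffsize) {
    if (keyword_.empty()) return 0;
    size_t hits = 0;
    for (size_t i = 0; i < buffsize; ++i) {
      const char c = static_cast<char>(buff[i]);
      while (matched_ > 0 && c != keyword_[matched_]) matched_ = fail_[matched_ - 1];
      if (c == keyword_[matched_]) ++matched_;
      if (matched_ == keyword_.size()) {
        ++hits;
        ++total_;
        matched_ = fail_[matched_ - 1];  // keep going, allowing overlapping matches
      }
    }
    return hits;
  }

  size_t total() const { return total_; }

 private:
  std::string keyword_;
  std::vector<size_t> fail_;
  size_t matched_ = 0;
  size_t total_ = 0;
};

// Hypothetical callback; the signature is assumed, not taken from the library docs.
static StreamKeywordScanner keyScanner("\"temperature\"");

bool myStreamWriteCallback(unsigned char* buff, size_t buffsize) {
  keyScanner.feed(buff, buffsize);
  return true;  // assumed to mean "keep decompressing"
}
```

A real consumer would extract the value following the key rather than only counting matches; the point here is only that the chunks can be consumed and then discarded.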

wimmatthijs commented 3 years ago

ahah, this is a very good suggestion, i will have a look into that!

tobozo commented 3 years ago

hey @wimmatthijs, some after thoughts on this:

  1. if the server sends a Content-Length header with the size of the gz file
  2. if the ESP32 has at least 32kb heap free when decompression starts

.. then it becomes theoretically possible to do stream (HTTP) to stream (JSON) using ArduinoStreamReader deserialization interface from ArduinoJSON
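The two conditions above could be checked before starting, along these lines. This is only a sketch: the function name and the constant are hypothetical, and the 32 KB figure is taken directly from the comment rather than measured.

```cpp
#include <cstddef>

// Assumed threshold, taken from the "at least 32kb heap free" condition above.
constexpr size_t kMinFreeHeapBytes = 32 * 1024;

// contentLength: value of the Content-Length header, negative or zero when unknown.
// freeHeap: free heap in bytes at the moment decompression would start.
bool canStreamGzToJson(long contentLength, size_t freeHeap) {
  if (contentLength <= 0) return false;  // condition 1: size of the gz file is known
  return freeHeap >= kMinFreeHeapBytes;  // condition 2: enough heap for the dictionary
}
```

On the ESP32 Arduino core the inputs would presumably come from `HTTPClient::getSize()` and `ESP.getFreeHeap()`.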

I do not have a use case in mind for that though. Could you point me to an example sketch I could use as a basis to start testing this?

wimmatthijs commented 3 years ago

Hey, cool, I have a very specific application in mind of course. Could we maybe set up a short meeting to discuss? I'm not sure I'm allowed to discuss the project on GitHub...

My email is wimmatthijs@gmail.com, just drop me a line. Thanks so much!

Wim

tobozo commented 3 years ago

sure, I've dropped you a message on hangouts/gmeet, otherwise I'm on gitter

https://gitter.im/tobozo

javipelopi commented 3 years ago

Hello again @tobozo!

In my application, I have a firmware.bin.gz split into multiple parts (gz001, gz002, etc) that I get from the internet. Ideally, I would like to get them one by one and feed them to gzStreamUpdater without saving them to the filesystem.

Would that be possible? Could you give me a little direction on how this should be accomplished?

Thanks in advance!

tobozo commented 3 years ago

hey @javipelopi this is not possible with the current library.

Please make sure you create a new issue if you have another question, this thread is not about multipart.

javipelopi commented 3 years ago

@tobozo hi, sure! I asked here because I wanted to know whether it was possible to decompress and consume the data on the fly for each part, or whether there were preconditions to meet (for example, that the size of each part be divisible by a certain amount).

Anyway I will try to dig a little bit on my own!

Thanks!