hammerlab / ketrew

Keep Track of Experimental Workflows
http://www.hammerlab.org/docs/ketrew/master/index.html
Apache License 2.0

Allow server/client to speak gzip, especially as part of the `Submit_targets` protocol #543

Open armish opened 7 years ago

armish commented 7 years ago

This is often not a big issue, but when you are sequentially submitting more than 10 complete epidisco workflows to a single server, sending them all does take some time, especially if/when you try to do that in parallel (to different servers or, worse, to the same one).

I originally thought that this was more about the server taking time to do the equivalence checks on its side before officially putting the `OK` stamp on it; but while I was playing with ketrew JSONs the other day, I realized that a single patient's workflow serialized in JSON format was ~30MB! I then tried to send this via curl, and it is not that the transfer completes and the server then waits: it is the file transfer itself that makes the wait long.

The obvious solution, of course, is to support gzipped content delivery between server and client, which would be a natural extension of the HTTPS-based API you have designed. And there is also this:

$ du -sh all-in-epidisco-workflow.json*
 29M    all-in-epidisco-workflow.json
720K    all-in-epidisco-workflow.json.gz

Poking around a bit, I was glad to see that Cohttp_lwt at least supports the header/response formats. So I think all we need is to get the gzip/no-gzip logic into the pre-/post-serialization parts, and we will have blazingly fast submission experiences from then on (unless, of course, we DDoS ketrew with all those decompression tasks, which by the way can be handled by another helper virtual machine in the container; but that is for another day :))
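
To make the pre-/post-serialization idea concrete, here is a rough sketch of the round trip. It is written in Python purely for illustration and is not Ketrew code: `encode_body`, `decode_body`, and the toy `workflow` payload are hypothetical names. The only thing taken from HTTP itself is the standard `Content-Encoding: gzip` header, which the receiving side checks to decide whether to decompress.

```python
import gzip
import json


def encode_body(payload, use_gzip=True):
    """Serialize `payload` to JSON and optionally gzip it.

    Returns (headers, body). Hypothetical helper, not part of Ketrew.
    """
    raw = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if use_gzip:
        # Tell the receiver that the body is gzip-compressed.
        headers["Content-Encoding"] = "gzip"
        return headers, gzip.compress(raw)
    return headers, raw


def decode_body(headers, body):
    """Undo `encode_body`: decompress only if the header says so."""
    if headers.get("Content-Encoding", "").lower() == "gzip":
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))


# Toy stand-in for a serialized workflow: many similar targets,
# which is the kind of repetition gzip handles well.
workflow = {
    "targets": [
        {
            "id": f"target-{i}",
            "command": "run-step --sample patient-1",
            "depends_on": [f"target-{i - 1}"] if i else [],
        }
        for i in range(2000)
    ]
}

headers, body = encode_body(workflow)
plain_headers, plain_body = encode_body(workflow, use_gzip=False)

print(len(plain_body), "bytes uncompressed")
print(len(body), "bytes gzipped")
```

Keeping the check on the header means clients that do not compress keep working unchanged, so the gzip path can be opt-in on both sides.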

(Maybe you have already tried this and moved away from it. In that case, feel free to ignore this, but I would be curious about what went wrong there.)

smondet commented 7 years ago

To reduce the stress on the check-equivalence + add-to-engine step, I'll try 2 things: