Closed thomashoneyman closed 1 year ago
@f-f feel free to chime in if you have preferences around how this gets implemented
Any of these sound great! I'm not too concerned about this since it's hopefully a quite temporary solution until we integrate Pursuit
I volunteer to work on this! Looks like there's enough context in the issue body to get started, but I'll ask questions if I get stuck.
Thanks! Hopefully we can write a thin FFI implementation via native Node functions and just use that — you could write the bindings in the foreign
workspace and the relevant tests there as well — to avoid bringing in more dependencies.
I think you'll want to wait on #643 before actually using gzip in the main application. But we could get the binding there in the meantime.
@thomashoneyman Thanks for the context. In that I case I'll just start on #643 instead.
This is now unblocked!
@thomashoneyman Hey, I initially volunteered on the assumption I'd be plugging in Jordan's existing gzip library. If this involves writing my own lighter weight gzip function + testing it, that's a bit outside my comfort zone.
I'm going to withdraw and let someone else tackle this.
Hi @CharlesTaylor7 the library seemed to have moved perhaps you can use this function https://pursuit.purescript.org/packages/purescript-node-zlib/0.4.0/docs/Node.Zlib#v:createGzip
Hopefully we can write a thin FFI implementation via native Node functions and just use that — you could write the bindings in the
foreign
workspace and the relevant tests there as well — to avoid bringing in more dependencies.I'm going to withdraw and let someone else tackle this.
Hey, I could pick this up if it's still relevant!
I'm just familiarizing myself with the node:zlib
library, but it seems like we could just FFI zlib.gzip
instead of zlib.createGzip
which might simplify things a little bit. If I'm understanding correctly, I think createGzip
would be useful if we didn't have the payload
in memory or wanted to send a streaming request body over the network, but those don't seem to be requirements here. zlib.gzip
on the other hand seems like it would be for just gzip
-ing an in-memory string into a buffer asynchronously (so gzip :: String -> Aff Buffer
could be the signature of our FFI wrapper). Let me know if I'm mistaken!
That sounds about right to me! I have little experience with it, but from some light reading I think we can just gzip a string to a buffer and send that along.
I would be happy to test some payloads sent to Pursuit as part of reviewing your work
Opened a PR here https://github.com/purescript/registry-dev/pull/659. It's made a bit messy by the fact that Buffer
can't be passed directly as fetch
body, open to suggestions about a better approach there 🙂
The registry publishes packages to Pursuit by sending the JSON output of the compiler's
purs publish
command in the body of a POST request:https://github.com/purescript/registry-dev/blob/64e3a6c444ba5a82d61cd0522f1300e5d719992a/app/src/App/Effect/Pursuit.purs#L59-L78
We're consistently seeing 502 error codes from Pursuit for some packages on upload. These are really 413 Content Too Large errors (they're getting garbled on their way through nginx). Pursuit won't accept package uploads larger than 1.5mb. See, for example:
However, I've noticed that some packages fail when being published via the registry, but succeed when published via Pulp. The difference: Pulp sends a gzipped response to Pursuit, and we don't.
First Pulp gzips the JSON from
purs publish
https://github.com/purescript-contrib/pulp/blob/78f7aa6ae76337ea6f8362cac03fb1c8ee1858cd/src/Pulp/Publish.purs#L68
https://github.com/purescript-contrib/pulp/blob/78f7aa6ae76337ea6f8362cac03fb1c8ee1858cd/src/Pulp/Publish.purs#L170-L171
Then it passes that to the publish docs function:
https://github.com/purescript-contrib/pulp/blob/78f7aa6ae76337ea6f8362cac03fb1c8ee1858cd/src/Pulp/Publish.purs#L85
https://github.com/purescript-contrib/pulp/blob/78f7aa6ae76337ea6f8362cac03fb1c8ee1858cd/src/Pulp/Publish.purs#L326-L327
In contrast, we're just sending the JSON directly. We should gzip the JSON as well.
Implementation
We have a few options for the implementation. To actually gzip the data we could:
Adapt the Pulp implementation This probably isn't our best bet because Pulp relies on a number of NPM libraries we don't want to bring in (
concat-stream
, etc.). I'd like to avoid pulling new JS dependencies unless absolutely necessary. But we can at least take a look at what they're doing.Use purescript-node-zlib by @JordanMartinez for the
createGzip
function, or write our own pure JS gzip function using the underlying Node code. This is a recent library Jordan published which gives us access to the zlib Node module (same as is used by Pulp) and has functions like createGzip already available. We could potentially write some glue around this to gzip the input JSON. Or we could write a thin FFI around createGzip that just gzips an input string.Use the
gzip
CLI We have access to thegzip
CLI via Nix; this is already used, for example, in the implementation of theRegistry.App.CLI.Tar
module. We could potentially introduce aRegistry.App.CLI.Gzip
module that exposes agzipCLI
function and agzip :: String -> Aff String Buffer
function that uses gzip with the"-"
argument to read/write over stdin/stdout.Once we have the gzipped data we can send it with an appropriate Content-Encoding header in the Affjax request. (Presumably we'd send it as a
RequestBody.string
orRequestBody.blob
— not sure which we should use.)