purescript / registry-dev

Development work related to the PureScript Registry
https://github.com/purescript/registry
97 stars 80 forks source link

Gzip package documentation when sending to Pursuit #642

Closed thomashoneyman closed 1 year ago

thomashoneyman commented 1 year ago

The registry publishes packages to Pursuit by sending the JSON output of the compiler's purs publish command in the body of a POST request:

https://github.com/purescript/registry-dev/blob/64e3a6c444ba5a82d61cd0522f1300e5d719992a/app/src/App/Effect/Pursuit.purs#L59-L78

We're consistently seeing 502 error codes from Pursuit for some packages on upload. These are really 413 Content Too Large errors (they're getting garbled on their way through nginx). Pursuit won't accept package uploads larger than 1.5mb. See, for example:

However, I've noticed that some packages fail when being published via the registry, but succeed when published via Pulp. The difference: Pulp sends a gzipped response to Pursuit, and we don't.

First Pulp gzips the JSON from purs publish

https://github.com/purescript-contrib/pulp/blob/78f7aa6ae76337ea6f8362cac03fb1c8ee1858cd/src/Pulp/Publish.purs#L68

https://github.com/purescript-contrib/pulp/blob/78f7aa6ae76337ea6f8362cac03fb1c8ee1858cd/src/Pulp/Publish.purs#L170-L171

Then it passes that to the publish docs function:

https://github.com/purescript-contrib/pulp/blob/78f7aa6ae76337ea6f8362cac03fb1c8ee1858cd/src/Pulp/Publish.purs#L85

https://github.com/purescript-contrib/pulp/blob/78f7aa6ae76337ea6f8362cac03fb1c8ee1858cd/src/Pulp/Publish.purs#L326-L327

In contrast, we're just sending the JSON directly. We should gzip the JSON as well.

Implementation

We have a few options for the implementation. To actually gzip the data we could:

  1. Adapt the Pulp implementation This probably isn't our best bet because Pulp relies on a number of NPM libraries we don't want to bring in (concat-stream, etc.). I'd like to avoid pulling new JS dependencies unless absolutely necessary. But we can at least take a look at what they're doing.

  2. Use purescript-node-zlib by @JordanMartinez for the createGzip function, or write our own pure JS gzip function using the underlying Node code. This is a recent library Jordan published which gives us access to the zlib Node module (same as is used by Pulp) and has functions like createGzip already available. We could potentially write some glue around this to gzip the input JSON. Or we could write a thin FFI around createGzip that just gzips an input string.

  3. Use the gzip CLI We have access to the gzip CLI via Nix; this is already used, for example, in the implementation of the Registry.App.CLI.Tar module. We could potentially introduce a Registry.App.CLI.Gzip module that exposes a gzipCLI function and a gzip :: String -> Aff String Buffer function that uses gzip with the "-" argument to read/write over stdin/stdout.

Once we have the gzipped data we can send it with an appropriate Content-Encoding header in the Affjax request. (Presumably we'd send it as a RequestBody.string or RequestBody.blob — not sure which we should use.)

thomashoneyman commented 1 year ago

@f-f feel free to chime in if you have preferences around how this gets implemented

f-f commented 1 year ago

Any of these sound great! I'm not too concerned about this since it's hopefully a quite temporary solution until we integrate Pursuit

CharlesTaylor7 commented 1 year ago

I volunteer to work on this! Looks like there's enough context in the issue body to get started, but I'll ask questions if I get stuck.

thomashoneyman commented 1 year ago

Thanks! Hopefully we can write a thin FFI implementation via native Node functions and just use that — you could write the bindings in the foreign workspace and the relevant tests there as well — to avoid bringing in more dependencies.

I think you'll want to wait on #643 before actually using gzip in the main application. But we could get the binding there in the meantime.

CharlesTaylor7 commented 1 year ago

@thomashoneyman Thanks for the context. In that I case I'll just start on #643 instead.

thomashoneyman commented 1 year ago

This is now unblocked!

CharlesTaylor7 commented 1 year ago

@thomashoneyman Hey, I initially volunteered on the assumption I'd be plugging in Jordan's existing gzip library. If this involves writing my own lighter weight gzip function + testing it, that's a bit outside my comfort zone.

I'm going to withdraw and let someone else tackle this.

flip111 commented 1 year ago

Hi @CharlesTaylor7 the library seemed to have moved perhaps you can use this function https://pursuit.purescript.org/packages/purescript-node-zlib/0.4.0/docs/Node.Zlib#v:createGzip

pete-murphy commented 1 year ago

Hopefully we can write a thin FFI implementation via native Node functions and just use that — you could write the bindings in the foreign workspace and the relevant tests there as well — to avoid bringing in more dependencies.

I'm going to withdraw and let someone else tackle this.

Hey, I could pick this up if it's still relevant!

I'm just familiarizing myself with the node:zlib library, but it seems like we could just FFI zlib.gzip instead of zlib.createGzip which might simplify things a little bit. If I'm understanding correctly, I think createGzip would be useful if we didn't have the payload in memory or wanted to send a streaming request body over the network, but those don't seem to be requirements here. zlib.gzip on the other hand seems like it would be for just gzip-ing an in-memory string into a buffer asynchronously (so gzip :: String -> Aff Buffer could be the signature of our FFI wrapper). Let me know if I'm mistaken!

thomashoneyman commented 1 year ago

That sounds about right to me! I have little experience with it, but from some light reading I think we can just gzip a string to a buffer and send that along.

thomashoneyman commented 1 year ago

I would be happy to test some payloads sent to Pursuit as part of reviewing your work

pete-murphy commented 1 year ago

Opened a PR here https://github.com/purescript/registry-dev/pull/659. It's made a bit messy by the fact that Buffer can't be passed directly as fetch body, open to suggestions about a better approach there 🙂