uhop / node-re2

node.js bindings for RE2: fast, safe alternative to backtracking regular expression engines.

how to use pre-compiled binary that's available locally? #163

Closed gajus closed 1 year ago

gajus commented 1 year ago

I want to build node-re2 in a separate step from the rest of the setup, but the current instructions make it unclear how to reference a pre-built binary.

This is how I am building it:

# re2 is taking a long time to compile
# by doing it here we avoid re-compiling re2 every time that pnpm-lock.yaml changes
FROM node:19-bullseye-slim AS re2-installer

WORKDIR /srv

RUN \
  apt update && \
  apt install -y python3 build-essential && \
  npm install --global pnpm@^8 turbo@^1 && \
  pnpm install re2 && \
  apt remove -y python3 build-essential && \
  pnpm store prune

FROM node:19-bullseye-slim

WORKDIR /srv

COPY --from=re2-installer /srv/node_modules/.pnpm/re2@1.18.0/node_modules/re2/build/Release/obj.target/re2.node .

How do I point to re2.node so that pnpm install does not attempt to compile it again?

gajus commented 1 year ago

Couldn't figure it out. Ended up patching node-re2.

diff --git a/package.json b/package.json
index 16a3be6b9af20a0f876df7489c5d8b650f6c177a..fb3aac865b5aa4739eaa19ed51a63fcea749315b 100644
--- a/package.json
+++ b/package.json
@@ -24,7 +24,9 @@
     "test": "node tests/tests.js",
     "ts-test": "tsc",
     "save-to-github": "save-to-github-cache --artifact build/Release/re2.node",
-    "install": "install-from-cache --artifact build/Release/re2.node --host-var RE2_DOWNLOAD_MIRROR --skip-path-var RE2_DOWNLOAD_SKIP_PATH --skip-ver-var RE2_DOWNLOAD_SKIP_VER || npm run rebuild",
+    "original-install": "install-from-cache --artifact build/Release/re2.node --host-var RE2_DOWNLOAD_MIRROR --skip-path-var RE2_DOWNLOAD_SKIP_PATH --skip-ver-var RE2_DOWNLOAD_SKIP_VER || npm run rebuild",
+    "copy-binary": "mkdir -p build/Release && cp $RE2_BINARY_PATH build/Release/re2.node",
+    "install": "if [ -n \"$RE2_BINARY_PATH\" ]; then npm run copy-binary; else npm run original-install; fi",
     "verify-build": "node scripts/verify-build.js",
     "rebuild": "node-gyp rebuild"
   },

uhop commented 1 year ago

Does this work for you: https://github.com/uhop/node-re2/wiki/Precompiled-versions ?

uhop commented 1 year ago

Building local mirrors is important for security and other reasons. The doc linked above has a link on how to build local mirrors. Just in case, I am posting it here: https://github.com/uhop/install-artifact-from-github/wiki/Making-local-mirror

gajus commented 1 year ago

It would be nice if the patch I suggested was applied to node-re2.

This would allow people to re-use pre-compiled binaries in a Docker image without needing to use network.

uhop commented 1 year ago

> This would allow people to re-use pre-compiled binaries in a Docker image without needing to use network.

It is accomplished by provisions in the links above.

AMoo-Miki commented 1 year ago

The install-from-cache script, which runs as part of install, can only fetch br and gz files, and neither of those formats can hold multiple files.

The terms for redistributing this package in binary format require that the binary "reproduce[s] the ... copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution" but complying with that by including the LICENSE file along with re2.node would not be possible in a br or gz file.

What I was hoping to do was to build re2 for various node versions on different architectures and use them to install re2 as a dependency of my software. In my scenario, "distribution" is the act of making the URL available, since re2 itself performs the download. I wouldn't have a public landing page to show the license.

I am not a lawyer, but to me it sounds like we have to enhance the install-from-cache script to handle tar.gz and maybe even tar.br, and that any redistribution of the binary in the form that currently installs correctly is against the license. Do you read the license differently?
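The single-payload limitation of gz and br described in this comment can be illustrated with Node's built-in zlib module. This is only a sketch: the buffer below is a made-up stand-in for the bytes of re2.node, and nothing here is taken from install-from-cache itself.

```javascript
const zlib = require('zlib');

// Hypothetical stand-in for the bytes of a compiled re2.node.
const binary = Buffer.from('pretend this is re2.node');

// Both formats wrap exactly one byte stream.
const gz = zlib.gzipSync(binary);
const br = zlib.brotliCompressSync(binary);

// Decompression yields the original bytes and nothing else: unlike tar or
// zip, there is no table of entries where a LICENSE file could sit next to
// the binary.
const fromGz = zlib.gunzipSync(gz);
const fromBr = zlib.brotliDecompressSync(br);

console.log(fromGz.equals(binary), fromBr.equals(binary));
```

Shipping a license next to the binary would therefore need either a container format such as tar, or a second artifact fetched separately.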

uhop commented 1 year ago

@AMoo-Miki It is a good point. TBH I explored initially using tar files with install-from-cache for practical needs: what if we want to include more than one generated file? It turned out that tar is a relatively complex format and so is zip. Both require non-trivial dependencies and/or a lot of code while install-from-cache is a no-dependency single file of about 250 lines.

So my reading is:

Having said that, I do have some possible action points I am evaluating:

Now back to your specific scenario:

> What I was hoping to do was to build re2 for various node versions on different architectures and use them to install re2 as a dependency of my software.

In my opinion, this is a proper allowable use of this facility. In fact, re2 has documented provisions for that to facilitate private mirrors, which are especially important for binaries.

Private mirrors are widely used in the industry. Major companies create curated mirrors of various code repositories, e.g., npm. They are needed for security reasons, to control the exact versioning of software and its provenance, and for cases when the internet is not available, e.g., when it is purposely turned off in the build environment or only the intranet is reachable. Obviously, private mirrors should not be used to subvert the terms of the license but rather as a means of delivery.

So I think you are good there.

uhop commented 1 year ago

Closing for now. Please reopen to continue or create a new ticket for new issues. If you feel it is more about discussing some RE2-related stuff, consider discussions.

gajus commented 1 year ago

This was closed without adding the suggested patch, i.e. it remains unresolved.

uhop commented 1 year ago

It was resolved without needing a patch. See unanswered (unreviewed?) comments:

PS: Patches for this project are submitted using PRs. And they should work on all supported platforms. Did you try your patch on plain Windows?
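On the portability question: the copy-binary step in the patch relies on `mkdir -p`, `cp`, and a `[ -n ... ]` test, which would likely not run under cmd.exe, the default shell for npm scripts on plain Windows. A minimal sketch of the same step in Node follows. It assumes the `RE2_BINARY_PATH` variable introduced by the patch (not an option node-re2 supports), and `copyPrebuiltBinary` is a hypothetical helper name.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical cross-platform version of the patch's "copy-binary" step.
// Copies a prebuilt re2.node into <packageDir>/build/Release/re2.node.
// Returns the destination path, or null when no source path was given,
// so the caller can fall back to the regular install command.
function copyPrebuiltBinary(sourcePath, packageDir) {
  if (!sourcePath) return null;
  const targetDir = path.join(packageDir, 'build', 'Release');
  // Equivalent of `mkdir -p`; works the same on Windows, macOS, and Linux.
  fs.mkdirSync(targetDir, { recursive: true });
  const target = path.join(targetDir, 're2.node');
  fs.copyFileSync(sourcePath, target);
  return target;
}
```

A script wrapping this helper could call `copyPrebuiltBinary(process.env.RE2_BINARY_PATH, process.cwd())` from the install hook and run the original install command when it returns null.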

gajus commented 12 months ago

> It was resolved without needing a patch. See unanswered (unreviewed?) comments:

It was not. Using a URL to download is not desirable for offline systems. There is no easy way to cache externally fetched resources. The patch provides a way to use a locally saved build.

Still a problem for libraries like https://github.com/microlinkhq/metascraper/issues/630