microlinkhq / metascraper

Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.
https://metascraper.js.org
MIT License
2.35k stars 168 forks source link

feat: be possible to skip re2 #656

Closed Kikobeats closed 1 year ago

Kikobeats commented 1 year ago

closes #630

gajus commented 1 year ago

Amazing! RE2 is causing so many issues in our setup. Good riddance.

gajus commented 1 year ago

@Kikobeats It is still here https://www.npmjs.com/package/@metascraper/helpers?activeTab=code

gajus commented 1 year ago

In case it is not clear what I mean, by being a direct dependency of that package, it still gets pulled in:

dependencies:
@contra/integrations link:../integrations
├─┬ metascraper 5.37.1
│ └─┬ @metascraper/helpers 5.37.1
│   ├── re2 1.20.3
│   └─┬ url-regex-safe 4.0.0
│     └── re2 1.20.3 peer
├─┬ metascraper-description 5.37.1
│ └─┬ @metascraper/helpers 5.37.1
│   ├── re2 1.20.3
│   └─┬ url-regex-safe 4.0.0
│     └── re2 1.20.3 peer
├─┬ metascraper-image 5.37.1
│ └─┬ @metascraper/helpers 5.37.1
│   ├── re2 1.20.3
│   └─┬ url-regex-safe 4.0.0
│     └── re2 1.20.3 peer
└─┬ metascraper-title 5.37.1
  └─┬ @metascraper/helpers 5.37.1
    ├── re2 1.20.3
    └─┬ url-regex-safe 4.0.0
      └── re2 1.20.3 peer
Kikobeats commented 1 year ago

That's intentional; you can skip it from execution time, but it's always present during installation.

re2 exists for a reason, and I'm not going to lower the security of this library and remove it, sorry. There are other mechanisms to avoid installing it in case you don't want it explicitly you don't want.

I am happy to accept a PR improving this workflow 🙂

gajus commented 1 year ago

re2 exists for a reason, and I'm not going to lower the security of this library and remove it, sorry. There are other mechanisms to avoid installing it in case you don't want it explicitly you don't want.

Such as?

gajus commented 1 year ago

@Kikobeats Just wanted to see if you have input here. I am having with workarounds, just not familiar with such.

Kikobeats commented 1 year ago

With pnpm.io/package_json#pnpmoverrides you have granular control for whatever dependency 🙂

gajus commented 1 year ago

I don't follow.

overrides will only allow me to change version or resolve re2 to another dependency. There is no way to remove it.

gajus commented 1 year ago

Just realized I can pnpm patch re2 itself, remove the install action, and replace module.exports with RegExp.

Hopefully this will save time to others:

diff --git a/package.json b/package.json
index 16a3be6b9af20a0f876df7489c5d8b650f6c177a..a9946f3f97ba4c0d165dceee668aae53d8adf568 100644
--- a/package.json
+++ b/package.json
@@ -24,7 +24,8 @@
     "test": "node tests/tests.js",
     "ts-test": "tsc",
     "save-to-github": "save-to-github-cache --artifact build/Release/re2.node",
-    "install": "install-from-cache --artifact build/Release/re2.node --host-var RE2_DOWNLOAD_MIRROR --skip-path-var RE2_DOWNLOAD_SKIP_PATH --skip-ver-var RE2_DOWNLOAD_SKIP_VER || npm run rebuild",
+    "original-install": "install-from-cache --artifact build/Release/re2.node --host-var RE2_DOWNLOAD_MIRROR --skip-path-var RE2_DOWNLOAD_SKIP_PATH --skip-ver-var RE2_DOWNLOAD_SKIP_VER || npm run rebuild",
+    "install": "echo 'Skipping RE2 install'",
     "verify-build": "node scripts/verify-build.js",
     "rebuild": "node-gyp rebuild"
   },
diff --git a/re2.js b/re2.js
index b6b1eb7002ef5df623190c0d8ac21e83490a1370..c3bfda18657119c36fc5de301869804bdbfe7820 100644
--- a/re2.js
+++ b/re2.js
@@ -1,37 +1,3 @@
 'use strict';

-const RE2 = require('./build/Release/re2');
-
-if (typeof Symbol != 'undefined') {
-  Symbol.match &&
-    (RE2.prototype[Symbol.match] = function (str) {
-      return this.match(str);
-    });
-  Symbol.search &&
-    (RE2.prototype[Symbol.search] = function (str) {
-      return this.search(str);
-    });
-  Symbol.replace &&
-    (RE2.prototype[Symbol.replace] = function (str, repl) {
-      return this.replace(str, repl);
-    });
-  Symbol.split &&
-    (RE2.prototype[Symbol.split] = function (str, limit) {
-      return this.split(str, limit);
-    });
-  Symbol.matchAll &&
-    (RE2.prototype[Symbol.matchAll] = function* (str) {
-      if (!this.global) {
-        throw TypeError('String.prototype.matchAll called with a non-global RE2 argument');
-      }
-      const re = new RE2(this);
-      re.lastIndex = this.lastIndex;
-      for (;;) {
-        const result = re.exec(str);
-        if (!result) break;
-        yield result;
-      }
-    });
-}
-
-module.exports = RE2;
+module.exports = RegExp;
aldenquimby commented 5 months ago

@gajus have you published a version of this package anywhere with re2 removed? We are also running into build failures and are planning to use open-graph-scraper instead. I'd prefer to use metascraper