mjackson / unpkg

The CDN for everything on npm
https://unpkg.com
Other
2.99k stars 302 forks source link

RFC: Use /_metadata and /_module functions #185

Closed mjackson closed 5 years ago

mjackson commented 5 years ago

This is just a heads up for anyone who is currently using the ?meta or ?module query parameters on unpkg. Instead of using a query parameter, we are going to invoke this functionality through the /_metadata and /_module pathname prefixes going forward.

As unpkg grows, we are seeing increasing load at the origin servers from "cache busting" URLs that use always-unique query parameters to bypass Cloudflare's cache. The most common form of this is ?_=timestamp where timestamp is the current timestamp in milliseconds. This approach is used e.g. in jQuery to bypass HTTP caches.

By moving these query parameters into the pathname, we are able to ignore query strings altogether and cache based on pathnames. This should increase our cache hit rate and decrease load at the origin significantly.

The new functions are already live (5df576c6be377c15d9b0778b524be289a6f04b5e) but the names and/or implementation may change based on feedback in this issue.

WebReflection commented 5 years ago

FWIW this LGTM, I just need to find out all my live examples with ?module and do the switch, not a big deal. However ...

the names and/or implementation may change based on feedback in this issue.

I'm waiting for the OK2GO 'cause I don't want to track down all my examples if there are risks the names will change πŸ˜…

Thanks for the heads up πŸ‘‹

jamesknelson commented 5 years ago

This LGTM. I’ll switch Demoboard over as soon as the new pathnames are set in stone. πŸ‘

Rich-Harris commented 5 years ago

If you wanted to be really radical, module would be the default, and you'd have to specify /_nomodule/... πŸ˜€

That aside, I think this is the right move β€” meta is an entirely different resource, and module is arguably so as well. If it results in better uptime/performance for unpkg then it's a no-brainer. Then again I don't personally have any ?module URLs in the wild, so it's very easy for me to endorse a breaking change.

If you do stick with this change then I'd advocate for _module supporting .mjs resolution at the same time (I don't think ?module does?). Much as Michael Jackson Script has its detractors (I've been one of them), it is gaining traction among libraries, and there isn't really any other way to support poly-packages (i.e. /some-pkg/deep) that expose both CJS and ESM.

daKmoR commented 5 years ago

yeah I'm only using ?module urls... πŸ™ˆ

so just to be sure https://www.unpkg.com/test-wc-card@0.0.3/src/TestWcCard.js?module

will become https://www.unpkg.com/_module/test-wc-card@0.0.3/src/TestWcCard.js

for our use case making module the default and requiring _nomodule would be nice 😬

if it stays maybe we can make it a little more user friendly? of I now open https://www.unpkg.com/_module/test-wc-card@0.0.3/

mode module is available only for JavaScript and HTML files

also it seems something changed? as neither this https://www.unpkg.com/test-wc-card@0.0.3/demo/index.html?module or https://www.unpkg.com/_module/test-wc-card@0.0.3/demo/index.html works anymore πŸ€”

WebReflection commented 5 years ago

@Rich-Harris AFAIK .mjs is already supported (I pushed the change even if I'll never use it πŸ˜…)

I don't think it has to behave any different at all, should it?

mjackson commented 5 years ago

If you wanted to be really radical, module would be the default, and you'd have to specify /_nomodule

yeah I'm only using ?module urls

@Rich-Harris @daKmoR This makes a ton of sense. Thanks for the suggestion! The only people for whom this change would break are people who are actually doing the import transform themselves, which I doubt anyone is doing.

AFAIK .mjs is already supported

@WebReflection We don't currently support .mjs. If you have a library that publishes /lib/file.js and /lib/file.mjs and you do an import file from "lib/file?module" we will still serve the .js version.

it seems something changed?

@daKmoR I haven't changed any existing URLs yet, just a few old redirects that used to redirect to ?meta. You can see the changes in 5df576c6be377c15d9b0778b524be289a6f04b5e. If those files used to work, they should still work. I checked the Network tab in Chrome and everything seems to be loading ok. No 403s or 404s or anything like that.

gugadev commented 5 years ago

@mjackson Great improvement. Afaik, I think is a good opportunity for update babel-plugin-unpkg package. I was thinking about add an array of ignored expressions to avoid transform. Something like this:

const URL = require("whatwg-url")
const warning = require("warning")

const BareIdentifierFormat = /^((?:@[^\/]+\/)?[^\/]+)(\/.*)?$/

const unpkg = (dependencies = {}) => {
  return {
    visitor: {
---     "ImportDeclaration|ExportNamedDeclaration|ExportAllDeclaration"(path, { ignored }) {
+++     "ImportDeclaration|ExportNamedDeclaration|ExportAllDeclaration"(path, { ignored }) {
        if (
          !path.node.source || // Probably a variable declaration
          URL.parseURL(path.node.source.value) != null || // Valid URL
          path.node.source.value.substr(0, 2) === "//" || // URL w/o protocol
          [".", "/"].indexOf(path.node.source.value.charAt(0)) >= 0 // Local path
        )  {
          return // Leave it alone
        }

+++     const oneMatch = ignores.some(exp => { ... })

+++     if (oneMath) {Β return }

        // A "bare" identifier
        const match = BareIdentifierFormat.exec(path.node.source.value)
        const packageName = match[1]
        const file = match[2] || ""

        warning(
          dependencies[packageName],
          'Missing version for package "%s" in dependencies; falling back to "latest"',
          packageName
        )

        const version = dependencies[packageName] || "latest"

---     path.node.source.value = `https://unpkg.com/${packageName}@${version}${file}?module`
+++     path.node.source.value = `https://unpkg.com/_module/${packageName}@${version}${file}`
      }
    }
  }
}

module.exports = unpkg
Rich-Harris commented 5 years ago

The only people for whom this change would break

Up to a point, Lord Copper... it would also break for <script src="https://unpkg.com/foo">, no? That script would need type="module" for it to continue working.

WebReflection commented 5 years ago

We don't currently support .mjs. If you have a library that publishes /lib/file.js and /lib/file.mjs and you do an import file from "lib/file?module" we will still serve the .js version.

@mjackson accordingly with the direction node itself is taking, the moment one opts in as module, .js is module as much as .mjs and no magic resolution should happen.

If unpkg wants to put an automatic resolution based on extensions, then it's another story, but I do believe dual packages (at least all of mine) have separated folders for CJS vs ESM to simplify bundling too.

Anyway, as @Rich-Harris mentioned already, giving modules by default would break all pre-bundled libraries served through unpkg so it's a bit unpractical, 'cause the unpkg field in package.json files points already at what should be served by default when src="any-module" is specified without module explicit opt-in.

TL;DR I think serving ESM by default would break unpkg field and also make it useless.

mjackson commented 5 years ago

it would also break for <script src="https://unpkg.com/foo">, no? That script would need type="module" for it to continue working.

@Rich-Harris The ?module query param doesn't convert scripts into modules. It just transforms existing bare module id's in import and export statements. So if that script worked before w/out type=module (i.e. it is not an ES module and doesn't have any import or export statements), it will work equally as well after.

daKmoR commented 5 years ago

@daKmoR I haven't changed any existing URLs yet, just a few old redirects that used to redirect to ?meta. You can see the changes in 5df576c. If those files used to work, they should still work. I checked the Network tab in Chrome and everything seems to be loading ok. No 403s or 404s or anything like that.

@mjackson my deepest apologies, this works indeed just fine πŸ€— no clue what was wrong when I looked at it in the morning on my phone... I probably just had a bad connection or so πŸ™ˆ again sorry for the confusion πŸ˜…

Rich-Harris commented 5 years ago

@mjackson I'm referring to a situation in which someone has something like this:

<script src="https://unpkg.com/some-lib"></script>
<script>
  SomeLib.doTheThing();
</script>

Assume some-lib has both pkg.main and pkg.module (i.e. some-lib?module gets the ESM entry point). If the default behaviour were to serve modules, then that script tag would be loading a module as a script, and the page would break.

I'm arguing against my own interests here because I think it would be truly awesome if unpkg served modules by default β€” it would probably accelerate adoption by library authors more than just about anything else. But I think it would result in unavoidable breakage.

WebReflection commented 5 years ago

@Rich-Harris @mjackson once again, all my modules serves unpkg as pre-bundled, minified, globally exported, all browsers compatible JS files, and ESM via module field.

Serving ESM by default will break everything if the unpkg field is ignored, and it will cause confusion if ESM is served by default, making unpkg field redundant/useless.

edit

Example: hyperhtml vs hyperhtml?module but I have a consistent list of various polyfills behaving the same.

WebReflection commented 5 years ago

@mjackson honest proposal, why not https://esm.unpkg.com/module-name for ESM modules, using the module field in package.json, with https://cjs.unpkg.com/module-name using the main one, and https://min.unpkg.com/module-name to provide a pre-bundled-minified version of the module via unpkg, keeping the current query/non-query string situation untouched (but discouraged) ?

That would provide an easy to understand migration pattern to everyone, and nothing would break.

mjackson commented 5 years ago

Ahhhhh, I wasn't thinking about the automatic package.json resolution. I was only thinking about the code transform. You're right, the package.json resolution would break. Maybe these could be 2 separate things...

daKmoR commented 5 years ago

https://esm.unpkg.com/module-name

uh I like that :)

ggoodman commented 5 years ago

As suggested on twitter, could the query parameter problem be solved by stripping unknown / invalid parameters in a CloudFlare worker?

Alternatively could you imagine redirecting from the current query-param opt-ins to a new canonical URL? This could also be done in a Worker to optimize for cache hits.

mjackson commented 5 years ago

It would cost a lot of money to run all our requests through Cloudflare workers. I can redirect the query params to the new canonical URL for a time before turning off query params entirely to try to help ease the transition, but eventually the goal would be to disable query strings entirely.

WebReflection commented 5 years ago

@mjackson to further spec my proposal, you could also start accepting unpkg field, in package.json, as object, instead of string, to provide {"esm": "esm/index.js", "cjs": "cjs/index.js", "min": "bundle/min.js"} so that it's easy to point at sub-domains and it doesn't block you from adding more in the future (i.e. wasm or even custom stuff such jsx).

If the unpkg field is complicated due current usage as string, you can at least consider "securing" an unpkg.com field that will have one or more of those properties to bring in the right stuff.

edit forgot to mention that https://meta.unpkg.com/, or metadata, should also cover that use case.

ggoodman commented 5 years ago

I feel like this potential deprecation can be handled well on a forward looking basis, but I'm concerned by all the stuff out there baked into GitHub issues and SO answers that will be broken.

qubyte commented 5 years ago

I might be missing something... Is there an issue with the _module path prefix? Using

https://unpkg.com/mixomatic?module

gets me the ES6 module as expected, but

https://unpkg.com/_module/mixomatic

gets the UMD version.

mjackson commented 5 years ago

After sitting on this for a while, I don't think this is the way we should go. Although it's a pain to support URLs that intentionally try to bust the cache, that's just how the web works. Besides, I'd hate to never be able to have a query string on any of our URLs because of this one decision.

I'll probably be removing the /_metadata and /_module URLs sometime soon. Thanks for all your feedback here :)