nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
105.6k stars 28.64k forks source link

Data URLs truncated #53775

Open bojavou opened 2 weeks ago

bojavou commented 2 weeks ago

Version

v22.4.0

Platform

Linux server 6.8.0-36-generic #36-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun 10 10:49:14 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

loaders

What steps will reproduce the bug?

This snippet reproduces.

node --eval '' --import 'data:text/javascript,console.log("Whither wanders thy distempered mind?")'
SyntaxError: Invalid or unexpected token

This is the embedded data URL.

data:text/javascript,console.log("Whither wanders thy distempered mind?")

How often does it reproduce? Is there a required condition?

Always.

What is the expected behavior? Why is that the expected behavior?

By the spec, data URLs do not have a query string or fragment.

Characters ? # should be allowed in the URL body.


This file loads the same data URL successfully in a browser.

<!DOCTYPE html>
<html>
  <head>
    <script>
import('data:text/javascript,console.log("Whither wanders thy distempered mind?")')
    </script>
  </head>
</html>
Whither wanders thy distempered mind?

What do you see instead?

SyntaxError: Invalid or unexpected token

Additional information

There's a correct regex that extracts the data URL body.

https://github.com/nodejs/node/blob/b9289a6e29e54beeaa3f781bde1195e48df0da75/lib/internal/modules/esm/load.js#L29

But it's run on URL#pathname, which chops off query string and fragment.

https://github.com/nodejs/node/blob/b9289a6e29e54beeaa3f781bde1195e48df0da75/lib/internal/modules/esm/load.js#L44

new URL('data:,How far? Until ID #12345')
URL {
  href: 'data:,How far?%20Until%20ID%20#12345',
  origin: 'null',
  protocol: 'data:',
  username: '',
  password: '',
  host: '',
  hostname: '',
  port: '',
  pathname: ',How far',
  search: '?%20Until%20ID%20',
  searchParams: URLSearchParams { ' Until ID ' => '' },
  hash: '#12345'
}

This might be a quick fix:

const match = RegExpPrototypeExec(DATA_URL_PATTERN, url.pathname + url.search + url.hash);
targos commented 2 weeks ago

@nodejs/loaders

aduh95 commented 2 weeks ago

I'm probably saying things you already know about, but here is it anyway: a workaround is to encode the data (e.g. data:text/javascript,console.log(%22Whither wanders thy distempered mind%3F%22)) so there's no ambiguity on how to parse the URL.

Should URL be fixed to not report "fake" query string for data: URLs? I would expect it to parse URL strings as spec'd.

bojavou commented 2 weeks ago

I'm probably saying things you already know about, but here is it anyway: a workaround is to encode the data (e.g. data:text/javascript,console.log(%22Whither wanders thy distempered mind%3F%22)) so there's no ambiguity on how to parse the URL.

thumbsup

Also could base64 it.

Should URL be fixed to not report "fake" query string for data: URLs? I would expect it to parse URL strings as spec'd.

I know it, it does this simplified parsing. I've never really been happy with it.

targos commented 2 weeks ago

I opened https://github.com/nodejs/node/pull/53778 with a fix.