Closed 3cp closed 4 years ago
Current plan:
Node.js 10 supports named capture groups. Node.js 8 becomes end-of-life at the end of December 2019. I'd like to develop the next version, which will use named capture groups, unicode and lookbehind, of js-tokens during this autumn, and then release it early 2020. This way I can batch all breaking changes together.
Note to self – links on handling regex vs division better (but not 100%):
I like you switched from pure regex to code+regex. It's not only more flexible, but also more readable. I guess it can also support older nodejs/browser since it relies much less on regex features.
Thanks!
I mostly switched to code+regex since I realized that lookbehinds didn’t help much after all. And then I also got infinite template interpolations for free.
I still use Unicode property escapes in the regex (/\p{L}/u), which requires Node.js 10 (see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#Browser_compatibility), so Node.js will still be the lowest supported version.
I hope to finish this this weekend.
Unfortunately it doesn't work in Firefox. Babel has a plugin to transpile it to es5, https://mothereff.in/regexpu#input=var+regex+%3D+%2F%5Cp%7BL%7D%2Fu%3B&unicodePropertyEscape=1
It is very verbose.
I’ve never tried to support browsers with this module. Are you using it in browsers? If so, in what kind of project?
I got a bundler can work in browser, also an online IDE runs bundler in browser right now. https://github.com/dumberjs https://github.com/dumberjs/dumber-gist
The tokenisation is for https://github.com/dumberjs/modify-code (which dumber uses). Right now I use babel parser to tokenise code, but it's big and slow. I cannot use current version of js-tokens because of the regex/divider issue.
Not a big issue, I can transpile (or copy-and-modify) your module and embed it in modify-code dist file through rollup I think.
Note that the new version won’t be correct in all cases either. For example, it will think that if (x) /a/g.exec(s)
is division when in fact it is a regex.
Just out of curiosity, I’ve never heard anyone have trouble (in real life, I know it’s far from perfect) with the regex/division issue so far before. Do you have an example where it causes problems for you?
What are your requirements? Need a (near) standards compliant tokenizer, but something like esprima is too large/slow?
The code of this package doesn’t change very often so, yeah, I think you could easily fork it for your needs. Feel free to share if you do!
It's sad to know the new version still has unhandled cases. :-(
But this might not be an issue for my use case, because it's not missing any token, only creates unneeded tokens.
I think you could fix the issue. That /
is in context of beginning of a statement. Divider cannot start a statement I guess.
I tried js-tokens before, but it gave a huge fake regex started by a divider. This is very common when deal with minified js code. The result is the fake regex missed tons of real tokens.
yes a (near) standards compliant tokenizer. I use the tokens to drive source-map. I used few parsers before, esprima is not actively updated to follow current js spec. I encourtered cherow bug and now it's deprecated. Both babel parser and typescript parser can understand all syntax (including ts and jsx which are not in spec). Typescript parser is even bigger, so I settled on babel parser. Evaluating meriyah now, but that means to drop ts syntax support. The good part of js-tokens: even it doesn't support ts/jsx, it will generate tokens just fine.
fork is easy. But understand and fix all the edge cases are hard 😄
BTW, I am surprised @babel/highlight
uses your module instead of babel's own parser, and it indeed gives wrong color.
const highlight = require("@babel/highlight").default;
const code = `var p /= 2/g;`;
const result = highlight(code);
console.log(result);
I got your point on if (x) /a/g.exec(s)
now, because js-tokens is not a full parser, it's hard to understand the context. But a full parser will be much bigger than js-tokens.
Thank you very much for all the detailed feedback! I’ll keep that in mind while finishing the new version.
In case of babel, they use @babel/highlight
to colorize broken code that caused parse errors. Babel’s tokenizer can throw errors, while js-tokens never throws errors for any string input. They’d have a bit of a catch-22 situation otherwise. Anyway, var p /= 2/g
should be tokenized correctly by the new version.
The latest version of the func branch now supports if (x) /a/g.exec(s)
and all other edge cases I’ve been able to think of so far!
Suggest to add few test fixtures for jsx and ts code. Although js-tokens did not support them deliberately, they should work as expected.
Good idea, I’ll add some jsx and ts fixtures.
In other news, I’ve started adding a test262 parser test. If @babel/parser is bug free, js-tokens only mixes up division and regex in one case:
switch(a) { case 1: {}
/foo/ }
That’s supposed to be regex, but js-tokens thinks it’s division. If it’s easy enough to fix I’ll do it, otherwise it’s going to be a known limitation.
Edit: Note to self:
label:{}/a/g
switch(x){
case 1:{}/a/g
}
CoffeeScript supports babel https://coffeescript.org/#transpilation
How about, besides the /index.js
, create another /browser.js
file transpiled by babel to support Firefox and other browsers?
Usage in nodejs environment will be unchanged, but various js bundlers will honour "browser"
field in package.json, /browser.js
will be used in browser environment.
"files": [
"index.js",
"browser.js"
],
"main": "index.js",
"browser": "browser.js",
If this makes sense to you, I can help on the setup after you released next major version.
That might be a good idea! But before doing so I have a question:
I’ve not been super into the frontend compilation space the last half year or so, but before that I know that webpack and babel worked on being able to babel on code from node_modules, so that the ones who need it can transpile to the level of support wanted. Is that a thing now? Can you set up your build to transpile either all of node_modules or just js-tokens?
By default, bundlers skip transpiling on node_modules. But yes, there is way to transpiling node_modules (all or some) in bundler config.
However, this is not easy for end users. Besides the manual configuration, there is also kind of unpleasant troubleshooting first that users need to go through to find out what 3rd-party package breaks in browser and what level of transpiling is needed to fix that.
So it's better to ship pre-compiled dist files, so it just works in both nodejs and browser env. It looks quite easy to setup based on what I read from CoffeeScript doc.
But the typical user does not need js-tokens in their builds, right? It’s just a couple of pro users like you?
Also, if we do add a ES5 browser version – what if someone actually wants the modern and much smaller version, because their app only supports modern browsers anyway? Is that still possible, or is that causing annoying manual configuration and troubleshooting?
That is very easy once you shipped multi dist files.
In bundler config, user can force a main file for a npm package. This is much easier than setup babel transpiling on a npm package.
First, multi dist files erased manual config for at least 90% users. For the rest pro users, multi dist files are easier for manual config than figuring out transpiling.
I’ve decided not to include a version run through Babel, because my testing shows that it would increase the package size by 60%. Since this package is downloaded more than 20 million time per week (and growing), that would amount to around 130 GB per week of wasted bandwidth.
In other news, I’ve finished test262 and added support for JSX via { jsx: true }
(in the jsx
branch). What’s left to do is adding more JSX tests and then look everything through again.
v6.0.0 has now been released! Hope it’ll be useful for you.
Thx! will test it.
Unicode property escapes is coming in Firefox 78! https://hacks.mozilla.org/2020/06/a-new-regexp-engine-in-spidermonkey/
Cheers!
Following the documented Limitations on "Division and regex literals collision", since current oldest Nodejs LTS v8 has Lookbehind assertions in regex, please update the implementation if possible. Thanks!