gskinner / regexr

RegExr is a HTML/JS based tool for creating, testing, and learning about Regular Expressions.
http://regexr.com/
GNU General Public License v3.0
9.86k stars, 970 forks

Client side compilation of other flavours #273

Open zikaari opened 6 years ago

zikaari commented 6 years ago

With support for WebAssembly catching up pretty fast, the Oniguruma regex engine has been successfully ported to the web. And it's amazing.

WebAssembly port: NeekSandhu/onigasm

Here's a list of all the regex syntaxes it can handle in the browser itself:

At the moment, all other syntaxes are "locked" but really are just a flip of a switch away.

Hope this interests the RegExr community 🙂

zikaari commented 6 years ago

Maybe extends https://github.com/gskinner/regexr/issues/32

gskinner commented 6 years ago

That's really interesting. In theory we could run it all in a worker as well. The wasm file is pretty large - about 600kb over the wire - or 163kb if our server is smart enough to compress it (I'm not sure if it is). It could completely get rid of the need to do server side execution though, and improve our ability to show live previews.

Adding syntaxes would require more than just flipping a switch, since we need our lexer to support them for syntax highlighting, and we need to document them properly.

Conceivably, we could start by shifting PCRE to this, and then adding other flavours as time permits (or as people contribute pull requests).

zikaari commented 6 years ago

The atom/node-oniguruma repo seems pretty inactive, but just to be safe I have proposed the "flipping a switch" thing over there.

https://github.com/atom/node-oniguruma/issues/81

If it doesn't get a reply, I'm gonna assume an implicit green light and move forward with this change.

gskinner commented 6 years ago

Is it easy / possible to strip down the wasm file at all by removing features that aren't needed? For example, flavours we don't immediately support, functionality we don't need (ex. not sure we'd need the Scanner). Just curious.

zikaari commented 6 years ago

The wasm file is pure, 100% virgin* libonig.so, which is literally the source files of the original kkos/oniguruma repo run through a C compiler.

We can't control what goes in the final binary unless someone is willing to fork kkos/oniguruma and trim that one down instead.

But in my opinion, it's not really worth the effort. WASM can load asynchronously and while it is loading, we can have server driven regexing as usual. Once loaded, we switch to onigasm for subsequent operations.

* libonig.so + some glue code
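The fallback described above (server-driven solving while the wasm loads, then a switch to the local engine) could be sketched roughly as follows. This is only an illustration: `createSolver`, `loadLocalEngine`, and `solveOnServer` are hypothetical names, not part of RegExr or onigasm, and the two callbacks stand in for whatever actually loads the wasm and calls the server.

```javascript
// Hypothetical sketch: none of these names come from RegExr or onigasm.
// loadLocalEngine(): returns a Promise resolving to a function (pattern, text) => result
// solveOnServer(pattern, text): returns a Promise with the server-side result
function createSolver({ loadLocalEngine, solveOnServer }) {
  let localSolve = null; // stays null until the local (wasm-backed) engine is ready

  // Start loading in the background. If the load fails, keep using the server.
  const ready = loadLocalEngine().then(
    (solve) => { localSolve = solve; return true; },
    () => false
  );

  return {
    ready, // resolves to true once the local engine has taken over, false on failure
    async solve(pattern, text) {
      return localSolve
        ? localSolve(pattern, text)
        : solveOnServer(pattern, text);
    },
  };
}
```

Callers only ever see `solve()`, so the switch from server to local solving happens without the UI code having to know which engine answered.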

gskinner commented 6 years ago

I figured that was the case, but it never hurts to ask. We'd probably just toast the server side solving if we implemented this. The js for the whole app is about the same size, so we could just load the wasm when the user changes flavours, and keep it loaded. The first solve will be a bit slow, but shouldn't be too bad.
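A rough sketch of the "load on first flavour change, then keep it loaded" idea, assuming a hypothetical `createLazyLoader` helper (not an existing RegExr or onigasm function). The `load` callback is a stand-in for whatever fetches and instantiates the wasm.

```javascript
// Hypothetical helper: wraps an async load() so it runs at most once.
function createLazyLoader(load) {
  let pending = null; // the in-flight or completed load, shared by all callers

  return function ensureLoaded() {
    if (pending === null) {
      pending = load().catch((err) => {
        pending = null; // forget the failed attempt so a later call can retry
        throw err;
      });
    }
    return pending;
  };
}
```

A flavour-change handler would call `ensureLoaded()` before solving. Only the first call pays the download cost, and concurrent calls share the same promise.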

zikaari commented 6 years ago

Just noticed this:

> ... or 163kb if our server is smart enough to compress it (I'm not sure if it is)

If the RegExr server doesn't support gzip, you can pull the wasm from the jsDelivr CDN (159KB)

Pseudo implementation


import { loadWASM, OnigRegExp, OnigString } from 'onigasm'

(async () => {
    // The wasm binary has to be loaded before any onigasm class is used
    await loadWASM('https://cdn.jsdelivr.net/npm/onigasm@2.2.1/lib/onigasm.wasm')

    let text = new OnigString(textInput.getValue())
    let regexp = new OnigRegExp(regexpInput.getValue(), { syntax: 'perl' })

    const updateHighlights = () => {
        const match = regexp.searchSync(text)
        console.log(match)
        /*
        [
            {index: 0, start: 1, end: 4, match: 'abc', length: 3}, // entire match
            {index: 1, start: 2, end: 3, match: 'b', length: 1}    // first capture group
        ]
        */
    }

    // Recompile the pattern or re-wrap the text whenever either input changes
    regexpInput.on('change', () => {
        regexp = new OnigRegExp(regexpInput.getValue(), { syntax: 'perl' })
        updateHighlights()
    })
    textInput.on('change', () => {
        text = new OnigString(textInput.getValue())
        updateHighlights()
    })
})()



Regarding,
> ... We'd probably just toast the server side solving if we implemented this

I think you might want to leave that in for another 2-3 years, until `WebAssembly`-supporting browsers are mainstream and you decide to drop support for older browsers.
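Keeping the server path for older browsers comes down to a feature check at startup. A minimal sketch, with hypothetical `supportsWasm` and `pickEngine` helpers (not existing RegExr code), written in ES5 style so the check itself also runs in the older browsers it is meant to detect:

```javascript
// Hypothetical helpers: decide which engine to use based on WebAssembly support.
function supportsWasm(wasm) {
  // The standard WebAssembly global is an object exposing instantiate()
  return typeof wasm === 'object' && wasm !== null &&
    typeof wasm.instantiate === 'function';
}

function pickEngine() {
  // typeof guards against a ReferenceError where the global does not exist
  var wasm = typeof WebAssembly === 'undefined' ? undefined : WebAssembly;
  return supportsWasm(wasm) ? 'wasm' : 'server';
}
```

With something like this in place, the server-side solver only needs to stay around for as long as `pickEngine()` still returns `'server'` for a meaningful share of visitors.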

zikaari commented 6 years ago

Queued up https://github.com/NeekSandhu/onigasm/pull/14

Eitz commented 6 years ago

Any news on this, @NeekSandhu / @gskinner ?

gskinner commented 6 years ago

Not yet. I have one more update in the queue before I look at this in more detail.

gskinner commented 6 years ago

We just pushed v3.5, and I plan to evaluate this for v3.6

gskinner commented 5 years ago

Quick update: This is on my list to look at over the next little bit.

goyalyashpal commented 1 year ago

hi! just was curious if there are any updates on this. thanks.