Use regular expressions instead of state machine

OlsonDev commented 9 years ago

I set up a benchmark to compare observe-js's parsePath(path) performance with lodash's toPath(value). It turns out the state machine is 74-85% slower than regular expressions. The caveat is lodash treats "foo.bar space.baz" differently than observe-js; lodash is forgiving whereas observe-js returns an invalid path object. I'm guessing the regex pattern could be modified to be more strict if desirable.

Another thing is observe-js has caching... that could be added to a lodash-esque implementation.

My opinion is it'd be worth it to reduce code complexity and boost performance on pages with many PathObservers. Thoughts?

jmesserly commented 9 years ago

Seems reasonable to me. Given that these paths are typical quite short, regex seems a good fit.

OlsonDev commented 9 years ago

I got around to writing a regex/modified function that's strict enough, I think. It performs reasonably well.

As expected, the regex is a bit more unwieldy than lodash's but still comprehensible (ya know... for those that regularly read regular expressions ... really wish JS had a verbose flag/syntax):

var rePropNameLodash = /[^.[\]]+|\[(?:(-?\d+(?:\.\d+)?)|(["'])((?:(?!\2)[^\n\\]|\\.)*?)\2)\]/g;
var rePropNameStrict = /(?:^\s*|\s*[.\]]\s*|\s*(?=\[))([^.[\]"' ]+|\[\s*([-+]?\d+(?:\.\d+)?\s*)]|\[\s*(?:"((?:[^"]|\\")+)"|'((?:[^']|\\')+)')\s*])(?=\s*.|\[|$)(?:\s*$)?/g;

Using the strict version, this works as expected:

Path.get('foo. bartaco . baz [42] [+1.2] .a[-9.4]   .  taco [ " 1 \\\"\'" ].bacon')
=> ["foo", "bartaco", "baz", "42", "+1.2", "a", "-9.4", "taco", " 1 \"'", "bacon"]

Whereas this returns an invalid path (note the extra double quote):

Path.get('foo. bartaco . baz [42] [+1.2] .a[-9.4]   .  taco [ " 1 \\\""\'" ].bacon')
=> []

It may violate SRP but I could mash parsePath and compiledGetValueFromFn together so the structure doesn't have to be traversed twice -- another benefit being that the way the regex is composed, as it's gathering parts, I know which syntax was used (as I'm sure the state machine did too) so building the compiled function won't have all these branches checking & formatting: pathString += isIdent(key) ? '.' + key : formatAccessor(key);.

There's already issues with pathCache not understanding the equality of the paths foo.bar and foo["bar"] and I'd opt to continue that behavior: generally developers are consistent so there won't be /m?any/ "duplicate" paths in the cache.

While I'm building the path array, I effectively know the "parent" path that #97 is asking for.

Feedback I'm asking for before implementing this:

Should I use the strict regex?
Should I mash parsePath and compiledGetValueFromFn together?
Should I create/find from cache parent paths while parsing? ... Finding from cache would provide ==/=== equality but likely cost more performance when building the path. On the other hand, if someone had a lot of similar paths, creating new ones every time could chew up a fair amount of memory.

OlsonDev commented 9 years ago

Oh, and my regex may be more strict but it's certainly not spec compliant considering a valid identifier alone (not a path!) would require a 11,236 character long regular expression. :-)

googlearchive / observe-js

Use regular expressions instead of state machine #109