mhuebert / instaparse-live

Build parsers in your browser using instaparse-cljs (ClojureScript/Reagent)
http://instaparse-live.matt.is
43 stars 2 forks source link

Cannot display more than 32 results #8

Closed mhuebert closed 7 years ago

mhuebert commented 9 years ago

This one is fun. We currently cannot visualize a result if it contains more than 32 elements!

The error happens in clojure.walk/postwalk, when it tries to conj an AutoFlattenSeq:

Error: No protocol method ICollection.-conj defined for type instaparse.auto-flatten-seq/AutoFlattenSeq: [object Object]
    at Object.cljs$core$missing_protocol [as missing_protocol] (http://localhost:3449/js/compiled/out/cljs/core.js:239:9)
    at http://localhost:3449/js/compiled/out/cljs/core.js:1364:17
    at cljs$core$_conj (http://localhost:3449/js/compiled/out/cljs/core.js:1367:3)
    at http://localhost:3449/js/compiled/out/cljs/core.js:8333:96
    at Function.cljs.core.seq_reduce.cljs$core$IFn$_invoke$arity$3 (http://localhost:3449/js/compiled/out/cljs/core.js:8334:3)
    at cljs.core.LazySeq.cljs$core$IReduce$_reduce$arity$3 (http://localhost:3449/js/compiled/out/cljs/core.js:11057:29)
    at Function.cljs.core.reduce.cljs$core$IFn$_invoke$arity$3 (http://localhost:3449/js/compiled/out/cljs/core.js:8438:13)
    at Function.cljs.core.into.cljs$core$IFn$_invoke$arity$2 (http://localhost:3449/js/compiled/out/cljs/core.js:17733:25)
    at cljs$core$into (http://localhost:3449/js/compiled/out/cljs/core.js:17704:23)
    at clojure$walk$walk (http://localhost:3449/js/compiled/out/clojure/walk.js:15:39)
    at clojure$walk$postwalk (http://localhost:3449/js/compiled/out/clojure/walk.js:28:26)

Example: http://instaparse-live.matt.is:3449/#/-JsQLIv2bdcIYhzOj5kh

An AutoFlattenSeq is only created when the number of objects in a result exceeds the threshold of 32: https://github.com/lbradstreet/instaparse-cljs/blob/master/src/cljs/instaparse/auto_flatten_seq.cljs#L5

I can't say that I've understood all the code in that page yet, so if anyone can shed some light on what a good approach here might be that would be much appreciated. Use a different method for modifying the tree? Modify instaparse-cljs?

(cc/ @aengelberg, @lbradstreet).

aengelberg commented 9 years ago

Interesting. It looks like the clojure version works properly with conj, because it implements the cons method of IColl. Looks like the cljs version should be implementing ICollection which provides the conj hook via -conj.

A potential workaround would be to first call something like (prewalk #(if (afs? %) (seq %) %) result). That would turn all the auto-flatten-seq's into real seq's, and the postwalk would behave properly.

Engelberg commented 9 years ago

I think this is simply a mismatch between the name of Clojure's Java-based interfaces and Clojurescript's protocols for collections.

AutoFlattenSeq is a custom data structure to simulate the efficient concatenation of two vectors.

If we want to join v1 and v2 (an operation referred to in that file as conj-flat), then as long as v2 has 32 elements or fewer, we call (into v1 v2). If v2 has more than 32 elements, we call (conj v1 v2) and update the hash and count as if we had called (into v1 v2) in order to simulate that this is a flat sequence. Later, the first time you actually try to explore the elements of this concatenation, it flattens the sequence and caches it. Instaparse itself doesn't need to explore the individual elements of the parse, so this ensures that only the successful parses that are returned from instaparse and used will pay the penalty of flattening the sequence.

This is an important performance optimization, because internally, with some grammars, instaparse ends up doing a lot of concatenations of a small vector followed by a large vector. Many of these concatenations end up getting discarded and never become part of the final, correct parse.

In the Clojure version, conj is implemented with the cons method (strange naming). Sounds like that code needs to be renamed as an implementation of ICollection.-conj.

In the long run, if rrb-vector is now a mature library, it would be interesting to try replacing AutoFlattenSeq with rrb-vector.

lbradstreet commented 9 years ago

@mhuebert could you please try the latest snapshot? I have implemented -conj for ICollection. It may or may not fix your issue.

mhuebert commented 9 years ago

Thanks @Engelberg for the explanation and @lbradstreet for the snapshot update!

That implementation of ICollection.-conj is definitely on the right track. However, cons returns a seq instead of a vector. Is that intended? Instaparse-live currently assumes that the lists in the parse tree are vectors.

Calling vec in -conj does the trick for me:

ICollection
  (-conj [coll o] (vec (cons o coll)))

(Is there any reason why we would not want -conj to always return a vector?)

Engelberg commented 9 years ago

If you're parsing things in hiccup format, then that AutoFlattenSeq should have been converted to a FlattenOnDemandVector (by a call to convert-afs-to-vec). The two are closely related, but AutoFlattenSeq behaves like a sequence (lazy behavior, conj is treated like a cons to the front), whereas FlattenOnDemandVector acts more like a vector (and conj would append to the end and keep it a vector).

So if you're parsing in hiccup mode but getting back an AutoFlattenSeq, I think we need to do some troubleshooting and figure out why it is not becoming a FlattenOnDemandVector. If you're parsing in enlive mode, then it's probably not reasonable to assume vectors and we should figure out another test for the walking process (e.g., change vector? to sequential?).

But consing an element onto the front of a vector and then producing a new vector with vec will be an O(n) operation every time you add an element. So that would not be a good way to write -conj.

I haven't looked closely at your code, so can't really comment beyond that right now. I know Alex has spent more time reading what you've done, so he may have more concrete suggestions.

mhuebert commented 9 years ago

Yes, I'm parsing in hiccup - so this sounds like the issue.

lbradstreet commented 9 years ago

Hi @mhuebert,

@Engelberg has identified a bug in empty for FlattenOnDemand vector, which caused it do return a AutoFlattenSeq. I have fixed this in [com.lucasbradstreet/instaparse-cljs "1.4.1.0-SNAPSHOT"]. Can you please give this version a try and let us know if it helps?

mhuebert commented 9 years ago

1.4.1.0-SNAPSHOT looks good & works for me. thanks!

On Wed, Jun 24, 2015 at 4:23 AM, Lucas Bradstreet notifications@github.com wrote:

@mhuebert https://github.com/mhuebert, @Engelberg https://github.com/Engelberg has identified a bug in empty for FlattenOnDemand vector, which caused it do return a AutoFlattenSeq. I have fixed this in [com.lucasbradstreet/instaparse-cljs "1.4.1.0-SNAPSHOT"]. Can you please give this version a try and let us know if it helps?

— Reply to this email directly or view it on GitHub https://github.com/mhuebert/instaparse-live/issues/8#issuecomment-114703402 .

lbradstreet commented 9 years ago

@mhuebert cool, let us know when the site is up and using it so we can do some testing before making a production release :)

mhuebert commented 9 years ago

Great! it's posted here: http://instaparse-live.matt.is/

On Wed, Jun 24, 2015 at 10:22 AM, Lucas Bradstreet <notifications@github.com

wrote:

@mhuebert https://github.com/mhuebert cool, let us know when the site is up and using it so we can do some testing before making a production release :)

— Reply to this email directly or view it on GitHub https://github.com/mhuebert/instaparse-live/issues/8#issuecomment-114780774 .

aengelberg commented 9 years ago

I don't think that the instaparse live site has been updated to the latest instaparse-cljs 1.4.1.0-SNAPSHOT with my fixes from last night.

To test this, enter the following grammar:

Regex = (CharNonRange | Range) + Range = Char <'-'> Char CharNonRange = Char ! ('-' Char) Char = #'[-x]' | 'c' (! 'd') 'x'

And the following input:

x-cx

This should produce ONE parse result, not two, because of a fix I ported from Instaparse 1.4.1. (It is also specifically verified in instaparse-cljs with a unit test.)

On Wed, Jun 24, 2015 at 9:05 AM, Matt Huebert notifications@github.com wrote:

Great! it's posted here: http://instaparse-live.matt.is/

On Wed, Jun 24, 2015 at 10:22 AM, Lucas Bradstreet < notifications@github.com

wrote:

@mhuebert https://github.com/mhuebert cool, let us know when the site is up and using it so we can do some testing before making a production release :)

— Reply to this email directly or view it on GitHub < https://github.com/mhuebert/instaparse-live/issues/8#issuecomment-114780774

.

— Reply to this email directly or view it on GitHub https://github.com/mhuebert/instaparse-live/issues/8#issuecomment-114926232 .

mhuebert commented 9 years ago

Hmm, is this as expected? http://instaparse-live.matt.is/#/-JsblPpphEcfv6dqAJhj

I just realized that due to a terminal typo I pushed to the wrong branch this afternoon (gh-pagesgit instead of gh-pages), and I may have done the same thing earlier this week too.

On Thu, Jun 25, 2015 at 12:45 AM, aengelberg notifications@github.com wrote:

I don't think that the instaparse live site has been updated to the latest instaparse-cljs 1.4.1.0-SNAPSHOT with my fixes from last night.

To test this, enter the following grammar:

Regex = (CharNonRange | Range) + Range = Char <'-'> Char CharNonRange = Char ! ('-' Char) Char = #'[-x]' | 'c' (! 'd') 'x'

And the following input:

x-cx

This should produce ONE parse result, not two, because of a fix I ported from Instaparse 1.4.1. (It is also specifically verified in instaparse-cljs with a unit test.)

On Wed, Jun 24, 2015 at 9:05 AM, Matt Huebert notifications@github.com wrote:

Great! it's posted here: http://instaparse-live.matt.is/

On Wed, Jun 24, 2015 at 10:22 AM, Lucas Bradstreet < notifications@github.com

wrote:

@mhuebert https://github.com/mhuebert cool, let us know when the site is up and using it so we can do some testing before making a production release :)

— Reply to this email directly or view it on GitHub <

https://github.com/mhuebert/instaparse-live/issues/8#issuecomment-114780774

.

— Reply to this email directly or view it on GitHub < https://github.com/mhuebert/instaparse-live/issues/8#issuecomment-114926232

.

— Reply to this email directly or view it on GitHub https://github.com/mhuebert/instaparse-live/issues/8#issuecomment-115035298 .

lbradstreet commented 9 years ago

That's the expected result :). By the way, it might be nice if the strings in the parse result are quoted for clarity e.g. '([:Regex [:Range [:Char "x"] [:Char "c" "x"]]])

On 25 Jun 2015, at 6:54 am, Matt Huebert notifications@github.com wrote:

Hmm, is this as expected? http://instaparse-live.matt.is/#/-JsblPpphEcfv6dqAJhj

I just realized that due to a terminal typo I pushed to the wrong branch this afternoon (gh-pagesgit instead of gh-pages), and I may have done the same thing earlier this week too.

On Thu, Jun 25, 2015 at 12:45 AM, aengelberg notifications@github.com wrote:

I don't think that the instaparse live site has been updated to the latest instaparse-cljs 1.4.1.0-SNAPSHOT with my fixes from last night.

To test this, enter the following grammar:

Regex = (CharNonRange | Range) + Range = Char <'-'> Char CharNonRange = Char ! ('-' Char) Char = #'[-x]' | 'c' (! 'd') 'x'

And the following input:

x-cx

This should produce ONE parse result, not two, because of a fix I ported from Instaparse 1.4.1. (It is also specifically verified in instaparse-cljs with a unit test.)

On Wed, Jun 24, 2015 at 9:05 AM, Matt Huebert notifications@github.com wrote:

Great! it's posted here: http://instaparse-live.matt.is/

On Wed, Jun 24, 2015 at 10:22 AM, Lucas Bradstreet < notifications@github.com

wrote:

@mhuebert https://github.com/mhuebert cool, let us know when the site is up and using it so we can do some testing before making a production release :)

— Reply to this email directly or view it on GitHub <

https://github.com/mhuebert/instaparse-live/issues/8#issuecomment-114780774

.

— Reply to this email directly or view it on GitHub < https://github.com/mhuebert/instaparse-live/issues/8#issuecomment-114926232

.

— Reply to this email directly or view it on GitHub https://github.com/mhuebert/instaparse-live/issues/8#issuecomment-115035298 .

— Reply to this email directly or view it on GitHub.