olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License
8.94k stars 548 forks source link

TokenSet.toArray() doesn't handle wildcards #393

Closed phtyson closed 5 years ago

phtyson commented 5 years ago

Calling TokenSet.toArray() when there is a wildcard introduced by , for example, TokenSet.fromString("abc*e"), will cause infinite loop due to self-referential edge. I don't know if this would ever arise in normal use. I found it while investigating a lunr-languages problem. I worked around it with this change:

diff --git a/lib/token_set.js b/lib/token_set.js
index ec6fd22..da3f098 100644
--- a/lib/token_set.js
+++ b/lib/token_set.js
@@ -302,10 +302,18 @@ lunr.TokenSet.prototype.toArray = function () {

     for (var i = 0; i < len; i++) {
       var edge = edges[i]
-
+      var next = frame.node.edges[edge];
+      if (frame.node === next) {
+        if (edges[i+1]) {
+          next = frame.node.edges[edges[i+1]];
+        } else {
+          next = new lunr.TokenSet;
+          next.final = frame.node.final;
+        }
+      }
       stack.push({
         prefix: frame.prefix.concat(edge),
-        node: frame.node.edges[edge]
+        node: next
       })
     }
   }
olivernn commented 5 years ago

The infinite loop is certainly not intentional, that said, what should the output be for your example?

Practically speaking the TokenSet#toArray method was not intended to be used on token sets that contain an wildcard.

The simplest thing is to just document that the behaviour of this method with a token set containing wildcards is undefined. We could include a check that throws an error, but I don't think its worth it.

What do you think?

phtyson commented 5 years ago

Maybe just a warning in documentation. I don't think this will ever happen in normal operation. I just happened on it when I added some debugging steps to track down unexpected behavior in a lunr-languages plugin I was developing. My suggested patch actually isn't correct--it will drop doubled letters from the array. It should check for edge==="*". Thanks for the reply.

olivernn commented 5 years ago

I've updated the documentation with a quick note about the undefined behaviour, this will show up in the docs site at the next release, but is in code should anyone else run into similar issues.

Thanks for reporting.