prior-art-archive / priorartarchive.org

Prior Art Archive Site
https://priorartarchive.org
GNU General Public License v2.0

Search bug: common strings generate errors. #16

Open metasj opened 5 years ago

metasj commented 5 years ago

Some simple queries, especially those involving special characters, have been generating errors since the latest update. It shouldn't be possible to produce a nondescript error by entering any string into the search box.

metasj commented 5 years ago

Queries that generate errors:

a.b
\hello
\hello \there
hello#%

joeltg commented 5 years ago

Not sure what the right way to go here is - maybe sanitizing query strings on the client before passing them to the queryParser, or better error messaging, or both.

There's no way to search for special characters; ElasticSearch doesn't even index them.
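A minimal sketch of the client-side sanitizing option, assuming that "special character" means anything other than an ASCII letter or digit, that each run of them inside a word is replaced by a space, and that a word split this way is quoted as a phrase. The function names are hypothetical and not part of the site's code.

```typescript
// Hypothetical helpers, not taken from the Prior Art Archive codebase.
// Assumption: only ASCII letters and digits are searchable.
const SPECIAL_CHARS = /[^A-Za-z0-9]+/g;

// Clean one whitespace-delimited word of the query.
function sanitizeToken(token: string): string {
  const parts = token
    .replace(SPECIAL_CHARS, " ") // each run of special characters becomes one space
    .trim()
    .split(" ")
    .filter((part) => part.length > 0);
  const phrase = parts.join(" ");
  // A word that was split into several pieces is quoted so it stays a phrase.
  return parts.length > 1 ? `"${phrase}"` : phrase;
}

// Clean a whole query string before it is sent to the parser.
function sanitizeQuery(raw: string): string {
  return raw
    .split(/\s+/)
    .map(sanitizeToken)
    .filter((token) => token.length > 0) // drop words that were only special characters
    .join(" ");
}
```

With this sketch, a.b becomes "a b" (quoted), \hello \there becomes hello there, and hello#% becomes hello. It would also strip any operator characters the query syntax depends on, so it only suits a quick patch.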

slifty commented 5 years ago

An example screenshot:

image

The error that appears in the console is: JSON.parse: unexpected end of data at line 1 column 1 of the JSON data
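That message is what Firefox reports when JSON.parse is given an empty string, which suggests the client is trying to parse an empty response body. A rough sketch of a guard that turns this into a readable message, assuming the client has the HTTP status and the raw body text in hand. The function name, result shape, and message wording are all hypothetical.

```typescript
// Hypothetical result type: either parsed data or a message fit to show the user.
type ParseResult =
  | { ok: true; data: unknown }
  | { ok: false; message: string };

// Check the status and body before parsing, so a failed search never
// surfaces as a raw JSON.parse exception.
function parseSearchResponse(status: number, body: string): ParseResult {
  if (status < 200 || status >= 300) {
    return { ok: false, message: `Search failed: server responded with HTTP ${status}.` };
  }
  if (body.trim() === "") {
    return { ok: false, message: "Search failed: the server returned an empty response." };
  }
  try {
    return { ok: true, data: JSON.parse(body) };
  } catch (err) {
    // Malformed JSON: keep the parser detail but wrap it in a readable message.
    const detail = err instanceof Error ? err.message : String(err);
    return { ok: false, message: `Search failed: unreadable server response (${detail}).` };
  }
}
```

Called as parseSearchResponse(503, ""), this returns a failure that mentions HTTP 503 instead of throwing.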

metasj commented 5 years ago

Sanitizing is fine for a quick patch (say, replacing special chars in a word with a space and quoting the result). But:

reefdog commented 5 years ago

I get the strangest set of results testing SJ's list above against www and dev-v2. (Click the ✅ or ❌ icons to run the search yourself.)

www dev-v2
a.b 404
\hello 404 503
\hello \there 404 503
hello#% 404 503
"a.b" 503
"\hello" 503
"\hello \there" 503
"hello#%" 503

✅ indicates the search completed without error, btw. It doesn't mean there were actual results.

reefdog commented 5 years ago

Here, by the way, is the list of searches that Cisco uses to test the site, and their current results. (Click the ✅ or ❌ icons to run the search yourself.)

Search Term Operator www dev-v2
in-memory TBD
"Cisco Systems"
2003 Date range filter does not work as expected. TBD
cisco AND router TBD
cisco router AND TBD
cisco|router TBD
cisco OR router TBD
cisco router OR TBD
cisco NEAR router TBD
cisco router NEAR TBD
cisco ADJ router TBD
cisco router ADJ TBD
cisco WITH router TBD
cisco router WITH TBD
cisco SAME router TBD
cisco router SAME TBD
(test$3 OR monitor$3 OR measur$3 OR measurement OR probe OR probed OR probing) NEAR5 (pathway OR path OR route OR routing) NEAR5 (communication or network) TBD
((test$3 OR monitor$3 OR measur$3 OR measurement OR probe OR probed OR probing) NEAR5 (pathway OR path OR route OR routing) NEAR5 (communication OR disconnect)) AND (IP ADJ Network) TBD
(Processor or processing) ADJ2 circuit AND @py<"2010" Pulls documents where processor/processing and circuit are within two words of each other, but does not filter by date appropriately. TBD
config$3 TBD
cesco~1 TBD
cisco^100 Pulls documents where 'cisco' does not appear in the title, and the boost operator does not work as expected. TBD
cisco.ti. TBD
(router AND cisco AND public).ti. TBD
cisco.ab. TBD TBD
cisco.ab. Router.ab. AND TBD TBD
(testing and outline).ti,ab. TBD TBD
cisco AND 799/11.ccls Search failed with an error from the server. TBD
cisco AND marker.ASGP Search failed with an error from the server. TBD
(Chet NEAR2 Ramey NEAR2 Case).IN. Search failed with an error from the server. TBD
19961011.AD Search failed with an error from the server. TBD
testing and @pd<"20100107" Pulls documents with testing in the content, but publication date is not working, so we cannot search by date. TBD
G06F13/4081.cpc. Pulls up a blank page TBD

reefdog commented 5 years ago

Was able to resolve some issues with our Lambda response handling to actually surface the error being thrown by queryParser() when we send a query like foo NEAR bar:

{
  "ExecutedVersion": "$LATEST",
  "FunctionError": "Unhandled",
  "Payload": {
    "errorMessage": "java.lang.IllegalStateException",
    "errorType": "java.lang.IllegalStateException",
    "stackTrace": [
     "com.uspto.query.parser.BoolParsingRules$BoolNode.getValue(BoolParsingRules.java:959)",
     "com.uspto.query.parser.BoolParseMain.parseQuery(BoolParseMain.java:83)",
     "com.uspto.query.parser.QueryParser.pasrseToElastic(QueryParser.java:22)",
     "edu.mit.kfg.priorart.parser.ParserHandler.handleRequest(ParserHandler.java:10)",
     "sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)",
     "sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)",
     "java.lang.reflect.Method.invoke(Method.java:498)"
    ]
  },
  "StatusCode": 200
}
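
Note that the response above carries StatusCode 200 together with a FunctionError field, so checking the status code alone would miss the failure. A minimal sketch of the check, assuming the payload has already been parsed into an object shaped like the one above. The interface and function names are hypothetical.

```typescript
// Hypothetical shape, mirroring the fields in the response above.
interface LambdaInvokeResult {
  StatusCode: number;
  FunctionError?: string;
  ExecutedVersion?: string;
  Payload?: unknown;
}

// Returns a readable description when the function reported an error, or null
// when it did not. StatusCode is deliberately ignored: it is 200 in both cases.
function describeLambdaError(result: LambdaInvokeResult): string | null {
  if (result.FunctionError === undefined) {
    return null;
  }
  let errorType = "unknown error";
  const payload = result.Payload;
  if (typeof payload === "object" && payload !== null && "errorType" in payload) {
    const value = (payload as { errorType: unknown }).errorType;
    if (typeof value === "string") {
      errorType = value;
    }
  }
  return `queryParser failed (${result.FunctionError}): ${errorType}`;
}
```

For the response above this yields "queryParser failed (Unhandled): java.lang.IllegalStateException", which the API layer could log or map to a clearer message for the user.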