metasj opened this issue 5 years ago (status: Open)
Queries that generate errors:

- `a.b`
- `\hello`
- `\hello \there`
- `hello#%`
I'm not sure what the right approach is here: maybe sanitizing query strings on the client before passing them to the `queryParser`, better error messaging, or both.
There's no way to search for special characters; Elasticsearch doesn't even index them.
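Because the special characters aren't searchable anyway, one concrete form of the client-side sanitizing is to replace the special characters in a word with spaces and quote the result. The following is a minimal Python sketch of that idea (the real client is browser JavaScript). It assumes "special" means any non-word character, which may be broader or narrower than what `queryParser` actually rejects; `sanitize_query` is a hypothetical name.

```python
import re

# Assumption: treat any run of non-word characters (not a letter, digit, or
# underscore) as "special". Tune this to what the parser actually rejects.
_SPECIAL = re.compile(r"\W+")


def sanitize_query(query: str) -> str:
    """Hypothetical quick-patch sanitizer.

    Terms without special characters pass through unchanged. In any other
    term, each run of special characters becomes a space and the result is
    wrapped in double quotes. Terms made only of special characters are dropped.
    """
    out = []
    for term in query.split():
        if _SPECIAL.search(term) is None:
            out.append(term)  # plain term (including AND/OR): keep as-is
            continue
        cleaned = " ".join(_SPECIAL.sub(" ", term).split())
        if cleaned:
            out.append(f'"{cleaned}"')
    return " ".join(out)
```

For example, `a.b` becomes `"a b"` and `\hello \there` becomes `"hello" "there"`. Applied blindly, this would also rewrite the operator syntax in Cisco's test list (`config$3` becomes `"config 3"`, `cisco.ti.` becomes `"cisco ti"`). That makes it a stopgap only, not a substitute for the parser handling these inputs.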
The error that appears in the console is `JSON.parse: unexpected end of data at line 1 column 1 of the JSON data`.
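Firefox reports this exact message when `JSON.parse` is given an empty string. That suggests the client is parsing an empty error response as if it were a successful result. Below is a minimal sketch of the "better error messaging" side, written in Python only to show the checks. `parse_search_response` and `SearchError` are hypothetical names, and `status`/`body` stand in for whatever the real HTTP client returns.

```python
import json
from typing import Any


class SearchError(Exception):
    """Hypothetical error type whose message is safe to show in the UI."""


def parse_search_response(status: int, body: str) -> Any:
    """Decode a search response, replacing opaque parse failures with clear errors.

    `status` and `body` are illustrative stand-ins, not the project's API.
    """
    if not body.strip():
        # Parsing an empty body is what yields "unexpected end of data at
        # line 1 column 1", so report the HTTP status instead.
        raise SearchError(f"Search failed: the server returned HTTP {status} with an empty body.")
    if status >= 400:
        raise SearchError(f"Search failed: the server returned HTTP {status}.")
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        raise SearchError(f"Search failed: the response was not valid JSON ({exc.msg}).") from exc
```

With these checks, the 404 and 503 responses in the tables would produce a message naming the status code instead of a JSON parse failure.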
Sanitizing is fine as a quick patch (say, replacing the special characters in a word with a space and quoting the result). But:
I get the strangest set of results when testing SJ's list above against www and dev-v2.
| Query | www | dev-v2 |
|---|---|---|
| `a.b` | ❌ 404 | ✅ |
| `\hello` | ❌ 404 | ❌ 503 |
| `\hello \there` | ❌ 404 | ❌ 503 |
| `hello#%` | ❌ 404 | ❌ 503 |
| `"a.b"` | ✅ | ❌ 503 |
| `"\hello"` | ✅ | ❌ 503 |
| `"\hello \there"` | ✅ | ❌ 503 |
| `"hello#%"` | ✅ | ❌ 503 |
Note: ✅ means the search completed without error; it doesn't mean there were any results.
Here, by the way, is the list of searches that Cisco uses to test the site, with their current results.
| Search term | Operator | www | dev-v2 |
|---|---|---|---|
| `in-memory` | | ✅ | TBD |
| `"Cisco Systems"` | | ✅ | ❌ |
| `2003` | | ❌ Date range filter does not work as expected. | TBD |
| `cisco AND router` | | ✅ | TBD |
| `cisco router` | AND | ✅ | TBD |
| `cisco\|router` | | ✅ | TBD |
| `cisco OR router` | | ✅ | TBD |
| `cisco router` | OR | ✅ | TBD |
| `cisco NEAR router` | | ✅ | TBD |
| `cisco router` | NEAR | ✅ | TBD |
| `cisco ADJ router` | | ✅ | TBD |
| `cisco router` | ADJ | ✅ | TBD |
| `cisco WITH router` | | ✅ | TBD |
| `cisco router` | WITH | ✅ | TBD |
| `cisco SAME router` | | ✅ | TBD |
| `cisco router` | SAME | ✅ | TBD |
| `(test$3 OR monitor$3 OR measur$3 OR measurement OR probe OR probed OR probing) NEAR5 (pathway OR path OR route OR routing) NEAR5 (communication or network)` | | ✅ | TBD |
| `((test$3 OR monitor$3 OR measur$3 OR measurement OR probe OR probed OR probing) NEAR5 (pathway OR path OR route OR routing) NEAR5 (communication OR disconnect)) AND (IP ADJ Network)` | | ✅ | TBD |
| `(Processor or processing) ADJ2 circuit AND @py<"2010"` | | ❌ Pulls documents where processor/processing and circuit are within two words of each other, but does not filter by date appropriately. | TBD |
| `config$3` | | ✅ | TBD |
| `cesco~1` | | ✅ | TBD |
| `cisco^100` | | ❌ Pulls documents where 'cisco' does not appear in the title, and the boost operator does not work as expected. | TBD |
| `cisco.ti.` | | ✅ | TBD |
| `(router AND cisco AND public).ti.` | | ✅ | TBD |
| `cisco.ab.` | | TBD | TBD |
| `cisco.ab. Router.ab.` | AND | TBD | TBD |
| `(testing and outline).ti,ab.` | | TBD | TBD |
| `cisco AND 799/11.ccls` | | ❌ Search failed with an error from the server. | TBD |
| `cisco AND marker.ASGP` | | ❌ Search failed with an error from the server. | TBD |
| `(Chet NEAR2 Ramey NEAR2 Case).IN.` | | ❌ Search failed with an error from the server. | TBD |
| `19961011.AD` | | ❌ Search failed with an error from the server. | TBD |
| `testing and @pd<"20100107"` | | ❌ Pulls documents with testing in the content, but publication date is not working, so we cannot search by date. | TBD |
| `G06F13/4081.cpc.` | | ❌ Pulls up a blank page. | TBD |
I was able to fix some issues in our Lambda response handling so that it actually surfaces the error thrown by `queryParser()` when we send a query like `foo NEAR bar`:
```json
{
  "ExecutedVersion": "$LATEST",
  "FunctionError": "Unhandled",
  "Payload": {
    "errorMessage": "java.lang.IllegalStateException",
    "errorType": "java.lang.IllegalStateException",
    "stackTrace": [
      "com.uspto.query.parser.BoolParsingRules$BoolNode.getValue(BoolParsingRules.java:959)",
      "com.uspto.query.parser.BoolParseMain.parseQuery(BoolParseMain.java:83)",
      "com.uspto.query.parser.QueryParser.pasrseToElastic(QueryParser.java:22)",
      "edu.mit.kfg.priorart.parser.ParserHandler.handleRequest(ParserHandler.java:10)",
      "sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)",
      "sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)",
      "java.lang.reflect.Method.invoke(Method.java:498)"
    ]
  },
  "StatusCode": 200
}
```
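This response shows why the failure was easy to miss. The invoke result has `StatusCode: 200` even though the handler threw, and the only signal is the `FunctionError` field. Below is a minimal Python sketch of checking that field and turning the payload into a descriptive exception. It assumes the response has the shape shown above, with `Payload` either already decoded or still a JSON string. With boto3, `Payload` is a stream you would `.read()` first. `unwrap_lambda_result` and `QueryParserError` are hypothetical names, not part of the project or the AWS SDK.

```python
import json
from typing import Any, Mapping


class QueryParserError(Exception):
    """Hypothetical exception for a failed queryParser() Lambda invocation."""

    def __init__(self, error_type: str, message: str) -> None:
        # Avoid repeating the type when the Java exception had no message,
        # as with the IllegalStateException above.
        text = error_type if message == error_type else f"{error_type}: {message}"
        super().__init__(text)
        self.error_type = error_type


def unwrap_lambda_result(response: Mapping[str, Any]) -> Any:
    """Return the decoded payload, or raise QueryParserError if the function failed.

    FunctionError, not StatusCode, signals a handler failure: StatusCode was
    200 in the response above even though the handler threw.
    """
    payload = response.get("Payload")
    if isinstance(payload, (str, bytes)):
        # Payload may still be raw JSON text; an empty body becomes None.
        payload = json.loads(payload) if payload else None
    if response.get("FunctionError"):
        details = payload if isinstance(payload, dict) else {}
        raise QueryParserError(
            str(details.get("errorType", "UnknownError")),
            str(details.get("errorMessage", "no error message returned")),
        )
    return payload
```

The search endpoint could then catch `QueryParserError` and return a "could not parse this query" message instead of an empty body. An empty body is what the browser ends up failing to parse.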
Some simple queries, especially ones involving special characters, have been generating errors since the latest update. It shouldn't be possible to produce a nondescript error by entering any string into the search box.