jalavik opened this issue 9 years ago
Looks like it's a search problem, since the invenio-query-parser results seem correct:
```pycon
>>> x.parse_query("find j Phys.Rev.,D41,2330").accept(walker())
KeywordOp(Keyword('journal'), Value('Phys.Rev.,D41,2330'))
>>> x.parse_query("find j Phys.Rev.,D41, 2330").accept(walker())
KeywordOp(Keyword('journal'), Value('Phys.Rev.,D41, 2330'))
>>> x.parse_query("find j Phys.Rev., D41, 2330").accept(walker())
KeywordOp(Keyword('journal'), Value('Phys.Rev., D41, 2330'))
```
Maybe we could normalize journal values before submitting them to Elasticsearch by removing all whitespace.
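A minimal sketch of that normalization, assuming it runs on the parsed journal value just before the query is sent to Elasticsearch. `normalize_journal` is a hypothetical helper name, not part of invenio-query-parser. It maps the three spellings from the parser output to the same string:

```python
import re

# Matches any run of whitespace (spaces, tabs, newlines).
_WHITESPACE = re.compile(r"\s+")


def normalize_journal(value: str) -> str:
    """Hypothetical helper: remove all whitespace from a journal reference."""
    return _WHITESPACE.sub("", value)


if __name__ == "__main__":
    variants = [
        "Phys.Rev.,D41,2330",
        "Phys.Rev.,D41, 2330",
        "Phys.Rev., D41, 2330",
    ]
    for v in variants:
        print(repr(v), "->", repr(normalize_journal(v)))
```

Stripping every space only works if the indexed journal values are normalized the same way; otherwise a query for a title such as `Nucl. Phys. B` would no longer match the stored form.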
I would recommend adding these examples as test cases if they are not already there.
@Panos512 Indeed, in this context I believe we should strip whitespace before sending the values to Elasticsearch.
Actually, this is a generic problem: spacing should be normalized consistently. @tiborsimko, @jirikuncar, WDYT? Should this happen at the invenio-query-parser level, or is Elasticsearch able to strip away inner spaces?
Answering myself: Elasticsearch supports token filters, so we could delegate this to each Elasticsearch configuration.
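One way such a configuration could look, written as Python index settings. This is a sketch under assumptions: it swaps in Elasticsearch's `pattern_replace` character filter (which runs before tokenization) together with the `keyword` tokenizer, rather than a specific token filter; the names `strip_whitespace` and `journal_normalized` are hypothetical, and the journal field would still need to declare `"analyzer": "journal_normalized"` in its mapping. The `simulate` function is only a local approximation of the single token the analyzer should emit.

```python
import json
import re

# Hypothetical analysis settings: strip all whitespace from the whole field
# value, keep it as one token, then lowercase it. Because the same analyzer
# is used at index and query time, "Phys.Rev., D41, 2330" and
# "Phys.Rev.,D41,2330" should produce the same token.
JOURNAL_SETTINGS = {
    "analysis": {
        "char_filter": {
            "strip_whitespace": {
                "type": "pattern_replace",
                "pattern": "\\s+",
                "replacement": "",
            }
        },
        "analyzer": {
            "journal_normalized": {
                "type": "custom",
                "char_filter": ["strip_whitespace"],
                "tokenizer": "keyword",
                "filter": ["lowercase"],
            }
        },
    }
}


def simulate(value: str) -> str:
    """Rough local approximation of the analyzer: char filter, then lowercase."""
    cf = JOURNAL_SETTINGS["analysis"]["char_filter"]["strip_whitespace"]
    return re.sub(cf["pattern"], cf["replacement"], value).lower()


if __name__ == "__main__":
    print(json.dumps({"settings": JOURNAL_SETTINGS}, indent=2))
    print(simulate("Phys.Rev., D41, 2330"))
```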
Originally by hoc on 2011-08-09
find j Phys.Rev.,D41,2330 [works] http://inspirebeta.net/search?ln=en&ln=en&p=find+j+Phys.Rev.%2CD41%2C2330
find j Phys.Rev., D41,2330 [does not work] http://inspirebeta.net/search?ln=en&ln=en&p=find+j+Phys.Rev.%2C+D41%2C2330
This whitespace rule is far too strict. Whitespace following punctuation should be ignored, i.e. `([\,.:])\s+` -> `$1`.
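A sketch of that substitution in Python, where `$1` is written `\1`. The function name is hypothetical, and it assumes the rule is applied to the journal value rather than to the raw query string:

```python
import re

# The rule ([\,.:])\s+ -> $1: drop whitespace that directly follows
# a comma, period or colon, keeping the punctuation itself.
_PUNCT_SPACE = re.compile(r"([\,.:])\s+")


def collapse_space_after_punct(value: str) -> str:
    """Hypothetical helper implementing the punctuation-whitespace rule."""
    return _PUNCT_SPACE.sub(r"\1", value)


if __name__ == "__main__":
    for v in ["Phys.Rev.,D41,2330", "Phys.Rev., D41,2330", "Phys.Rev., D41, 2330"]:
        print(repr(v), "->", repr(collapse_space_after_punct(v)))
```

Unlike stripping all whitespace, this keeps spaces that do not follow punctuation. Applied to the whole query it would also glue a trailing period to the next word (for example `Phys.Rev. and` becomes `Phys.Rev.and`), which is why running it on the parsed journal value is likely safer.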
As a follow-on: if we display publications in the form `Phys.Rev. D41 (1990) 2330`, why can't people search for them in that form? It seems like the obvious thing they'd try, rather than having to learn a separate syntax for searching.
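A rough sketch of how the displayed form could be rewritten into the comma-separated form that already works. It assumes the display layout is always `<journal> <volume> (<year>) <page>`; the function name is hypothetical, and journals whose volume letter is printed separately (e.g. `Nucl. Phys. B 360`) would need extra handling.

```python
import re
from typing import Optional

# Assumed display layout: "<journal> <volume> (<year>) <page>".
_DISPLAY_FORM = re.compile(
    r"^(?P<journal>.+?)\s+(?P<volume>\S+)\s+\((?P<year>\d{4})\)\s+(?P<page>\S+)$"
)


def display_to_search_form(text: str) -> Optional[str]:
    """Rewrite 'Phys.Rev. D41 (1990) 2330' as 'Phys.Rev.,D41,2330'.

    Returns None when the input does not look like the display layout.
    The year is dropped because the comma-separated form has no slot for it.
    """
    match = _DISPLAY_FORM.match(text.strip())
    if match is None:
        return None
    journal = re.sub(r"\s+", "", match.group("journal"))
    return f"{journal},{match.group('volume')},{match.group('page')}"


if __name__ == "__main__":
    print(display_to_search_form("Phys.Rev. D41 (1990) 2330"))
```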