inveniosoftware / invenio-query-parser

Search query parser supporting Invenio and SPIRES search syntax.
https://invenio-query-parser.readthedocs.io
GNU General Public License v2.0

spires: journal searching #21

Open jalavik opened 9 years ago

jalavik commented 9 years ago

Originally by hoc on 2011-08-09

find j Phys.Rev.,D41,2330 [works] http://inspirebeta.net/search?ln=en&ln=en&p=find+j+Phys.Rev.%2CD41%2C2330

find j Phys.Rev., D41,2330 [does not work] http://inspirebeta.net/search?ln=en&ln=en&p=find+j+Phys.Rev.%2C+D41%2C2330

This whitespace rule is far too strict. Whitespace following punctuation should be ignored, i.e. each match of ([\,.:])\s+ should be replaced by the captured punctuation character: ([\,.:])\s+ -> $1
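The substitution proposed above can be sketched in Python roughly as follows. This is only an illustration of the rule as written in the report, not code from invenio-query-parser; the function name `collapse_space_after_punctuation` is hypothetical, and whether the rule should apply to every field or only to journal values is left open.

```python
import re

# Same rule as in the report: ([\,.:])\s+ -> $1
# A comma, period or colon followed by whitespace keeps only the punctuation.
_PUNCT_THEN_SPACE = re.compile(r"([,.:])\s+")


def collapse_space_after_punctuation(value):
    """Drop whitespace that directly follows ',', '.' or ':' (hypothetical helper)."""
    return _PUNCT_THEN_SPACE.sub(r"\1", value)


if __name__ == "__main__":
    for raw in ("Phys.Rev.,D41,2330", "Phys.Rev., D41,2330", "Phys.Rev., D41, 2330"):
        print(repr(raw), "->", repr(collapse_space_after_punctuation(raw)))
```

Applied blindly to free-text fields such as titles, the same rule would also glue sentences together, so it probably only makes sense for structured values like journal references.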

As a follow-on: if we display publications in the form "Phys.Rev. D41 (1990) 2330", why can't people search on them in this form? It seems like an obvious thing they'd try, without having to learn another form for searching.
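One way to picture the follow-on request is a small rewrite step that turns the displayed form into the comma-separated form that already works. The sketch below is an assumption-heavy illustration: the function name `display_to_search_form` and the pattern (title, volume with an optional letter prefix, four-digit year in parentheses, page) are hypothetical and only cover references shaped like the example above.

```python
import re

# Hypothetical pattern for the displayed form: "<title> <volume> (<year>) <page>"
_DISPLAY_FORM = re.compile(
    r"^(?P<title>\S.*?)\s+(?P<volume>[A-Z]?\d+)\s+\((?P<year>\d{4})\)\s+(?P<page>\S+)$"
)


def display_to_search_form(reference):
    """Return 'title,volume,page' for a displayed reference, or None if it does not match."""
    match = _DISPLAY_FORM.match(reference.strip())
    if match is None:
        return None
    # The year is captured but dropped, since the working search form omits it.
    return "{title},{volume},{page}".format(**match.groupdict())


if __name__ == "__main__":
    print(display_to_search_form("Phys.Rev. D41 (1990) 2330"))
```

Inputs that do not match the pattern return None, so a caller could fall back to the unmodified query.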

Panos512 commented 9 years ago

Looks like it's a search problem since invenio-query-parser results seem correct:

x.parse_query("find j Phys.Rev.,D41,2330").accept(walker())
KeywordOp(Keyword('journal'), Value('Phys.Rev.,D41,2330'))
x.parse_query("find j Phys.Rev.,D41, 2330").accept(walker())
KeywordOp(Keyword('journal'), Value('Phys.Rev.,D41, 2330'))
x.parse_query("find j Phys.Rev., D41, 2330").accept(walker())
KeywordOp(Keyword('journal'), Value('Phys.Rev., D41, 2330'))

Maybe we could edit the value of journal queries before submitting them to Elasticsearch, removing all whitespace.
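A minimal sketch of that idea, working on a plain (keyword, value) pair rather than on the parser's AST classes so that no invenio-query-parser API is assumed. The function name `normalize_journal_value` and the `JOURNAL_KEYWORDS` set are hypothetical.

```python
# Keywords whose values should have whitespace removed (assumption: only 'journal').
JOURNAL_KEYWORDS = {"journal"}


def normalize_journal_value(keyword, value):
    """Remove all whitespace from journal values; leave other keywords untouched."""
    if keyword in JOURNAL_KEYWORDS:
        # str.split() with no arguments splits on any run of whitespace,
        # so joining the pieces removes spaces, tabs and newlines alike.
        return "".join(value.split())
    return value


if __name__ == "__main__":
    print(normalize_journal_value("journal", "Phys.Rev., D41, 2330"))
    print(normalize_journal_value("title", "quark gluon plasma"))
```

Restricting the rewrite to the journal keyword keeps free-text fields such as titles intact.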

jirikuncar commented 9 years ago

I would recommend adding these examples as test cases if they are not already there.

kaplun commented 9 years ago

@Panos512 indeed, in this context I believe we should strip whitespace from these values before sending them to Elasticsearch.

Actually this is a generic problem: spacing should be correctly normalized. @tiborsimko, @jirikuncar WDYT? Should this happen at the invenio-query-parser level, or is Elasticsearch able to strip away inner spaces?

kaplun commented 9 years ago

Answering myself: Elasticsearch supports token filters, so we could delegate this to each Elasticsearch configuration.
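As a rough sketch of what such a configuration might look like, the index settings below combine the `keyword` tokenizer with a `pattern_replace` token filter that deletes whitespace. The filter name `strip_whitespace` and analyzer name `journal_normalized` are hypothetical, the `lowercase` filter is an extra assumption, and `simulate_analyzer` is only a local approximation for illustration, not how Elasticsearch itself runs analysis.

```python
import json
import re

# Index settings body (assumed names; to be sent when creating the index).
JOURNAL_ANALYSIS_SETTINGS = {
    "analysis": {
        "filter": {
            "strip_whitespace": {
                "type": "pattern_replace",
                "pattern": "\\s+",
                "replacement": "",
            }
        },
        "analyzer": {
            "journal_normalized": {
                "type": "custom",
                # The keyword tokenizer emits the whole value as a single token,
                # so the filter sees the inner spaces.
                "tokenizer": "keyword",
                "filter": ["strip_whitespace", "lowercase"],
            }
        },
    }
}


def simulate_analyzer(value):
    """Approximate the analyzer locally: one token, whitespace removed, lowercased."""
    rule = JOURNAL_ANALYSIS_SETTINGS["analysis"]["filter"]["strip_whitespace"]
    return re.sub(rule["pattern"], rule["replacement"], value).lower()


if __name__ == "__main__":
    print(json.dumps(JOURNAL_ANALYSIS_SETTINGS, indent=2))
    print(simulate_analyzer("Phys.Rev., D41, 2330"))
```

If the same analyzer is used at both index and search time, "Phys.Rev., D41, 2330" and "Phys.Rev.,D41,2330" would reduce to the same token, which is the behaviour the original report asks for.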