inspirehep / inspire-next

The INSPIRE repo.
https://inspirehep.net
GNU General Public License v3.0

Set up analyzers #685

Open jmartinm opened 8 years ago

jmartinm commented 8 years ago

Tasks extracted from https://github.com/inspirehep/inspire-next/pull/641

kaplun commented 8 years ago

This seems like something that could easily be done with a Groovy script, or alternatively with one more Pythonic enhancer.

When possible, :+1: for using index-time expansion plus search-time normalization based on Groovy!
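To make the index-time/search-time split concrete, here is a rough sketch of Elasticsearch index settings expressed as a Python dict. It swaps the Groovy script for Elasticsearch's built-in `synonym` token filter, assumes a version with typeless mappings (7.x or later), and uses hypothetical analyzer names (`hep_index`, `hep_search`) and a placeholder synonym list. Expansion runs only in the index analyzer; the search analyzer applies the same normalization without expanding.

```python
import json

# Hypothetical synonym rules in Solr format ("a, b" = equivalent terms).
# A real list would come from a curated HEP synonym table.
HEP_SYNONYMS = [
    "susy, supersymmetry",
    "hep, high energy physics",
]

# Assumes Elasticsearch 7.x+ (mappings without a document type).
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "hep_synonyms": {"type": "synonym", "synonyms": HEP_SYNONYMS},
            },
            "analyzer": {
                # Index time: normalize (lowercase, fold accents), then expand.
                "hep_index": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding", "hep_synonyms"],
                },
                # Search time: the same normalization, but no expansion.
                "hep_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                },
            },
        },
    },
    "mappings": {
        "properties": {
            field: {
                "type": "text",
                "analyzer": "hep_index",
                "search_analyzer": "hep_search",
            }
            for field in ("title", "abstract", "keywords")
        },
    },
}

if __name__ == "__main__":
    # Body for a create-index request.
    print(json.dumps(INDEX_SETTINGS, indent=2))
```

Because expansion happens at index time, a query for "supersymmetry" can match a record whose title only says "SUSY"; the trade-off is that any change to the synonym list requires reindexing. Multi-word synonyms at index time also have known phrase-position quirks, so this is only a starting point.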

kaplun commented 8 years ago

> Expand title, abstract and keywords with synonyms.

@fschwenn the HEP.rdf ontology is still maintained at DESY, correct? Do you think it could be used as a basis for building a synonyms table (i.e., a simple string mapping that lets a given word be indexed/searched as two words)?
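As a rough illustration of such a string mapping feeding "one more Pythonic enhancer", the sketch below expands title, abstract and keywords with synonyms before indexing. The table contents, the `enhance_record` helper and the `synonyms` output field are all hypothetical; it only matches single-word keys, whose expansions may be multi-word.

```python
import re

# Hypothetical synonym table: one word -> expansions (possibly multi-word).
SYNONYMS = {
    "susy": ["supersymmetry"],
    "hep": ["high energy physics"],
}

_WORD_RE = re.compile(r"\w+")


def _as_text(value):
    """Flatten a field value (string or list of strings) into plain text."""
    if isinstance(value, (list, tuple)):
        return " ".join(map(str, value))
    return str(value)


def synonym_expansions(text, table=SYNONYMS):
    """Return expansions triggered by words in ``text``, in order, without duplicates."""
    found = []
    for word in _WORD_RE.findall(text.lower()):
        for expansion in table.get(word, []):
            if expansion not in found:
                found.append(expansion)
    return found


def enhance_record(record, fields=("title", "abstract", "keywords")):
    """Return a copy of ``record`` with a hypothetical ``synonyms`` field added."""
    text = " ".join(_as_text(record.get(field, "")) for field in fields)
    enriched = dict(record)
    enriched["synonyms"] = synonym_expansions(text)
    return enriched
```

For example, `enhance_record({"title": "SUSY at the LHC", "keywords": ["HEP"]})` adds `"synonyms": ["supersymmetry", "high energy physics"]`, leaving the original fields untouched.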

fschwenn commented 8 years ago

Yes, the ontology is still maintained at DESY. It contains information on synonyms in the sense that if an altLabel is found, the prefLabel is assigned. But one cannot use it without proofreading, e.g.:

prefLabel: string: Green-Schwarz
altLabel: Green-Schwarz action
altLabel: Green-Schwarz superstring

If "Green-Schwarz action" is found in the text, we assign "string: Green-Schwarz". However, they are not really synonyms.
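Since the altLabel-to-prefLabel assignment is directional and the labels are not true synonyms, one option (a sketch, still subject to proofreading) is to turn each ontology entry into one-way rules instead of equivalences. The in-memory `CONCEPTS` structure below is hypothetical, not the HEP.rdf format; the output uses the Solr synonym syntax `a => a, b`, which keeps the original wording searchable while also adding the prefLabel.

```python
# Hypothetical in-memory form of ontology entries: (prefLabel, [altLabels]).
CONCEPTS = [
    ("string: Green-Schwarz", ["Green-Schwarz action", "Green-Schwarz superstring"]),
]


def one_way_rules(concepts):
    """Emit one Solr-format rule per altLabel: 'alt => alt, pref'.

    The rule is one-way: text containing the altLabel also gets the
    prefLabel, but text containing only the prefLabel is not expanded
    to every altLabel (they are not true synonyms).
    """
    rules = []
    for pref, alts in concepts:
        for alt in alts:
            rules.append(f"{alt} => {alt}, {pref}")
    return rules
```

This reproduces the current assignment behaviour without claiming that "Green-Schwarz action" and "Green-Schwarz superstring" mean the same thing.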



kaplun commented 8 years ago

> Yes, the ontology is still maintained at DESY. It contains information on synonyms in the sense that if an altLabel is found, the prefLabel is assigned. But one cannot use it without proofreading.

I see... @inspirehep/inspire-dir, do you maybe have a good idea where we could obtain a HEP synonym list? E.g. to be able to search for β and beta, for "HEP" and "High Energy Physics", and of course many more sensible substitutions...
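The β/beta case does not necessarily need a synonym list: it could be handled by normalization, since Unicode names every Greek letter. A minimal sketch using only the standard `unicodedata` module (the helper names are hypothetical) spells out Greek letters by name:

```python
import unicodedata

_GREEK_PREFIXES = ("GREEK SMALL LETTER ", "GREEK CAPITAL LETTER ")


def greek_to_name(ch):
    """Return the lowercase Unicode name of a Greek letter, else ``ch`` unchanged."""
    try:
        name = unicodedata.name(ch)
    except ValueError:  # character has no Unicode name
        return ch
    for prefix in _GREEK_PREFIXES:
        if name.startswith(prefix):
            words = name[len(prefix):].split()
            if words[0] == "FINAL":  # e.g. final sigma
                words = words[1:]
            # Drops modifiers such as "WITH TONOS".
            return words[0].lower()
    return ch


def normalize_greek(text):
    """Replace every Greek letter in ``text`` by its spelled-out name."""
    return "".join(greek_to_name(ch) for ch in text)
```

`normalize_greek("β decay")` gives `"beta decay"`. Two caveats: adjacent letters merge ("ββ" becomes "betabeta"), so a real analyzer would need to decide on token boundaries; and Unicode spells λ as "LAMDA", so a few names would need a small override table.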

jmartinm commented 8 years ago

Not sure if this can be helpful :) http://www.personal.kent.edu/~plucasst/HEPThesaurus/Alphabetical%20Thesaurus.htm

ksachs commented 8 years ago

I think SPIRES had a list, at least for the most common abbreviations. E.g. the title variant "supersymmetry" was automatically assigned to articles with "SUSY" in the title.

kaplun commented 8 years ago

@jacquerie wanted to tackle mappings at some point, so I am assigning this to him to make sure we don't forget the points listed here.

kaplun commented 8 years ago

WRT LaTeX see also #1165

jmartinm commented 6 years ago

@StellaCh this should be reviewed by the person implementing the search since it seems still relevant.

chris-asl commented 6 years ago

@jmartinm regarding DOI and arXiv eprints, what could the superfluous prefix be? Something like arxiv:ARXIV_ID? And for DOIs?

jmartinm commented 6 years ago

> @jmartinm regarding DOI and arXiv eprints, what could the superfluous prefix be? Something like arxiv:ARXIV_ID? And for DOIs?

Ideally, check the search logs for examples. For arXiv, the prefix some people might use is "arxiv:", and for DOIs it could be "doi:" or "http://dx.doi.org/". @annetteholtkamp or @michamos can confirm whether they know of other prefixes that should work when searching.
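A minimal sketch of stripping those prefixes from a query value before matching, assuming case-insensitive matching and using only the prefixes named in this comment (the search logs may reveal more, e.g. the "https://doi.org/" resolver form). The helper name and prefix table are hypothetical:

```python
# Prefixes mentioned in the discussion; an assumption to be checked
# against the search logs.
_PREFIXES = {
    "arxiv": ("arxiv:",),
    "doi": ("doi:", "http://dx.doi.org/"),
}


def strip_identifier_prefix(value, kind):
    """Remove one known prefix (case-insensitively) from an arXiv or DOI query value."""
    cleaned = value.strip()
    lowered = cleaned.lower()
    for prefix in _PREFIXES[kind]:
        if lowered.startswith(prefix):
            return cleaned[len(prefix):].strip()
    return cleaned
```

For instance, `strip_identifier_prefix("arXiv:1207.7214", "arxiv")` returns `"1207.7214"`, and a value without a known prefix is returned unchanged apart from surrounding whitespace.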

michamos commented 6 years ago

@chris-asl for arXiv we have https://github.com/inspirehep/inspire-schemas/blob/ac809becb858cfe2a1c695c18f00014109138dca/inspire_schemas/builders/references.py#L110-L126. You can move it to inspire_schemas.utils if it's needed for search too. For DOIs, you could use idutils.normalize_doi directly.
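For illustration only, and explicitly NOT the inspire_schemas code linked above, here is a hypothetical sketch of recognising a bare arXiv identifier once any prefix is gone. It covers new-style IDs (YYMM.NNNN, or YYMM.NNNNN since 2015) and old-style IDs (archive, optional subject class, then YYMMNNN), each with an optional version suffix:

```python
import re

# Hypothetical helper, not the inspire_schemas implementation.
# New style: 1207.7214, 1501.00001; old style: hep-th/9901001, math.GT/0309136.
# Either may end with a version such as "v2".
_ARXIV_RE = re.compile(
    r"(?P<id>\d{4}\.\d{4,5}|[a-z]+(?:-[a-z]+)*(?:\.[A-Z]{2})?/\d{7})"
    r"(?P<version>v\d+)?"
)


def split_arxiv_id(value):
    """Return (base_id, version) for a bare arXiv identifier, or None if it isn't one."""
    match = _ARXIV_RE.fullmatch(value.strip())
    if match is None:
        return None
    return match.group("id"), match.group("version")
```

Splitting off the version would let a query for "1207.7214v2" also match a record stored as "1207.7214"; whether search should ignore versions is a separate decision.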