inspirehep / hepcrawl

Scrapy project for feeds into INSPIRE-HEP
http://inspirehep.net

post-enhancement: automatically set 'citeable=True' #173

Open fschwenn opened 7 years ago

fschwenn commented 7 years ago

Create a workflow task that sets 'citeable=True' where applicable; it could be added to POSTENHANCE_RECORD for HEP records.

Expected Behavior

The task would check whether the record has a DOI, report number, or pubnote and, if so, set citeable=True.

Context

Instead of checking it in each spider, user suggestion or BibEdit, one central place should be used for that task.

annetteholtkamp commented 7 years ago

Apart from a few exceptions, our current definition of citeable is based on arXiv number or pub note. We would need to decide whether, in the future, we want to consider all papers with a DOI or report number as citeable.
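To make the two candidate definitions concrete, here is a minimal sketch of what such a central check might look like. It is not the actual workflow task: the function names and the `include_doi_and_report_number` switch are hypothetical. The record keys (`arxiv_eprints`, `publication_info`, `dois`, `report_numbers`) and the rule for what counts as a complete pub note (journal title, volume, and a page or article ID) are assumptions made for illustration.

```python
def has_pubnote(record):
    """Return True if any publication_info entry looks like a complete pub note.

    Assumption: "complete" means journal title + volume + (start page or article ID).
    """
    for pub in record.get("publication_info", []):
        if pub.get("journal_title") and pub.get("journal_volume") and (
            pub.get("page_start") or pub.get("artid")
        ):
            return True
    return False


def is_citeable(record, include_doi_and_report_number=False):
    """Decide citeability under the narrow or the broader definition.

    Narrow (default): arXiv number or pub note.
    Broader (flag set): additionally accept any DOI or report number.
    """
    if record.get("arxiv_eprints") or has_pubnote(record):
        return True
    if include_doi_and_report_number:
        return bool(record.get("dois") or record.get("report_numbers"))
    return False


def set_citeable(record, include_doi_and_report_number=False):
    """Set citeable=True when the criteria are met; never unset an existing flag."""
    if is_citeable(record, include_doi_and_report_number):
        record["citeable"] = True
    return record
```

With this sketch, a record that carries only a DOI stays untouched under the default definition and gets `citeable=True` only when `include_doi_and_report_number=True` is passed, which is exactly the policy question raised here.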


fschwenn commented 7 years ago

see https://labs.inspirehep.net/internal-help/knowledge-base/hep-publishedeprint-curation/

annetteholtkamp commented 7 years ago

Interesting. I don’t think we ever agreed on that. Or is that just my bad memory? As far as report numbers are concerned, I remember that they were not considered standardised enough to reliably catch citations. We should bring this up again at standup.


michamos commented 7 years ago

This is already partially implemented in the literature builder, in https://github.com/inspirehep/inspire-schemas/blob/36bb1791b4df5890e5445f850c59ed9c5ee9b7c9/inspire_schemas/builders/literature.py#L171 and https://github.com/inspirehep/inspire-schemas/blob/36bb1791b4df5890e5445f850c59ed9c5ee9b7c9/inspire_schemas/builders/literature.py#L415-L416 for arXiv and publication info respectively (I didn't know about that page either). So if there is enough information at record creation time to make a paper citeable, it is automatically flagged as citeable. This includes user suggestions and hepcrawl harvests (both new records and updates), but excludes the case where a curator modification in the record editor would make a record (non-)citeable.

fschwenn commented 7 years ago

I don't remember that we agreed on RNs and DOIs either, but I missed quite a few standups, so I checked the training pages, thinking they would reflect the agreed status.