inspirehep / inspire

Official repo of the legacy INSPIRE-HEP overlay
http://projecthepinspire.net

Conferences: remove non-numerical values from 411__n #191

Open jacquerie opened 8 years ago

jacquerie commented 8 years ago

The records returned by this query: https://inspirehep.net/search?ln=en&cc=Conferences&ln=en&cc=Conferences&p=411__n%3A%2F%5B%5E0-9%5D%2F&action_search=Search&sf=conferencestartdate&so=d&rm=&rg=25&sc=0&of=hb have non-numerical values in the 411__n field. Those values should be cleaned, to avoid over-complicating the corresponding DoJSON rules.

This is probably better handled by automated curation (@kaplun).

kaplun commented 8 years ago

https://inspirehep.net/search?ln=en&cc=Conferences&ln=en&cc=Conferences&p=411__n%3A%2F%5B%5E0-9%5D%2F&action_search=Search&sf=conferencestartdate&so=d&rm=&sc=0&of=t&ot=411__n&rg=1000

Some of them contain the st, nd, rd, th suffixes, most of them contain the dummy value x, and some of them contain garbage.
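To make the three groups above concrete, here is a minimal sketch of how raw 411__n values could be sorted into buckets before any cleanup. The function name `classify_series_number` and the category labels are hypothetical, chosen for illustration only; this is not code from the INSPIRE codebase.

```python
import re

# Matches a number followed by an English ordinal suffix, e.g. "3rd" or "11th".
ORDINAL_RE = re.compile(r"([0-9]+)(?:st|nd|rd|th)", re.IGNORECASE)


def classify_series_number(value):
    """Return (category, cleaned_value) for a raw 411__n value.

    Categories (hypothetical labels): "numeric", "ordinal",
    "placeholder" (the dummy value x) and "garbage".
    """
    value = value.strip()
    if re.fullmatch(r"[0-9]+", value):
        return "numeric", value
    match = ORDINAL_RE.fullmatch(value)
    if match:
        # Keep only the digits, dropping the st/nd/rd/th suffix.
        return "ordinal", match.group(1)
    if value.lower() == "x":
        return "placeholder", None
    return "garbage", None
```

Only the "ordinal" bucket can be cleaned without further information; the other non-numeric buckets would need extra data or a manual check.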

annetteholtkamp commented 8 years ago

Fixed the st, nd, rd, th suffixes. 169 records are left with 411__n:x. The x signifies that the conference is part of a series which hasn't been properly treated yet. Maybe some of the x's can be replaced automatically by numbers from the title, e.g. when a number in the title has th as a suffix: 447th Wilhelm and Else Heraeus Seminar: Charmed Exotics
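A rough sketch of the title-based idea, assuming the title is available as a plain string. The helper name `series_number_from_title` is hypothetical, and the rule "accept only when exactly one ordinal is present" is an assumption made here to keep ambiguous titles out of the automatic pass.

```python
import re

# A standalone number with an English ordinal suffix, e.g. "447th".
TITLE_ORDINAL_RE = re.compile(r"\b([0-9]+)(?:st|nd|rd|th)\b", re.IGNORECASE)


def series_number_from_title(title):
    """Return the series number found in the title, or None.

    Returns None when the title has no ordinal or more than one,
    so that those records can be checked by hand.
    """
    matches = TITLE_ORDINAL_RE.findall(title)
    if len(matches) == 1:
        return matches[0]
    return None
```

For the example title above, this would yield 447; a title with two ordinals is left for manual checking.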

41 records have the correct number in a separate field, e.g.

000977640 411 $$aMG$$nx
000977640 411 $$n11

https://inspirehep.net/search?ot=411&cc=Conferences&ln=en&cc=Conferences&p=411__n%3Ax+411__n%3A%2F%5Cd%2F&action_search=Search&sf=conferencestartdate&so=d&rm=&rg=25&sc=0&of=hb

Sam, could you automatically replace the x with the number?
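The replacement could look roughly like the sketch below. It assumes each 411 field is represented as a plain dict of subfield code to value, which is a simplification for illustration and not the actual record API; `merge_series_number` is a hypothetical name. Dropping the number-only field after the merge is also an assumption.

```python
import re


def merge_series_number(fields):
    """Move a number from a number-only 411 field into the field holding x.

    `fields` is a list of dicts, one per 411 field, e.g.
    [{"a": "MG", "n": "x"}, {"n": "11"}]  (assumed representation).
    Returns a new list; the input is returned unchanged when the
    situation is ambiguous.
    """
    placeholders = [f for f in fields if f.get("n") == "x"]
    number_only = [
        f for f in fields
        if set(f) == {"n"} and re.fullmatch(r"[0-9]+", f["n"])
    ]
    if len(placeholders) != 1 or len(number_only) != 1:
        return fields  # ambiguous: leave for manual curation

    number = number_only[0]["n"]
    merged = []
    for field in fields:
        if field is number_only[0]:
            continue  # drop the number-only field once merged
        if field is placeholders[0]:
            field = dict(field, n=number)  # copy with x replaced
        merged.append(field)
    return merged
```

Applied to the example record above, the two 411 fields would collapse into a single field with $$aMG and $$n11.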

annetteholtkamp commented 8 years ago

@kaplun, can you take this ticket please?

annetteholtkamp commented 8 years ago

There are 128 records with only an x. Most of them have a number in the title ending in st, nd, rd, or th. @kaplun, could you add these automatically as well? The rest we'll check by hand.