inspirehep / inspire-next

The INSPIRE repo.
https://inspirehep.net
GNU General Public License v3.0

CitationAnalysis: citeable not found #3448

Open ksachs opened 6 years ago

ksachs commented 6 years ago

I don't see why labs doesn't find these citations

recid   legacy labs
231707  9113    0   {'reference': {'publication_info': {'journal_title': 'Z.Phys.', 'artid': '714', 'year': '1936', 'page_start': '714', 'journal_volume': '98'}, 'label': '101', 'authors': [{'full_name': 'Heisenberg, W.'}, {'full_name': 'Euler, H.'}]}, 'record': {'$ref': 'http://localhost:5000/api/literature/9113'}, 'recid': 9113, 'curated_relation': False}
403689  54961   0   {'reference': {'publication_info': {'journal_title': 'Theor.Math.Phys.', 'artid': '1', 'page_start': '1', 'journal_volume': '1'}}, 'record': {'$ref': 'http://localhost:5000/api/literature/54961'}, 'recid': 54961, 'curated_relation': False}
1186037 652597  0   {'reference': {'publication_info': {'journal_title': 'Eur.Phys.J.C', 'artid': '1', 'year': '2005', 'page_start': '1', 'journal_volume': '41'}, 'label': '10', 'authors': [{'full_name': 'Charles, J.'}], 'urls': [{'value': 'http://ckmfitter.in2p3.fr'}], 'misc': ['CKMfitter Group HEP-PH/0406184], updated results and plots available at']}, 'record': {'$ref': 'http://localhost:5000/api/literature/652597'}, 'recid': 652597, 'curated_relation': False}
1511941 47300   0   {'reference': {'publication_info': {'journal_title': 'Phys.Rev.', 'artid': '1766', 'year': '1949', 'page_start': '1766', 'journal_volume': '75'}, 'label': '78', 'authors': [{'full_name': 'Haxel, O.'}, {'full_name': 'Jensen, J.H. D.'}], 'misc': ['e H. E. Suess']}, 'record': {'$ref': 'http://localhost:5000/api/literature/47300'}, 'recid': 47300, 'curated_relation': False}
634193  61163   0   {'reference': {'publication_info': {'journal_title': 'Nuovo Cim.A', 'artid': '457', 'page_start': '457', 'journal_volume': '69'}}, 'record': {'$ref': 'http://localhost:5000/api/literature/61163'}, 'recid': 61163, 'curated_relation': False}
879364  642815  0   {'reference': {'publication_info': {'journal_title': 'Nucl.Instrum.Meth.A', 'artid': '1', 'page_start': '1', 'journal_volume': '530'}}, 'record': {'$ref': 'http://localhost:5000/api/literature/642815'}, 'recid': 642815, 'curated_relation': False}
793389  716060  0   {'reference': {'publication_info': {'journal_title': 'Nucl.Instrum.Meth.A', 'artid': '1', 'page_start': '1', 'journal_volume': '560'}}, 'record': {'$ref': 'http://localhost:5000/api/literature/716060'}, 'recid': 716060, 'curated_relation': False}
924622  83793   0   {'reference': {'publication_info': {'journal_title': 'Sov.Phys.Usp.', 'artid': '777', 'page_start': '777', 'journal_volume': '16'}}, 'record': {'$ref': 'http://localhost:5000/api/literature/83793'}, 'recid': 83793, 'curated_relation': False}

1382176 810300  0   {'reference': {'collaborations': ['ATLAS Collaboration'], 'arxiv_eprint': '0901.0512', 'authors': [{'full_name': 'Aad, G.'}], 'report_numbers': ['CERN-OPEN-2008-020'], 'label': '27', 'misc': ['and']}, 'record': {'$ref': 'http://localhost:5000/api/literature/810300'}, 'recid': 810300, 'curated_relation': False}
1321755 1241571 0   {'reference': {'arxiv_eprint': '1307.1347', 'authors': [{'full_name': 'Heinemeyer, S.'}, {'full_name': 'Mariotti, C.'}], 'publication_info': {'year': '2013'}, 'report_numbers': ['CERN-2013-004'], 'label': '84', 'misc': ['and G. Passarino, and R. Tanaka (eds.) (LHC Higgs Cross Section Working Group), Handbook of LHC Higgs Cross Sections: 3. Higgs Properties (CERN, Geneva,)']}, 'record': {'$ref': 'http://localhost:5000/api/literature/1241571'}, 'recid': 1241571, 'curated_relation': False}

jacquerie commented 6 years ago

I recognize the first one in the list, as we investigated it a bit with @salmanmaq and @michamos. The matcher does not ignore deleted records, so it detects an ambiguous match between https://labs.inspirehep.net/literature/9113 and https://labs.inspirehep.net/literature/431037 and assigns the citation to neither of them.
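
To illustrate the behaviour described above, here is a minimal sketch, not the actual matcher code. The function name `resolve_citation` and the record fields `control_number` and `deleted` are assumptions made for the example, and marking 431037 as the deleted record is a guess for illustration only. The idea is to drop deleted candidates before checking whether the match is ambiguous.

```python
from typing import Dict, List, Optional


def resolve_citation(candidates: List[Dict]) -> Optional[int]:
    """Return the recid of the single live candidate, or None if unresolved.

    Hypothetical helper: `candidates` stands in for whatever the matcher
    returns for one reference.
    """
    # Drop records flagged as deleted before checking for ambiguity.
    live = [record for record in candidates if not record.get("deleted", False)]

    # Exactly one live candidate means the match is unambiguous.
    if len(live) == 1:
        return live[0]["control_number"]

    # Zero or several live candidates: assign the citation to none of them.
    return None


# Illustrative input only: which of the two records is deleted is assumed.
candidates = [
    {"control_number": 9113},
    {"control_number": 431037, "deleted": True},
]
print(resolve_citation(candidates))
```

Without the filtering step both candidates would count, and the reference would stay unassigned, which matches the symptom in the first row of the table.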

ksachs commented 6 years ago

That can't be the only problem. E.g.

In [3]: search_pattern(p='773__p:"Eur.Phys.J." 773__v:"C75" 773__c:"1"')
intbitset([1300380])

In [7]: search_pattern(p="037:'1307.1347'")
intbitset([1241571])
In [8]: search_pattern(p='037:"CERN-2013-004"')
intbitset([1241571])

Each of these has only one record (as far as I can see).

jacquerie commented 6 years ago

Mh. My best guess is that the cited record had not migrated successfully at the time of the experiment, so the matcher was not able to find it.

ksachs commented 6 years ago

Can we make sure these citations are resolvable if everything goes OK? Just to make sure there are no hidden bugs. And please fix the problem with the deleted records.

salmanmaq commented 6 years ago

To be honest, I can't really tell why these are not matching. The cited records are present on the localhost instance on which I ran the experiment. :neutral_face:

I'll look into it in more detail, but I have nothing concrete so far.

jacquerie commented 6 years ago

Part of this issue (specifically the problem I mentioned in https://github.com/inspirehep/inspire-next/issues/3448#issuecomment-394729685) is addressed by https://github.com/inspirehep/inspire-next/pull/3462.