inspirehep / inspire-next

The INSPIRE repo.
https://inspirehep.net
GNU General Public License v3.0

CitationAnalysis: confusion between 2 records #3320

Open ksachs opened 6 years ago

ksachs commented 6 years ago

For some of these I have no idea why labs finds a wrong record. Also listed in https://app.asana.com/0/3003451971699/620773521680386

Each example is given as

recid:[recid] (type of record)
summary of basic metadata
Citations difference and counts on legacy and  labs

for both records followed by a typical reference  

-

recid:650205    (ConferencePaper, arXiv, Citeable, Published)
The Swift Gamma-Ray Burst Mission
Swift Science
Gehrels, N. et.al
10.1086/422091, 10.1086/427409
astro-ph/0405233
Astrophys.J. 611 1005-1020 (2004)
Astrophys.J. 621 558 (2005)  (Erratum)
Citations difference:863  legacy:990  labs:1847

recid:1390184    (ConferencePaper, Citeable)
The Swift Gamma-Ray Burst Mission
Swift Team
Gehrels, N.
10.1063/1.1810924
C03-09-08.1 637-641 (2004)
Citations difference:855  legacy:858  labs:7

999C5 $$01390184$$sAIP Conf.Proc.727,637  

-

recid:642540    (Published, arXiv, Citeable)
Production of excited neutrino at LHC
Belyaev, A. et.al
10.1140/epjcd/s2005-02-006-0
hep-ph/0401066, FSU-HEP-040110
Eur.Phys.J. C41S2 1-10 (2005)
Citations difference:255  legacy:7  labs:262

recid:652597    (Published, arXiv, Citeable, Review)
CP violation and the CKM matrix: Assessing the impact of the asymmetric $B$ factories
CKMfitter Group
Charles, J. et.al
10.1140/epjc/s2005-02169-1
hep-ph/0406184, CPT-2004-P-030, LAL-04-21, LAPP-EXP-2004-01, LPNHE-2004-01
Eur.Phys.J. C41 1-131 (2005)
Citations difference:256  legacy:1603  labs:1349

999C5 $$0652597$$o9$$sEur.Phys.J.C41,1$$y2005$$hCharles, J.

-

recid:1495903    (ConferencePaper, Citeable, arXiv, Fermilab)
A Combined $\nu_\mu \rightarrow \nu_e$ and $\bar \nu_\mu \rightarrow \bar \nu_e$ Oscillation Analysis of the MiniBooNE Excesses
MiniBooNE
Aguilar-Arevalo, A.A. et.al
FERMILAB-PUB-12-394-AD-PPD, arXiv:1207.4809, LA-UR-12-23041
Citations difference:191  legacy:69  labs:260

recid:1223326    (Published, arXiv, Citeable, Fermilab)
Improved Search for $\bar \nu_\mu \rightarrow \bar \nu_e$ Oscillations in the MiniBooNE Experiment
MiniBooNE
Aguilar-Arevalo, A.A. et.al
10.1103/PhysRevLett.110.161801
LA-UR-13-21523, FERMILAB-PUB-13-066-AD-E-PPD, arXiv:1303.2588
2013-04-15
Phys.Rev.Lett. 110 161801 (2013)
Citations difference:196  legacy:529  labs:343

999C5 $$01223326$$o90$$sPhys.Rev.Lett.110,161801$$y2013$$hAguilar-Arevalo, A.A.$$m[MiniBooNE Collaboration]

-

recid:495427    (ConferencePaper, Published, arXiv, Citeable)
Galaxy cluster abundance evolution and cosmological parameters
Viana, Pedro T.P. et.al
astro-ph/9902245
C98-12-09.1
Citations difference:181  legacy:5  labs:176
??????????????????

recid:488774    (Arxiv, Citeable, Published)
Galaxy clusters at 0.3 < Z < 0.4 and the value of omega_0
Viana, Pedro T P et.al
10.1046/j.1365-8711.1999.02229.x
astro-ph/9803244, SUSSEX-AST-98-3-3
Mon.Not.Roy.Astron.Soc. 303 535-545 (1999)
Citations difference:181  legacy:209  labs:38

999C5 $$0488774$$sMon.Not.Roy.Astron.Soc.303,535

-

recid:592753    (Published, arXiv, Citeable)
Complex extension of quantum mechanics
Bender, Carl M. et.al
10.1103/PhysRevLett.89.270401, 10.1103/PhysRevLett.92.119902
quant-ph/0208076
Phys.Rev.Lett. 89 270401 (2002)
Phys.Rev.Lett. 92 119902 (2004)  (Erratum)
Citations difference:171  legacy:204  labs:367

recid:1359496    (ConferencePaper, Citeable)
Complex extension of quantum mechanics
Bender, Carl M.
C03-06-23.4 617-628 (2003)
Citations difference:166  legacy:169  labs:3

999C5 $$01359496$$seConf C0306234,617

-

recid:728159    (Published, arXiv, Citeable)
The UKIRT Infrared Deep Sky Survey First Data Release
Warren, S.J. et.al
10.1111/j.1365-2966.2006.11284.x
astro-ph/0610191
2006-02-11
Mon.Not.Roy.Astron.Soc. 375 213-226 (2007)
Citations difference:160  legacy:37  labs:175
???????????????????

recid:712900    (arXiv, Citeable, Published)
The UKIRT Infrared Deep Sky Survey Early Data Release
Dye, S. et.al
10.1111/j.1365-2966.2006.10928.x
astro-ph/0603608
Mon.Not.Roy.Astron.Soc. 372 1227-1252 (2006)
Citations difference:160  legacy:179  labs:41

999C5 $$0712900$$sMon.Not.Roy.Astron.Soc.372,1227   -> 00728159

-

recid:37844    (arXiv, Citeable)
Microscopic mass formulae
Duflo, J. et.al
nucl-th/9404019, LQ-5040
Citations difference:102  legacy:10  labs:96
??????????????

recid:394902    (arXiv, Citeable, Published)
Microscopic mass formulae
Duflo, J. et.al
10.1103/PhysRevC.52.R23
nucl-th/9505011
1995-07-01
Phys.Rev. C52 R23 (1995)
Phys.Rev. C52 23 (1995)
Citations difference:106  legacy:267  labs:185

999C5 $$0394902$$sPhys.Rev.C52,23

-

recid:578661    (Published, Arxiv, Citeable)
Observational mass-to-light ratio of galaxy systems: from poor groups to rich clusters
Girardi, M. et.al
10.1086/339360
astro-ph/0112534
Astrophys.J. 569 720-741 (2002)
Citations difference:94  legacy:71  labs:141
??????????

recid:576703    (arXiv, Citeable, Published)
The mass-to-light function of virialized systems and the relationship between their optical and x-ray properties
Marinoni, Christian et.al
10.1086/339319
astro-ph/0109134
Astrophys.J. 569 101-111 (2002)
Citations difference:94  legacy:109  labs:39

    999C5 $$0576703$$sAstrophys.J.569,101   

-

recid:526861    (Lectures, Review, arXiv, ConferencePaper, Citeable)
Cosmological constant versus quintessence
Binetruy, P.
10.1007/3-540-45334-2_8
LPT-ORSAY-00-47, hep-ph/0005037, LPT-ORSAY-0047
2000
C99-06-28.8 397-422 (2000)
Citations difference:76  legacy:8  labs:78
????????

recid:1362558    (Published, Citeable, Review)
Cosmological constant versus quintessence
Binetruy, Pierre
10.1023/A:1003697832568
2000
Int.J.Theor.Phys. 39 1859-1875 (2000)
Citations difference:76  legacy:79  labs:9
    999C5 $$01362558$$sInt.J.Theor.Phys.39,1859 

-

recid:577237    (Published, Arxiv, Citeable)
High resolution calculations of merging neutron stars I: model description and hydrodynamic evolution
Rosswog, S. et.al
10.1046/j.1365-8711.2002.05409.x
astro-ph/0110180
Mon.Not.Roy.Astron.Soc. 334 481-497 (2002)
Citations difference:102  legacy:87  labs:151
??????????

recid:621602    (arXiv, Citeable, Published)
High Resolution Calculations of Merging Neutron Stars. 3. Gamma-Ray Bursts
Rosswog, Stephan et.al
10.1046/j.1365-2966.2003.07032.x
astro-ph/0306418
Mon.Not.Roy.Astron.Soc. 345 1077 (2003)
Citations difference:104  legacy:145  labs:81
     999C5 $$0621602$$sMon.Not.Roy.Astron.Soc.345,1077 

-

recid:658548    (Published, arXiv, Citeable)
Modeling the pion and kaon form factors in the timelike region
Bruch, Christine et.al
10.1140/epjc/s2004-02064-3
TTP-04-20, SI-HEP-2004-09, hep-ph/0409080, TTP04-20
Eur.Phys.J. C39 41-54 (2005)
Citations difference:60  legacy:83  labs:141

recid:682442    (Published, Citeable)
Summary of the CMS potential for the Higgs boson discovery
Abdullin, S. et.al
10.1140/epjcd/s2004-02-003-9
Eur.Phys.J. C39S2 41-61 (2005)
Citations difference:61  legacy:64  labs:7
    + 00684319 999C5 $$0682442$$sEur.Phys.J.C39,41   -> 00682442

-

recid:631452    (ConferencePaper, arXiv, Citeable)
The Dark matter distribution in the central regions of galaxy clusters
Sand, David J. et.al
astro-ph/0310703
C03-06-15.1 67-70 (2005)
Citations difference:137  legacy:27  labs:120
????????????

recid:628449    (arXiv, Citeable, Published)
The dark matter distribution in the central regions of galaxy clusters: Implications for CDM
Sand, David J. et.al
10.1086/382146
astro-ph/0309465
Astrophys.J. 604 88-107 (2004)
Citations difference:137  legacy:171  labs:78

 999C5 $$0628449$$sAstrophys.J.604,88  

-

recid:1591665    (Published, citeable)
On the "Magic Numbers" in Nuclear Structure
Haxel, Otto et.al
10.1103/PhysRev.75.1766.2
1949-06-01
Phys.Rev. 75 1766-1766 (1949)
Citations difference:307  legacy:6  labs:313

recid:47300    (Published, Citeable)
Total Reflection of Neutrons on Cobalt
Hamermesh, Morton
10.1103/PhysRev.75.1766
Phys.Rev. 75 1766-1766 (1949)
Citations difference:89  legacy:86  labs:3
    - 00757001 999C5 $$047300$$o11$$adoi:10.1103/PhysRev.75.1766.2$$sPhys.Rev.75,1766$$y1949$$hO, Haxel; D, Jensen J H;

-

recid:448419    (ConferencePaper, arXiv, Citeable, Lectures)
Notes on matrix and micro strings
Dijkgraaf, Robbert et.al
10.1142/9789814447287_0003, 10.1016/S0920-5632(98)00138-8
hep-th/9709107
C97-06-16.3 28-54 (1998)
C97-05-26.4 319-356 (1999)
C97-02-17.2 105-145
Citations difference:58  legacy:12  labs:70
   I+ 00694902 999C5 $$01620629$$sNucl.Phys.B Proc.Suppl.62,348   -> 01620629

recid:1620629    (Citeable, ConferencePaper)
Matrix and micro strings
Verlinde, H.
10.1016/S0920-5632(97)00677-4
C97-05-27.2 348-362 (1998)
Citations difference:58  legacy:58  labs:0
    - 00694902 999C5 $$01620629$$sNucl.Phys.B Proc.Suppl.62,348   -> 00448419
?????????

-

recid:630209    (Published, Arxiv, Citeable)
Phase correlations in cosmic microwave background temperature maps
Coles, Peter et.al
10.1111/j.1365-2966.2004.07706.x
astro-ph/0310252
Mon.Not.Roy.Astron.Soc. 350 989-1004 (2004)
Citations difference:56  legacy:40  labs:96
   I+ 00630906 999C5 $$0644038$$sMon.Not.Roy.Astron.Soc.350,983   -> 00644038
   ??????????

recid:644038    (arXiv, Citeable, Published)
Properties of groups of galaxies in the vicinity of massive clusters
Ragone, C.J. et.al
10.1111/j.1365-2966.2004.07705.x
astro-ph/0402155
Mon.Not.Roy.Astron.Soc. 350 983-988 (2004)
Citations difference:56  legacy:62  labs:6
    - 00630906 999C5 $$0644038$$sMon.Not.Roy.Astron.Soc.350,983   -> 00630209

salmanmaq commented 6 years ago

  1. recids: 650205 and 1390184 - The erratum problem is well known and will be addressed in the coming days.

  2. recids: 642540 and 652597 - Interesting case of Labs reference matcher failure, where Legacy works fine. A fix for this has already been implemented.

  3. recids: 1223326 and 1495903 - Not a problem with either Labs or Legacy. People are simply citing the records wrongly. Both records are very similar, and people include the publication information for 1223326 but the arxiv for 1495903. Since the Labs reference matcher runs the arxiv query first, it makes the wrong association. This is neither a matcher nor a metadata problem, just incorrect citations in the citing papers.

  4. recids: 495427 and 488774 - Seems like a metadata problem. For example, this is a reference from record 616883: Viana, P. T. P. & Liddle, A. R. 1999, MNRAS, 303, 535. In the metadata for this reference, it wrongly gets the arxiv_eprint "astro-ph/9902245" (which points to 495427), and Labs matches based on that, which is correct and expected behaviour. However, the paper actually cited has the arxiv astro-ph/9803244 (which points to 488774), and Legacy points to that. So this is a metadata problem.

  5. recids: 592753 and 1359496 - I checked 4-5 conflicting records and it seems that Labs gets it right while Legacy doesn't.

  6. recids: 712900 and 728159 - Problem with metadata and most likely an issue with refextract. For example, for the record 749213, the reference is: Warren, S. J., et al. 2007, MNRAS, 375, 213. The journal_volume is clearly 375 (which points to 728159), but the metadata contains journal_volume 372 (which points to 712900). The arxiv, however, is correct. Thus Labs is able to identify and match it correctly, while Legacy doesn't.

  7. recids: 37844 and 394902 - Error in metadata. For example, 1085403 cites 394902 as: J. Duflo and A. P. Zuker, Phys. Rev. C 52, R23 (1995). The publication_info for this reference is correct if we look at the metadata for references in 1085403. However, the same reference has a wrong arxiv id: "nucl-th/9404019" (points to 37844). It should be nucl-th/9505011 (points to 394902). In this case, Labs is doing what is expected of it, since it first tries to match using the arxiv id. However, the arxiv just points to the wrong record.

  8. 578661 and 576703 - Same problem as above. Wrong associated arxiv.

  9. recids: 1362558 and 526861 - Same issue as above. Wrong arxiv in references.

  10. recids: 577237 and 621602 - Labs gets it right from the publication info while Legacy doesn't.

  11. recids: 658548 and 682442 - Metadata issues. Both articles have the same start page in the same journal, same volume, same year. That can't be right.

  12. recids: 631452 and 628449 - The wrong arxiv associated with references issue.

  13. recids: 47300 and 1591665 - Quite an interesting problem. Both records lie on the same page of the same journal, kind of like an erratum, so their publication info is the same. That's why Legacy usually gets it wrong, but Labs can match them correctly via the DOIs. This is an interesting problem in general: for such records, we will always need some other information, like arxiv or DOI, to distinguish them. It would be impossible to distinguish them using just the publication info with our current code, on both Legacy and Labs.

  14. recids: 448419 and 1620629 - Same arxiv problem. I discussed it with Micha and he mentions that some records, especially similarly named ones, may have this problem, as this happened during the migration from SPIRES to INSPIRE. On our end, we can't do much about it. Not sure if changes in metadata will do anything.

      On the other hand, these records are also messy in themselves. For example, if I check 448419 on the web, ScienceDirect gives very different metadata for the article than we have (https://www.sciencedirect.com/science/article/pii/S0920563298001315): Volume 67, Issues 1–3, July 1998, Pages 225-250, while we have: "journal_title": "Nucl.Phys.B Proc.Suppl.", "journal_volume": "68", "page_end": "54", "page_start": "28", "parent_recid": 481913, "parent_record": { "$ref": "http://labs.inspirehep.net/api/literature/481913" }, "year": 1998. Note the journal_volume and page_start!

      Secondly, people have been citing these records inconsistently. For example, 507980 cites it as: R. Dijkgraaf, E. Verlinde and H. Verlinde, Notes on Matrix and Micro Strings, Nucl. Phys. B (Proc. Suppl.) 62 (1998) 348, hep-th/9709107. The paper title, authors, and arxiv correspond to each other, but the publication_info is for the other record! Similarly, in another reference: R. Dijkgraaf, E. Verlinde and H. Verlinde, Nucl. Phys. B500 (1997) 43 (hepth/9703030); Nucl. Phys. Proc. Suppl. 62 (1988) 348 (hep-th/9709107).

      These records are a bit frustrating and I am not sure what we can do about them.

  15. recids: 644038 and 630209 - Labs gets it right, while Legacy doesn't, for no apparent reason. Publication_info, DOI, and arxiv all seem correct. A plausible reason is that 630209 comes immediately after 644038 in the same journal. But again, the metadata seems correct, and I can't figure out why Legacy is getting it wrong. In any case, Labs is doing fine.
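
To illustrate case 13, here is a minimal sketch of why publication info alone cannot separate the two records while a DOI can. The DOIs and recids are the ones quoted in this issue; the record layout and the `candidates` helper are hypothetical and not the actual matcher code.

```python
# Metadata taken from the issue text for case 13; the record layout is hypothetical.
RECORDS = [
    {"recid": 1591665, "doi": "10.1103/PhysRev.75.1766.2", "pubnote": "Phys.Rev.75,1766"},
    {"recid": 47300, "doi": "10.1103/PhysRev.75.1766", "pubnote": "Phys.Rev.75,1766"},
]


def candidates(reference):
    """Return recids matching the reference, preferring DOI over pubnote."""
    doi = reference.get("doi")
    if doi:
        hits = [r["recid"] for r in RECORDS if r["doi"] == doi]
        if hits:
            return hits
    # Fall back to journal/volume/page, which both records share.
    return [r["recid"] for r in RECORDS if r["pubnote"] == reference.get("pubnote")]


pubnote_only = candidates({"pubnote": "Phys.Rev.75,1766"})
with_doi = candidates({"doi": "10.1103/PhysRev.75.1766.2", "pubnote": "Phys.Rev.75,1766"})
print(pubnote_only, with_doi)
```

With only the pubnote, both recids come back and the matcher has to guess; with the DOI present there is a single candidate.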

ksachs commented 6 years ago

Oh shit! Thanks for this analysis.

I had a look at some cases. Looks like remains of wrong merges. Undoing the wrong merge results in references with conflicting information in $$0 - $$r - $$s.

I have no idea how to clean it.

Legacy searches by journal first, Labs by eprint. We don't know a priori which is right, but this causes differences in citations. Should both systems do it equally wrong if neither can get it right?
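
A rough sketch of how the lookup order alone changes the result, using the MiniBooNE pair from case 3. The lookup tables, the `match` function, and the mixed reference are made up for illustration; only the recids, eprints, and pubnote come from this issue, and this is not the real matcher on either system.

```python
# Toy index built from the two MiniBooNE records (case 3).
# Table and function names are hypothetical, not the real matcher API.
BY_EPRINT = {"1207.4809": 1495903, "1303.2588": 1223326}
BY_PUBNOTE = {"Phys.Rev.Lett.,110,161801": 1223326}


def match(reference, order):
    """Return the recid found by the first identifier that hits, trying `order` in turn."""
    lookups = {"eprint": BY_EPRINT, "pubnote": BY_PUBNOTE}
    for key in order:
        value = reference.get(key)
        if value in lookups[key]:
            return lookups[key][value]
    return None


# A reference that mixes the pubnote of one record with the eprint of the other.
mixed = {"eprint": "1207.4809", "pubnote": "Phys.Rev.Lett.,110,161801"}

labs_like = match(mixed, order=("eprint", "pubnote"))
legacy_like = match(mixed, order=("pubnote", "eprint"))
print(labs_like, legacy_like)
```

The same reference resolves to 1495903 when the eprint is tried first and to 1223326 when the journal is tried first, which is the kind of disagreement seen in the counts above.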

3.) pdf: [20] A. A. Aguilar-Arevalo et.al. (MiniBooNE collaboration), arXiv:1207.4809.
001182207 999C5 $$01223326$$hA. A. Aguilar-Arevalo et al.$$m(MiniBooNE collaboration)$$o20$$rarXiv:1207.4809

$$0 disagrees with $$r
001223326 035 $$9arXiv$$aoai:arXiv.org:1303.2588
001223326 773 $$c161801$$pPhys.Rev.Lett.$$v110$$y2013

001495903 035__ $$9arXiv$$aoai:arXiv.org:1207.4809

================================

4) 000616883 999C5 $$rastro-ph/9902245$$sMon.Not.Roy.Astron.Soc.,303,535

wrong merge undone: https://inspirehep.net/record/edit/compare_revisions?recid=495427&rev1=20161007230049&rev2=20151220183445

now 2 records

================================

6) pdf: Dye, S., Warren, S. J., Hambly, N. C., et al., 2006, MNRAS, 372, 1227
000749213 999C5 $$rastro-ph/0610191$$sMon.Not.Roy.Astron.Soc.,372,1227

https://inspirehep.net/record/edit/compare_revisions?recid=728159&rev1=20170929133621&rev2=20160323223009
had a pubnote for this too, now separate record

000712900 037 $$9arXiv$$aastro-ph/0603608$$castro-ph
000712900 773 $$c1227-1252$$n3$$pMon.Not.Roy.Astron.Soc.$$v372$$y2006

000728159 037 $$9arXiv$$aastro-ph/0610191$$castro-ph
000728159 773 $$c213-226$$n1$$pMon.Not.Roy.Astron.Soc.$$v375$$y2007

712900 is correct, which you see only on the pdf. Legacy is correct (by chance).

===============================

7) legacy metadata: 001085403 999C5 $$rnucl-th/9404019$$sPhys.Rev.,C52,23

wrong merge undone: https://inspirehep.net/record/edit/compare_revisions?recid=37844&rev1=20150729125930&rev2=20150722141650

now 2 records

ksachs commented 6 years ago
> 11. recids: 658548 and 682442 - Metadata issues. Both articles have the same start page in the same journal, same volume, same year. That can't be right.

It's correct, they have different pubnotes. One is a supplement:

Eur.Phys.J. C39S2 (2005) 41-61
Eur.Phys.J. C39 (2005) 41-54
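
A small sketch of why the supplement marker matters when matching on publication info. The two pubnotes and recids are the ones above; the `strip_supplement` and `lookup` helpers are hypothetical, and whether either system actually normalizes volumes this way is an assumption, not something confirmed here.

```python
import re

# (journal, volume, start page) for the two records; helper names are hypothetical.
PUBNOTES = {
    658548: ("Eur.Phys.J.", "C39", "41"),
    682442: ("Eur.Phys.J.", "C39S2", "41"),
}


def strip_supplement(volume):
    """Drop a trailing supplement marker such as 'S2' from a volume string."""
    return re.sub(r"S\d+$", "", volume)


def lookup(journal, volume, page, ignore_supplement=False):
    """Return sorted recids whose pubnote equals the given journal, volume and page."""
    hits = []
    for recid, (rec_journal, rec_volume, rec_page) in PUBNOTES.items():
        if ignore_supplement:
            same_volume = strip_supplement(rec_volume) == strip_supplement(volume)
        else:
            same_volume = rec_volume == volume
        if rec_journal == journal and same_volume and rec_page == page:
            hits.append(recid)
    return sorted(hits)


exact = lookup("Eur.Phys.J.", "C39", "41")
collapsed = lookup("Eur.Phys.J.", "C39", "41", ignore_supplement=True)
print(exact, collapsed)
```

As long as the `S2` suffix is kept, the two pubnotes stay distinct; if it is dropped anywhere along the way, both records compete for the same reference.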

ksachs commented 6 years ago

Proposal: search for spires-style references ($$r, $$s only) with contradicting information. If $$s matches a record, delete (or move to $$m) the information in $$r. That way Labs and Legacy will use the same information.
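
The proposal could be sketched roughly as follows. This is only an illustration under stated assumptions: the `by_eprint` and `by_pubnote` dictionaries stand in for real record lookups, the function names are invented, and the example field is the 999C5 content of record 616883 quoted earlier in this thread.

```python
def parse_subfields(field):
    """Split a '$$'-delimited MARC field body into (code, value) pairs."""
    return [(chunk[0], chunk[1:]) for chunk in field.split("$$") if chunk]


def clean_reference(field, by_eprint, by_pubnote):
    """Move $$r to $$m when $$s matches a record and $$r points to a different one."""
    subfields = parse_subfields(field)
    if {code for code, _ in subfields} != {"r", "s"}:
        return field  # not a spires-style ($$r, $$s only) reference
    values = dict(subfields)
    pub_recid = by_pubnote.get(values["s"])
    eprint_recid = by_eprint.get(values["r"])
    if pub_recid is None or eprint_recid is None or pub_recid == eprint_recid:
        return field  # nothing to match on, or no contradiction
    return "".join(
        "$$" + ("m" if code == "r" else code) + value for code, value in subfields
    )


# Hypothetical lookup tables for the Viana & Liddle pair (case 4).
by_eprint = {"astro-ph/9902245": 495427, "astro-ph/9803244": 488774}
by_pubnote = {"Mon.Not.Roy.Astron.Soc.,303,535": 488774}

before = "$$rastro-ph/9902245$$sMon.Not.Roy.Astron.Soc.,303,535"
after = clean_reference(before, by_eprint, by_pubnote)
print(after)
```

References that already carry $$0, or whose $$r and $$s agree, are returned unchanged, so the cleanup only touches the contradictory spires-style cases.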