ksachs closed 4 years ago
sorry - even more complex than before. I guess I found most bugs, but you never know with dirty metadata. The code does what it is supposed to do, if you want to test anyhow ...
It will run once/week, --ticket-creation-policy=per-rule
Just replace the plugin, we keep the current rule [KAOS_fix_ref.JCAP_JHEP]
The question is which records to run it on: whether you want to spend time on searching or on browsing through references. On test, a bibcheck for 999C5s:JHEP* took 15 minutes or so just to get the records. In principle we need:
999C5s:JCAP,20*
999C5s:JCAP,1313*
999C5s:JCAP,1414*
999C5s:JCAP,1515*
999C5s:JCAP,1616*
999C5s:JCAP,1717*
999C5s:JCAP,1818*
999C5s:JCAP,1919*
999C5s:JHEP,20*
999C5s:JHEP,1313*
999C5s:JHEP,1414*
999C5s:JHEP,1515*
999C5s:JHEP,1616*
999C5s:JHEP,1717*
999C5s:JHEP,1818*
999C5s:JHEP,1919*
999C5s:J.Stat.Mech.,20*
999C5s:J.Stat.Mech.,1313*
999C5s:J.Stat.Mech.,1414*
999C5s:J.Stat.Mech.,1515*
999C5s:J.Stat.Mech.,1616*
999C5s:J.Stat.Mech.,1717*
999C5s:J.Stat.Mech.,1818*
999C5s:J.Stat.Mech.,1919*
999C5s:/^PTEP.*,\d\d\d$/
999C5s:"MDPI Physics,*"
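For illustration, a minimal sketch of what the PTEP pattern above selects, assuming the expression between the slashes behaves like a plain Python regex applied to the whole 999C5s value (that is an assumption about the search engine, and `needs_check` is a hypothetical helper, not part of the plugin):

```python
import re

# The PTEP filter from the list above, taken as a plain Python regex.
# Assumption: the search applies it to the full 999C5s value.
PTEP_PATTERN = re.compile(r'^PTEP.*,\d\d\d$')


def needs_check(pubnote):
    """Return True if a pubnote string would be picked up by the PTEP filter."""
    return PTEP_PATTERN.search(pubnote) is not None


# 'PTEP,5,053' is the kind of truncated reference the plugin is meant to fix.
print(needs_check('PTEP,5,053'))
# An article id of the form digits-letter-digits does not end in ',ddd'.
print(needs_check('PTEP,2013,053B01'))
```

So the filter only catches references whose last field is exactly three digits, which keeps the record set much smaller than a plain PTEP* search would.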
@ksachs there is a bug: get_citation_for_PTEP() does not define a default for year, so it can end up using an undefined year:
2020-02-14 13:42:06 --> Unexpected error occurred: local variable 'year' referenced before assignment.
2020-02-14 13:42:06 --> Traceback is:
2020-02-14 13:42:07 --> * 2020-02-14 13:42:06 -> UnboundLocalError: local variable 'year' referenced before assignment (fix_ref_jhep_volume.py:287:get_citation_for_PTEP)
2020-02-14 13:42:07 --> Frame get_citation_for_PTEP in /scratch/venvs/invenio-legacy/lib/python/invenio/bibcheck_plugins/fix_ref_jhep_volume.py at line 287
2020-02-14 13:42:07 --> -------------------------------------------------------------------------------
2020-02-14 13:42:07 --> 284 elif ref_year:
2020-02-14 13:42:07 --> 285 year = ref_year
2020-02-14 13:42:07 --> 286
2020-02-14 13:42:07 --> ----> 287 search_string = r'PTEP[^A-Za-z]*%s[^A-Za-z]*[,: ](\d{2,3}[A-Z]\d{2,3})' % year
2020-02-14 13:42:07 --> 288 search_res = re.search(search_string, text)
2020-02-14 13:42:07 --> 289 if search_res:
2020-02-14 13:42:07 --> 290 true_artid = search_res.group(1)
2020-02-14 13:42:07 --> -------------------------------------------------------------------------------
2020-02-14 13:42:07 --> ref_year = "''"
2020-02-14 13:42:07 --> recid_citation = 'None'
2020-02-14 13:42:07 --> text = "' $$HS MIZOGUCHI AND M YATA $$M01 (2013) $$O51'"
2020-02-14 13:42:07 --> journal = "'PTEP'"
2020-02-14 13:42:07 --> volume = "'5'"
2020-02-14 13:42:07 --> ref_pbn = "{'y': '', 'p': 'PTEP', 'c': '053', 'v': '5'}"
2020-02-14 13:42:07 --> debug = 'False'
2020-02-14 13:42:07 --> pubnote = "''"
2020-02-14 13:42:07 --> Frame get_citation_from_pubnote in /scratch/venvs/invenio-legacy/lib/python/invenio/bibcheck_plugins/fix_ref_jhep_volume.py at line 350
2020-02-14 13:42:07 --> -------------------------------------------------------------------------------
2020-02-14 13:42:07 --> 347 """
2020-02-14 13:42:07 --> 348
2020-02-14 13:42:07 --> 349 if bug_type == "PTEP":
2020-02-14 13:42:07 --> ----> 350 recid_citation, pubnote = get_citation_for_PTEP(ref_pbn, text, debug)
2020-02-14 13:42:07 --> 351 elif bug_type == 'JHEP':
2020-02-14 13:42:07 --> 352 recid_citation, pubnote = get_citation_for_JHEP(ref_pbn, text, debug)
2020-02-14 13:42:07 --> 353 else:
2020-02-14 13:42:07 --> -------------------------------------------------------------------------------
2020-02-14 13:42:07 --> debug = 'False'
2020-02-14 13:42:07 --> text = "' $$hS. Mizoguchi and M. Yata $$m01 (2013) $$o51'"
2020-02-14 13:42:07 --> bug_type = "'PTEP'"
2020-02-14 13:42:07 --> ref_pbn = "{'y': '', 'p': 'PTEP', 'c': '053', 'v': '5'}"
2020-02-14 13:42:07 --> Frame check_record in /scratch/venvs/invenio-legacy/lib/python/invenio/bibcheck_plugins/fix_ref_jhep_volume.py at line 539
2020-02-14 13:42:07 --> -------------------------------------------------------------------------------
2020-02-14 13:42:07 --> 536 confirmation_reason = 'RepNo'
2020-02-14 13:42:07 --> 537 if not recid_citation:
2020-02-14 13:42:07 --> 538 recid_citation, pubnote_from_rawref = \
2020-02-14 13:42:07 --> ----> 539 get_citation_from_pubnote(ref_pbn, bug_type, reference['subfields_text'])
2020-02-14 13:42:07 --> 540 confirmation_reason = 'PubNote'
2020-02-14 13:42:07 --> 541 if not recid_citation:
2020-02-14 13:42:07 --> 542 recid_citation, confirmation_reason = \
2020-02-14 13:42:07 --> -------------------------------------------------------------------------------
2020-02-14 13:42:07 --> tickets = 'True'
2020-02-14 13:42:07 --> recid_citation = 'None'
2020-02-14 13:42:07 --> reference = "{'subfield_0': '1204476', 'mark_line': '$$01204476$$9CURATOR$$hS. Mizoguchi and M. Yata$$m01 (2013)$$o51$$sPTEP,5,053', 'doi': '', 'subfields_text': ' $$hS. Mizoguchi and M. Yata $$m01 (2013) $$o51', 'repno': '', 'year': '', 'position_pbn': 5, 'subfield_pbn': 'PTEP,5,053', 'curator': 'C'}"
2020-02-14 13:42:07 --> pubnote_from_rawref = "''"
2020-02-14 13:42:07 --> ref_pbn = "{'y': '', 'p': 'PTEP', 'c': '053', 'v': '5'}"
2020-02-14 13:42:07 --> record = "{'595': [([('9', 'CERN'), ('a', 'CDS-1561297')], ' ', ' ', '', 20)], '773': [([('c', '055016'), ('n', '5'), ('p', 'Phys.Rev.'), ('v', 'D88'), ('y', '2013')], ' ', ' ', '', 37)], '300': [([('a', '15')], ' ', ' ', '', 14)], '999': [([('0', '140422'), ('h', 'P. Ramond'), ('m', '(Sanibel Symposia, 1979), reissued as'), ('o', '1'), ('r', 'hep-ph/9809459'), ('t', 'The Family Group in Grand Unified Theories')], 'C', '5', '', 45), ([('0', '140392'), ('h', 'H. Georgi'), ('o', '2'), ('s', 'Nucl.Phys.,B15 [...]
2020-02-14 13:42:07 --> m999 = "([('0', '1204476'), ('9', 'CURATOR'), ('h', 'S. Mizoguchi and M. Yata'), ('m', '01 (2013)'), ('o', '51'), ('s', 'PTEP,5,053')], 'C', '5', '', 97)"
2020-02-14 13:42:07 --> bug_type = "'PTEP'"
2020-02-14 13:42:07 --> fuzzy = 'False'
2020-02-14 13:42:07 --> recid = "'1242133'"
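One possible fix, sketched under assumptions: bind year to a default before the if/elif chain and bail out when it stays empty. The function below is a hypothetical stand-in for the failing part of get_citation_for_PTEP(); the name `find_ptep_artid`, its parameters, and the first branch (`pubnote_year`) are guesses, since the traceback only shows the `elif ref_year:` branch. Only the search pattern is copied from line 287.

```python
import re


def find_ptep_artid(text, ref_year='', pubnote_year=''):
    """Hypothetical stand-in for the failing part of get_citation_for_PTEP()."""
    # Default value, so 'year' is always bound even if no branch below fires.
    year = ''
    if pubnote_year:      # assumption: the real first branch is not in the log
        year = pubnote_year
    elif ref_year:
        year = ref_year

    if not year:
        # Without a year the pattern below cannot be anchored; give up cleanly.
        return None

    # Same pattern as line 287 of the plugin, with the year escaped.
    search_string = (r'PTEP[^A-Za-z]*%s[^A-Za-z]*[,: ](\d{2,3}[A-Z]\d{2,3})'
                     % re.escape(year))
    search_res = re.search(search_string, text)
    if search_res:
        return search_res.group(1)
    return None


# The reference from the traceback: no year, so no lookup and no exception.
print(find_ptep_artid(' $$HS MIZOGUCHI AND M YATA $$M01 (2013) $$O51'))
# A made-up raw reference that does contain year and article id.
print(find_ptep_artid('PTEP 2013, 053B01', ref_year='2013'))
```

Whether an empty year should skip the reference or fall back to a looser pattern is a design choice for the plugin; the sketch simply skips it.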
Oh man!
Even without the fuzzy option this will change 7930 references. Looks OK, though. Log attached.
looks like the fuzzy option will change more than 15k records
make that over 22k changes with fuzzy
improved performance
add handling of PTEP and MDPI Physics
TBD: filter of the bibcheck-rule
Signed-off-by: Kirsten Sachs <sachs@l00lnxkaos.desy.de>