Open ksachs opened 6 years ago
correction: all upload files I checked have controlnumber, i.e. recid.
Each pair was created from the same workflow, one right after the other (successive recids). All workflows have an error message in extra_data. Maybe the double upload was triggered by a restart?
arXiv:1808.01257, 1685054, 1685055
001685055 541 $$aarXiv$$chepcrawl$$d2018-08-06T03:35:25.423672$$e1160371
001685054 541 $$aarXiv$$chepcrawl$$d2018-08-06T03:35:25.423672$$e1160371
arXiv:1808.01365, 1685234, 1685235
001685235 541 $$aarXiv$$chepcrawl$$d2018-08-07T03:43:48.991131$$e1161286
001685234 541 $$aarXiv$$chepcrawl$$d2018-08-07T03:43:48.991131$$e1161286
arXiv:1808.01473, 1685232, 1685233
001685233 541 $$aarXiv$$chepcrawl$$d2018-08-07T03:43:50.714420$$e1161331
001685232 541 $$aarXiv$$chepcrawl$$d2018-08-07T03:43:50.714420$$e1161331
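To spot such pairs in a larger dump, the 541 lines can be grouped by their `$$e` subfield, which in the examples here carries the workflow id. This is only a minimal sketch: the function names are made up for illustration, and it assumes the text-MARC layout shown above (recid, tag with optional `__` indicators, then `$$`-separated subfields).

```python
import re
from collections import defaultdict

# Assumed layout: "<recid> 541[__] $$a...$$c...$$d...$$e<workflow id>"
LINE_RE = re.compile(r"^(\d+)\s+541_{0,2}\s+(.*)$")


def parse_subfields(raw):
    """Turn '$$aarXiv$$chepcrawl' into {'a': 'arXiv', 'c': 'hepcrawl'}."""
    return {chunk[0]: chunk[1:] for chunk in raw.split("$$") if chunk}


def recids_by_workflow(lines):
    """Map each workflow id ($$e) to the sorted recids that reference it."""
    groups = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a 541 line
        recid, raw = match.groups()
        workflow_id = parse_subfields(raw).get("e")
        if workflow_id:
            groups[workflow_id].add(int(recid))
    return {wf: sorted(recids) for wf, recids in groups.items()}


def duplicated(lines):
    """Keep only workflows that produced more than one record."""
    return {wf: r for wf, r in recids_by_workflow(lines).items() if len(r) > 1}


sample = [
    "001685055 541 $$aarXiv$$chepcrawl$$d2018-08-06T03:35:25.423672$$e1160371",
    "001685054 541 $$aarXiv$$chepcrawl$$d2018-08-06T03:35:25.423672$$e1160371",
    "001688926 541__ $$aarXiv$$chepcrawl$$d2018-08-17T03:35:14.401921$$e1177634",
]
print(duplicated(sample))
```

Workflows that show up with two recids are the candidates for a double upload.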
Another case: an update came in while the first record was halted.
The order of actions seems to be wrong.
The second workflow (claims to) stop the first only after having been halted for match approval itself.
The first workflow continues anyhow, is again halted for matching, and in the end runs send_to_legacy.
001688926 037__ $$9arXiv$$aarXiv:1808.05450$$chep-ph
001688926 541__ $$aarXiv$$chepcrawl$$d2018-08-17T03:35:14.401921$$e1177634
001688751 541__ $$aarXiv$$chepcrawl$$d2018-08-18T03:35:02.440396$$e1179088
WorkFlow:1177634
{
"nicename": "\"Halted for matching approval.\"",
"time": "2018-08-17 03:53:26.347351"
},
{
"nicename": "Mark the workflow object with stopped-by-wf:1179088.",
"time": "2018-08-20 15:00:45.534012"
},
....
{
"nicename": "\"Halted for matching approval.\"",
"time": "2018-08-20 15:01:08.732753"
},
{
"doc": "IF_ELSE: args(<function is_fuzzy_match_approved at 0x7f22106d0ed8>, ....
"time": "2018-08-21 07:32:05.250600"
},
....
{
"nicename": "send_to_legacy",
"time": "2018-08-21 07:32:56.497283"
},
"holdingpen_matches": [
1179088
],
WorkFlow:1179088
{
"nicename": "Mark the workflow object with already-in-holding-pen:True.",
"time": "2018-08-18 03:42:35.836757"
},
....
{
"nicename": "\"Halted for matching approval.\"",
"time": "2018-08-18 03:42:36.372337"
},
{
"nicename": "Stop the matched workflow objects in the holdingpen.",
"time": "2018-08-20 15:00:45.712279"
},
....
{
"nicename": "send_to_legacy",
"time": "2018-08-20 15:03:52.980712"
},
....
{
"nicename": "Mark the workflow object with stopped-by-wf:1177634.",
"time": "2018-08-21 07:32:05.442135"
},
"holdingpen_matches": [
1177634
],
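The interleaving is easier to see when both logs are merged into one timeline. The following is a rough sketch, not part of the workflow code: `merged_timeline` is a hypothetical helper, and the entries are copied from the two excerpts above (elided steps are simply left out).

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Entries copied from the excerpts above; steps elided there are omitted here too.
logs = {
    1177634: [
        {"nicename": "\"Halted for matching approval.\"", "time": "2018-08-17 03:53:26.347351"},
        {"nicename": "Mark the workflow object with stopped-by-wf:1179088.", "time": "2018-08-20 15:00:45.534012"},
        {"nicename": "\"Halted for matching approval.\"", "time": "2018-08-20 15:01:08.732753"},
        {"nicename": "send_to_legacy", "time": "2018-08-21 07:32:56.497283"},
    ],
    1179088: [
        {"nicename": "Mark the workflow object with already-in-holding-pen:True.", "time": "2018-08-18 03:42:35.836757"},
        {"nicename": "\"Halted for matching approval.\"", "time": "2018-08-18 03:42:36.372337"},
        {"nicename": "Stop the matched workflow objects in the holdingpen.", "time": "2018-08-20 15:00:45.712279"},
        {"nicename": "send_to_legacy", "time": "2018-08-20 15:03:52.980712"},
        {"nicename": "Mark the workflow object with stopped-by-wf:1177634.", "time": "2018-08-21 07:32:05.442135"},
    ],
}


def merged_timeline(logs):
    """Return (time, workflow id, step name) tuples from all logs, oldest first."""
    events = [
        (
            datetime.strptime(entry["time"], TIME_FORMAT),
            workflow_id,
            # Some log entries carry "doc" instead of "nicename".
            entry.get("nicename") or entry.get("doc", ""),
        )
        for workflow_id, entries in logs.items()
        for entry in entries
    ]
    return sorted(events)


for when, workflow_id, name in merged_timeline(logs):
    print(when.isoformat(sep=" "), workflow_id, name)
```

Read in time order, both workflows reach send_to_legacy, and each one ends up marked as stopped by the other.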
Trying to trace why arXiv records are created twice.
Instead of an update, a new record is created
arXiv:1807.07025, 1683196, 1683259
arXiv:1807.06513, 1682949, 1682955
E.g.
The second workflow, 1134788, is correctly identified as an exact match. But instead of replacing the existing record, a new record is created. The creation date of this new record is inherited from the old record.
https://labs.inspirehep.net/api/holdingpen/1134788 contains:
For some reason a new recid is added to
Where is this new recid coming from?? Can it overwrite something?
update comes in while first record is halted
For these I'm not sure I understand the info in the API:
arXiv:1807.10190, 1684265, 1684268
arXiv:1807.09872, 1684269, 1684274
arXiv:1807.10163, 1684266, 1684273
What I believe happens, e.g.:
The first is halted for match approval. While it is halted, the second comes in. Now they also somehow match each other. But both upload files contain no controlnumber, so each creates a new record.