Open ksachs opened 6 years ago
adding subjects - possible now only via editor. Need input slot in detailed view, input via single letter; final solution: on brief listing too. Records without subject should stay halted. Subject guesser would be nice; at DESY subject guessing via authors is in place.
That is already possible on the detailed view, with single letter input.
how? Only for user submissions.
Dealing with online first articles (DOI but no full pubnote): We don't want to curate / manually merge stuff twice. E.g. Elsevier sends an update for every step. 2 possibilities: -- auto-reject (without blocking the following full article). This is what we are currently doing. It is possible to filter these records at DESY before sending xml to labs. -- normal selection + ingest or auto-merge. I.e. do everything that can be done automatically and forget about information that would cause conflicts. Curation should be triggered only for record with full pubnote. The following versions would be matched automatically via DOI, so we don't have to do that again.
If you can filter them out easily, that would be easier. As this kind of things is probably publisher/journal dependent, it's something that we probably want to handle in hepcrawl (when we have non-DESY crawlers), not in the workflow.
I think it's a bug when the list is empty:
599 not in new data-model
Is that a typo? Neither I nor @annetteholtkamp know about that field, and we couldn't find any record that has it (actually we found one, but it was a typo for 595). 595 maps to _private_notes, 595_D maps to _desy_bookkeeping.
599 is not part of the INSPIRE data-model but it contains comments from the publisher. E.g. from IOP, info about the conference (it's the only way to identify the conference) or stuff like "This paper includes data gathered with the 6.5 m Magellan Telescopes located at Las Campanas Observatory, Chile." It was a comment from Florian. I don't know if he discussed it with someone.
This is now split in separate issues
Matching and merging is at the beginning of the journal workflow. If we do it on labs we cut off the journal workflow at DESY and have to move everything.
in bold: high priority; italic: medium priority; normal: nice to have
WARNING: when testing on labs DO NOT ACCEPT/CORE a journal record !!! It will be ingested to INSPIRE and might cause DOI conflicts with the still active DESY workflow.
Things that seem to be working - not fully tested
Matching for Errata
Journals we take completely: (a copy of) record has to stay in HP for subjects and CORE tagging. Same as for other records with the reject button active but grayed-out.
Alert for curation - generate RT tickets / another reason to halt a record in the HP: RT ticket is generated. No 'halted for curation' in HP yet.
Workflow / Processes
in detailed view all 'enrichment' should be possible, no matter why the record is halted: selection (CORE/Accept/Reject), Add/Correct Subject, Resolve Match, more?
number of CORE references needed for selection
adding subjects - possible now only via editor. Need input slot in detailed view, input via single letter; final solution: on brief listing too. Records without subject should stay halted. Subject guesser would be nice; at DESY subject guessing via authors is in place.
Adding CNUM - have to think about when and how. E.g. how to add the same CNUM to all records in https://labs.inspirehep.net/holdingpen/list/?page=1&size=10&q=metadata.publication_info.journal_title:%22J.Phys.Conf.Ser.%22%20AND%20metadata.publication_info.journal_volume:%221029%22
Matching with CNUM - don't know if we need this. We have to see how the matcher performs. If there will be an easy way to merge 2 INSPIRE records (when labs is master) this might not be necessary.
Manual matching: if a record is halted, we might notice this is a match. Need a slot to enter the recid. For now it would be enough to have this in detailed view, final solution: on brief listing too. E.g. https://labs.inspirehep.net/holdingpen/1082179 is the erratum for recid:1502058 which we might notice when doing the selection. Currently we would have to accept it as new record and remember to merge on legacy later.
Use information from ADS for matching DOI - arXiv.
Updates to old records, that were rejected - arXiv and Journal: should be auto-rejected - blacklist arXiv, DOI. If this is not in place we have to do something at DESY before sending the xml to labs.
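A minimal sketch of the blacklist check described above, assuming the record is a plain dict with `arxiv_eprints` and `dois` lists whose entries have a `value` key. The function names and the in-memory set used as the blacklist are hypothetical; how rejected identifiers are actually stored is not decided in this issue.

```python
def normalize_identifier(value):
    """Lower-case and strip common prefixes so differently written IDs compare equal."""
    value = value.strip().lower()
    for prefix in ("arxiv:", "doi:", "https://doi.org/"):
        if value.startswith(prefix):
            value = value[len(prefix):]
    return value


def should_auto_reject(record, rejected_identifiers):
    """Return True if any arXiv ID or DOI of the record is on the blacklist.

    `rejected_identifiers` is assumed to be a set of already normalized IDs
    collected from records that were rejected earlier.
    """
    identifiers = [e.get("value", "") for e in record.get("arxiv_eprints", [])]
    identifiers += [d.get("value", "") for d in record.get("dois", [])]
    return any(
        normalize_identifier(i) in rejected_identifiers for i in identifiers if i
    )
```

An update would then be dropped before it reaches the Holding Pen whenever `should_auto_reject(record, blacklist)` is true, instead of being halted for a second selection.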
Dealing with online first articles (DOI but no full pubnote):
We don't want to curate / manually merge stuff twice. E.g. Elsevier sends an update for every step. 2 possibilities:
-- auto-reject (without blocking the following full article). This is what we are currently doing. It is possible to filter these records at DESY before sending xml to labs.
-- normal selection + ingest or auto-merge. I.e. do everything that can be done automatically and forget about information that would cause conflicts. Curation should be triggered only for records with full pubnote. The following versions would be matched automatically via DOI, so we don't have to do that again.
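Either possibility needs a way to recognise an online-first record. The sketch below is one possible test, assuming a record dict with `dois` and `publication_info` lists. What counts as a "full pubnote" here (journal title, volume, and a start page or article ID) is an assumption for illustration, as are the `page_start` and `artid` keys and both function names.

```python
def has_full_pubnote(pub):
    """Treat a pubnote as full if it has a journal title, a volume, and a page or article ID."""
    return bool(
        pub.get("journal_title")
        and pub.get("journal_volume")
        and (pub.get("page_start") or pub.get("artid"))
    )


def is_online_first(record):
    """True for records that carry a DOI but no full publication note yet."""
    has_doi = bool(record.get("dois"))
    pubnotes = record.get("publication_info", [])
    return has_doi and not any(has_full_pubnote(p) for p in pubnotes)
```

With the first possibility, records for which `is_online_first` is true would be auto-rejected without blacklisting their DOI; with the second, they would be ingested or auto-merged without triggering curation.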
Abstract (and title?) with <math> gets truncated. Example: https://labs.inspirehep.net/holdingpen/1086582
GUI / Handling
big search slot on dashboard
at some point we might want to re-organize the boxes on the dashboard. Less distinction between arXiv / Journal
next button on detailed view / how to define the set?
facets for: journal-title, volume, affiliation / bucket for lab-curators
Always show facets that are used as filter, so it is possible to take that filter out. Now in case of empty search result there are no facets shown. E.g. https://labs.inspirehep.net/holdingpen/list/?page=1&size=10&workflow_name=HEP&q=!_extra_data.is-update:true&status=HALTED&subject=Astrophysics should always show: Awaiting decision, HEP, Astrophysics. Maybe even change the number-of-records shown to the number you get without that filter. I.e. how many records do I get when I click on that facet?
select several records + resolve all: we have this for selection, would be good to have this for matching
Batch operation on records with conflicts: currently I can do selection as batch operation on https://labs.inspirehep.net/holdingpen/list/?page=1&size=25&data_type=hep&is-update=true&q=-metadata.acquisition_source.source:arXiv%20AND%20metadata.acquisition_source.method:hepcrawl&status=HALTED&sort=mostrecent That makes no sense - they have been matched. What would make sense is a "forget about the conflicts".
All Match candidates fully visible. First N (~80-100 configurable) characters of the abstract always visible + show all
same layout (width, font, textsize, ...) for new record and match candidates. Basically re-use the brief format. Use color to differentiate.
show report number (report_numbers) on both new record and match candidate in brief and detailed. E.g. https://labs.inspirehep.net/holdingpen/1085815
show conference information on match candidate
make match result visible after decision was made (similar to CORE/accept): "->recid" tag, maybe green/blue depending on auto-match or manual-match. Helps to understand what happened to a record. E.g. https://labs.inspirehep.net/holdingpen/1082285
open conflicts at the bottom immediately when going to the editor to resolve conflicts
show number of conflicts in brief listing (I don't know why Florian wants this)
Link to fulltext from brief listing
Other fine-tuning of brief display
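For the abstract preview requested above (first N characters always visible, plus "show all"), the cut could be sketched as follows. The function name, the default of 100 characters, and the word-boundary rule are assumptions for illustration; the real display is done in the Holding Pen UI, not in this helper.

```python
def abstract_preview(abstract, limit=100):
    """Return (preview, truncated) where `truncated` says whether a 'show all' link is needed."""
    if len(abstract) <= limit:
        return abstract, False
    cut = abstract[:limit]
    # Back up to the last space so a word is not split; keep the hard cut if there is none.
    head, sep, _ = cut.rpartition(" ")
    if sep:
        cut = head
    return cut.rstrip() + "...", True
```

Making `limit` a parameter keeps N configurable, as asked for in the list.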
Bug / Feature
curated flagged as merge conflict
FFT from ftp server not working (just http)
599 not in new data-model
arXiv / arxiv: 2 different sources depending on harvesting via oai or hepcrawl. (minor issue)
Matches not found (examples)
Don't match
Different conferences, e.g. https://labs.inspirehep.net/holdingpen/1086301 https://labs.inspirehep.net/holdingpen/1086289
Different arXiv IDs: there are a few cases where we would merge them, but those can be handled manually. The vast majority are false positives and delay harvesting.
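The arXiv rule could be applied as a filter on the match candidates, sketched here under the assumption that records are dicts with an `arxiv_eprints` list of `{"value": ...}` entries. The function names are hypothetical and this is not the existing matcher; version suffixes such as `v2` are not handled.

```python
def arxiv_ids(record):
    """Collect the normalized arXiv IDs of a record."""
    ids = set()
    for eprint in record.get("arxiv_eprints", []):
        value = eprint.get("value", "").strip().lower()
        if value.startswith("arxiv:"):
            value = value[len("arxiv:"):]
        if value:
            ids.add(value)
    return ids


def arxiv_ids_conflict(new_record, candidate):
    """True only if both records have arXiv IDs and none of them coincide."""
    new_ids = arxiv_ids(new_record)
    candidate_ids = arxiv_ids(candidate)
    return bool(new_ids) and bool(candidate_ids) and new_ids.isdisjoint(candidate_ids)


def filter_candidates(new_record, candidates):
    """Drop match candidates whose arXiv IDs contradict those of the new record."""
    return [c for c in candidates if not arxiv_ids_conflict(new_record, c)]
```

A candidate without any arXiv ID is kept, so a journal record can still match a preprint-less record; the few genuine merges across different arXiv IDs would be handled manually, as stated above.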