inspirehep / inspire-next

The INSPIRE repo.
https://inspirehep.net
GNU General Public License v3.0

workflows: review reference extraction in article workflow #1963

Closed jmartinm closed 7 years ago

jmartinm commented 7 years ago

The current article workflow extracts references for arXiv papers:

ENHANCE_RECORD = [
    # Article Processing
    # ==================
    IF(
        is_arxiv_paper,
        [
            arxiv_fulltext_download,
            arxiv_plot_extract,
            arxiv_refextract,
            arxiv_author_list("authorlist2marcxml.xsl"),
        ]
    ),
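For readers unfamiliar with this pattern, here is a minimal sketch of how a conditional step like `IF(is_arxiv_paper, [...])` can gate a list of tasks. It is a simplified stand-in, not the workflow engine's actual implementation: the single-argument task signature, the `arxiv_eprints` check, the placeholder body of `arxiv_refextract`, and the `run_workflow` helper are all assumptions made for illustration.

```python
def IF(condition, tasks):
    """Return a task that runs `tasks` only when `condition(record)` is true.

    Simplified stand-in for the workflow combinator (assumption: tasks take
    a single dict-like record instead of the real engine's arguments).
    """
    def _run(record):
        if condition(record):
            for task in tasks:
                task(record)
    return _run


def is_arxiv_paper(record):
    # Hypothetical check: treat any record with arXiv eprint info as an arXiv paper.
    return bool(record.get("arxiv_eprints"))


def arxiv_refextract(record):
    # Placeholder body: the real task extracts references from the fulltext.
    record["references"] = ["<extracted reference>"]


ENHANCE_RECORD = [
    IF(is_arxiv_paper, [arxiv_refextract]),
]


def run_workflow(tasks, record):
    # Hypothetical runner: apply each task to the record in order.
    for task in tasks:
        task(record)
    return record
```

With this sketch, a record carrying arXiv information gains a `references` key, while any other record passes through unchanged.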

TODO:

BONUS POINTS:

rikirenz commented 7 years ago

@jmartinm Do we have tests for this kind of extraction?

jmartinm commented 7 years ago

I don't think so. The only thing I see is this mock in the integration test for workflows.

Unit tests are needed for https://github.com/inspirehep/inspire-next/blob/d0cb6ba6d761279a76b849990dadd8160156eecb/inspirehep/modules/refextract/tasks.py#L86
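As a rough idea of what such a unit test could look like, here is a sketch that isolates the extraction step behind a mock. The function `extract_references` and its `extractor` parameter are hypothetical stand-ins, not the code at the link above; the point is only to show the shape of a test that does not need a real PDF.

```python
from unittest import mock


def extract_references(pdf_path, extractor):
    """Hypothetical stand-in for the task under test.

    Calls `extractor` on the PDF path and keeps only entries that
    carry a non-empty raw reference string.
    """
    raw = extractor(pdf_path)
    return [ref for ref in raw if ref.get("raw_ref")]


def test_extract_references_drops_empty_entries():
    # The mock replaces the real extraction call, so no PDF is needed.
    fake_extractor = mock.Mock(
        return_value=[{"raw_ref": "placeholder reference"}, {"raw_ref": ""}]
    )

    result = extract_references("paper.pdf", fake_extractor)

    fake_extractor.assert_called_once_with("paper.pdf")
    assert result == [{"raw_ref": "placeholder reference"}]
```

A real test would patch whatever the task actually calls and assert on the fields it writes to the workflow object.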

kaplun commented 7 years ago

But, aren't we anyway discarding references before sending to legacy?

jmartinm commented 7 years ago

> But, aren't we anyway discarding references before sending to legacy?

It seems so: https://github.com/inspirehep/inspire-next/blob/master/inspirehep/modules/workflows/workflows/article.py#L278

I guess they were just extracted to show something in the Holding Pen that helps decide if the article is core/non-core.

The question here was whether we want to keep doing that, or whether we can extract the references on Labs (possibly including the ones given as free text by the user) and send them to legacy.
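To make the current behaviour concrete, here is a minimal sketch of "extract for display, discard before sending". The helper name `prepare_for_legacy` and the plain-dict record are assumptions for illustration and do not reflect the actual code in `article.py`.

```python
import copy


def prepare_for_legacy(record):
    """Return a copy of `record` without its extracted references.

    Hypothetical helper: the original record, as shown in the Holding Pen,
    is left untouched; only the outgoing payload loses the references.
    """
    payload = copy.deepcopy(record)
    payload.pop("references", None)
    return payload
```

Keeping the references on Labs and sending them to legacy would amount to skipping this removal step once the extraction can be trusted.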

kaplun commented 7 years ago

Nope, we can't yet. We first need to port refextract to use journal from Labs, and that has not been done so far: https://github.com/inspirehep/refextract/issues/3

david-caro commented 7 years ago

@jmartinm if nobody is doing this, move to ready and unassign so anyone can pick it up :+1:

jacquerie commented 7 years ago

This has happened in https://github.com/inspirehep/inspire-next/pull/2558.