iulianav closed this issue 7 years ago.
@michamos, now that the 'submission_number' field will later be populated for hepcrawl records as well, neither the schema description nor the field name is accurate anymore, right?
You are right. We could rename it to holdingpen_record and make it a json_reference to the holdingpen record.
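To make the proposed rename concrete, here is a minimal sketch of what a `holdingpen_record` JSON reference might look like. It assumes a JSON reference is an object with a single `$ref` key holding a URL; the helper name `make_holdingpen_reference` and the base URL are hypothetical placeholders, not the actual INSPIRE endpoint or schema.

```python
def make_holdingpen_reference(workflow_id, base_url="https://example.org/api/holdingpen"):
    """Build a JSON reference to a Holding Pen record.

    Hypothetical: the URL scheme is a placeholder, not the real INSPIRE API.
    """
    return {"$ref": f"{base_url.rstrip('/')}/{workflow_id}"}


# Hypothetical acquisition_source using the proposed 'holdingpen_record' field
acquisition_source = {
    "method": "hepcrawl",
    "source": "arXiv",
    "holdingpen_record": make_holdingpen_reference(1234),
}
print(acquisition_source["holdingpen_record"])
# {'$ref': 'https://example.org/api/holdingpen/1234'}
```

Using a reference instead of a bare number would let consumers resolve the Holding Pen record directly, whichever tool (hepcrawl or a submission form) created it.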
@david-caro
Expected Behavior
To ensure the schema is respected, the LiteratureBuilder defined in inspire-schemas should be used to generate the 'acquisition_source', and the 'submission_number' field should not be populated for records ingested by hepcrawl (see: http://inspire-schemas.readthedocs.io/en/latest/schemas/records/elements/acquisition_source.html#acquisition-source-json).
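The expected behavior can be expressed as a simple check on a record. This is a minimal sketch, assuming hepcrawl-ingested records are recognisable by `acquisition_source.method == "hepcrawl"`; the function name `violates_hepcrawl_rule` is hypothetical and not part of inspire-schemas or hepcrawl.

```python
def violates_hepcrawl_rule(record):
    """Return True if a hepcrawl-ingested record carries a submission_number.

    Assumes hepcrawl records are marked with acquisition_source.method == "hepcrawl".
    """
    acquisition_source = record.get("acquisition_source", {})
    return (
        acquisition_source.get("method") == "hepcrawl"
        and "submission_number" in acquisition_source
    )


bad = {"acquisition_source": {"method": "hepcrawl", "source": "arXiv", "submission_number": "scrapy-job-id"}}
good = {"acquisition_source": {"method": "hepcrawl", "source": "arXiv"}}
print(violates_hepcrawl_rule(bad), violates_hepcrawl_rule(good))  # True False
```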
Current Behavior
In some places (https://github.com/inspirehep/hepcrawl/blob/e749a26ca9b77f61c5abb20b63e295a4f75a6508/hepcrawl/tohep.py#L229-L231) the 'acquisition_source' is generated correctly via the LiteratureBuilder, while in others (https://github.com/inspirehep/hepcrawl/blob/e749a26ca9b77f61c5abb20b63e295a4f75a6508/hepcrawl/tohep.py#L142-L149) it is generated by a separate function that does something similar. That function also incorrectly populates the 'submission_number' field of hepcrawl records with the SCRAPY_JOB id.
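The intended fix is to route every call site through the LiteratureBuilder (its `add_acquisition_source` method). Until then, the ad-hoc helper could at least stop writing the job id. The sketch below is hypothetical: the name `build_acquisition_source` and its parameters are not taken from tohep.py, and it only shows the idea of never putting the Scrapy job id into `submission_number`.

```python
import datetime


def build_acquisition_source(source, method="hepcrawl", now=None):
    """Hypothetical stand-in for the ad-hoc helper in tohep.py.

    Builds an 'acquisition_source' dict with only source, method and datetime;
    the Scrapy job id is deliberately NOT written to 'submission_number'.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "source": source,
        "method": method,
        "datetime": now.isoformat(),
    }


fixed_time = datetime.datetime(2017, 6, 1, 12, 0, 0)
print(build_acquisition_source("arXiv", now=fixed_time))
# {'source': 'arXiv', 'method': 'hepcrawl', 'datetime': '2017-06-01T12:00:00'}
```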