Upon further investigation, this does appear to be a separate issue, and it occurs with the third entry in ~entry_form.tsv~ form_input.tsv, which is DataSpace ID 115045. From what I can tell, earlier DataSpace records for this publication were withdrawn, specifically Nos. 115045 and 115161. The DataSpace links are here:
It's unclear to me how often withdrawals occur, but based on my understanding of the workflow, a withdrawal will always break Poster. Specifically, Scraper does remove such records from entry_form.tsv. However, since form_input.tsv is manually edited (e.g., to include funding information), the withdrawn records are not removed from it. I believe a cleaner solution is needed to remove nonexistent DataSpace records from form_input.tsv without performing a clean fix, as that would remove the manually entered funding information.
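One possible shape for such a cleanup, as a minimal sketch only: keep every row of form_input.tsv whose DataSpace ID still appears in entry_form.tsv, and leave all other columns (including funding) untouched. The function name `prune_withdrawn_rows` and the column name `DSpace ID` are assumptions for illustration; the real column headers in these files may differ.

```python
import csv
import io


def prune_withdrawn_rows(form_input_text, entry_form_text, id_column="DSpace ID"):
    """Drop form_input rows whose ID is no longer present in entry_form.

    Both arguments are the full text of a TSV file with a header row.
    `id_column` is a hypothetical header name; adjust it to the real one.
    """
    # Collect the IDs that Scraper still considers valid.
    entry_rows = csv.DictReader(io.StringIO(entry_form_text), delimiter="\t")
    live_ids = {row[id_column] for row in entry_rows}

    # Copy form_input row by row, keeping every column of surviving rows
    # so that manually entered values are preserved.
    reader = csv.DictReader(io.StringIO(form_input_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        if row[id_column] in live_ids:
            writer.writerow(row)
    return out.getvalue()
```

Working on text rather than file paths keeps the sketch easy to test. A caller would read both files, pass their contents in, and write the result back to form_input.tsv.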
Note: The failed CI is in the Poster module, not the Scraper module, and it fails on an assert. That is, this does address #44:
```
    self.generate_upload_json()
  File "Poster.py", line 72, in generate_upload_json
    assert len(dspace_data) == 1, dspace_data
AssertionError: []
Error: Process completed with exit code 1.
```
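The bare assert above only prints `[]`, which does not say which record was missing. As a hedged sketch, a lookup helper could separate the "no match" case (e.g., a withdrawn record) from the "multiple matches" case and name the ID in the message. The helper name `find_single_record`, the `dspace_id` key, and the list-of-dicts shape are assumptions for illustration and do not reflect the actual code in Poster.py.

```python
def find_single_record(dspace_records, dspace_id, id_key="dspace_id"):
    """Return the one record matching dspace_id, or raise a descriptive error.

    `dspace_records` is assumed to be a list of dicts; `id_key` is a
    hypothetical field name.
    """
    matches = [rec for rec in dspace_records if rec.get(id_key) == dspace_id]
    if not matches:
        # Covers the empty-list case seen in the traceback.
        raise LookupError(
            f"DataSpace ID {dspace_id} not found; the record may have been withdrawn"
        )
    if len(matches) > 1:
        raise LookupError(
            f"DataSpace ID {dspace_id} matched {len(matches)} records; expected exactly one"
        )
    return matches[0]
```

With a message like this, the CI log would point directly at the offending entry in form_input.tsv.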
The record to be used, I believe, should be #115162. See: https://dataspace.princeton.edu/handle/88435/dsp01qj72pb21d
The outcome was an empty list for dspace_data, which is what tripped the assert.
Originally posted by @astrochun in https://github.com/pulibrary/dspace-osti/issues/46#issuecomment-897929226