psu-libraries / researcher-metadata

Penn State University's faculty and research metadata repository
https://metadata.libraries.psu.edu/
MIT License

Accepted versions are getting labeled as published versions in AI OA Workflow #983

Open anaelizabethenriquez opened 6 months ago

anaelizabethenriquez commented 6 months ago

Lucretia has noticed quite a few files in the "Review Wrong File Versions" bucket where the "Version" field on the file is incorrect. All of them are currently marked as "publishedVersion," but they are actually accepted versions.

https://metadata.libraries.psu.edu/admin/activity_insight_oa_file/2049
https://metadata.libraries.psu.edu/admin/activity_insight_oa_file/1838
https://metadata.libraries.psu.edu/admin/activity_insight_oa_file/1837
https://metadata.libraries.psu.edu/admin/activity_insight_oa_file/2217
https://metadata.libraries.psu.edu/admin/activity_insight_oa_file/791

These files have a lot in common. They are all PDFs created with LaTeX, and they all carry the arXiv watermark. I suspect something in the automated version checker is mistakenly categorizing these as published. I'm guessing the fix will require some development time and will need to wait until fall, but I'm giving @ajkiessl a heads-up just in case.

Ideally I'd like to capture whatever we'll need for development in this issue so we can proceed with correcting this error manually for the files we have now.

ajkiessl commented 6 months ago

Yeah, we'll have to parse out the metadata for these to see what's going on. It will require development time.
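For whoever picks this up later, the two traits described above (a TeX-generated PDF and an arXiv watermark) could be checked roughly as follows. This is only a minimal Python sketch and not the repository's actual version checker: the function names, the shape of the `metadata` dict (`Creator` and `Producer` keys, as found in a PDF's document information dictionary), and the choice to flag a file for review instead of relabeling it are all assumptions made for illustration. It also assumes the first-page text has already been extracted by some other tool.

```python
import re

# arXiv stamps look like "arXiv:2107.09295v1" (new-style IDs) or
# "arXiv:hep-th/9901001" (old-style IDs); the version suffix is optional.
ARXIV_STAMP = re.compile(
    r"arXiv:(\d{4}\.\d{4,5}|[a-z\-]+(\.[A-Z]{2})?/\d{7})(v\d+)?"
)

# Substrings that commonly appear in the Creator/Producer fields of
# PDFs built with a TeX toolchain.
TEX_TOOLS = ("latex", "pdftex", "xetex", "luatex", "dvips", "dvipdfm")


def has_arxiv_stamp(first_page_text):
    """True if the extracted first-page text contains an arXiv identifier stamp."""
    return ARXIV_STAMP.search(first_page_text) is not None


def made_with_tex(metadata):
    """True if the PDF's Creator or Producer field names a TeX tool."""
    fields = " ".join(
        str(metadata.get(key, "")) for key in ("Creator", "Producer")
    ).lower()
    return any(tool in fields for tool in TEX_TOOLS)


def should_flag_for_review(metadata, first_page_text, checker_label):
    """Hypothetical rule: a file labeled "publishedVersion" that carries an
    arXiv stamp is suspicious. TeX provenance alone is not used as a trigger,
    because many published versions are also typeset with TeX."""
    return checker_label == "publishedVersion" and has_arxiv_stamp(first_page_text)


# Illustrative inputs only; real values would come from the stored file.
example_metadata = {"Creator": "LaTeX with hyperref", "Producer": "pdfTeX"}
example_text = "arXiv:2107.09295"
print(made_with_tex(example_metadata))
print(should_flag_for_review(example_metadata, example_text, "publishedVersion"))
```

The arXiv stamp is treated as the stronger signal here. The TeX check is kept as a separate helper so it can be logged alongside the flag when comparing the affected records.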

anaelizabethenriquez commented 6 months ago

Thanks, @ajkiessl! I'm attaching the files to this issue. Anything else I should save so we can troubleshoot this later?

2107.09295-1.pdf
2104.09026-1.pdf
2004.01762-1.pdf
2002.05874-1-1.pdf
0808.3012-1-3.pdf

ajkiessl commented 6 months ago

I think the records and files should be enough.