surfedushare / search-portal

A search service for finding open access higher education learning materials
MIT License

Fix publication date for WUR #759

Closed: fako closed this issue 1 year ago

fako commented 1 year ago

Pivotal

The L4L XML data does not contain a "publisher" field in the "contribute" section, so the harvester cannot determine a publication date (see https://github.com/surfedushare/search-portal/blob/acceptance/harvester/edurep/extraction.py#L231-L242).

The solution is to keep checking for a publisher field first and, when none exists, to fall back to looking for a date anywhere in the contribute field.
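A minimal sketch of that fallback, not the actual harvester code. It assumes a simplified, namespace-free LOM-style layout (`contribute/role/value` and `contribute/date/dateTime`); the function name `get_publisher_date` and the sample record are hypothetical, and the real extraction in `extraction.py` may be structured differently.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def get_publisher_date(record_xml: str) -> Optional[str]:
    """Return a publication date string for a record, or None if there is no date.

    Assumed (hypothetical) layout: <contribute> elements that hold an optional
    <role><value> and an optional <date><dateTime>.
    """
    root = ET.fromstring(record_xml)
    contributes = root.findall(".//contribute")

    def read_date(contribute: ET.Element) -> str:
        # findtext returns None when the path is missing, so normalise to "".
        return (contribute.findtext("date/dateTime") or "").strip()

    # First choice: the date of a contribute entry with the publisher role.
    for contribute in contributes:
        role = (contribute.findtext("role/value") or "").strip()
        if role == "publisher" and read_date(contribute):
            return read_date(contribute)

    # Fallback: no usable publisher entry, so take the first date found
    # in any contribute entry.
    for contribute in contributes:
        if read_date(contribute):
            return read_date(contribute)

    return None


# Hypothetical record without a publisher role, as described for L4L.
SAMPLE_WITHOUT_PUBLISHER = """
<lom>
  <lifeCycle>
    <contribute>
      <role><value>author</value></role>
      <date><dateTime>2019-01-01</dateTime></date>
    </contribute>
  </lifeCycle>
</lom>
"""

if __name__ == "__main__":
    print(get_publisher_date(SAMPLE_WITHOUT_PUBLISHER))
```

Checking the publisher role first keeps the behaviour unchanged for sources that do provide it; only records without one reach the fallback.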

fako commented 1 year ago

Query to count the L4L documents for which a publisher date was successfully extracted (dv being the dataset version to check):

Document.objects.filter(dataset_version=dv, collection__name="l4l").exclude(properties__publisher_date=None).count()
fako commented 1 year ago

The number of publisher_dates for the L4L set went from 1 to 720.