Based on analysis by LAC (Andrew's email "Theses Canada - harvesting the University of Alberta repository", received March 10, 2022), here is a list of required and suggested cleanup to be done in preparation for an upcoming harvest of ETDMS theses metadata via OAI.
[Text quoted from LAC email]
Dates
[x] In the dates of five records we get a square bracket as one of the characters. Here are the dates and the first parts of the titles:
[196 - Man and landscape change in the Banff National Park area before 1911.
[197 - Origins of vagrancy law:
[198 - A study in soil ecology:
[200 - Theoretical and practical biography:
[200 - Use of euphemisms and taboo terms by young speakers of Russian and English
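A check for this kind of problem could be scripted ahead of the harvest. The following is a minimal sketch, assuming the date values have already been extracted from the metadata into (identifier, date) pairs. The function name `find_bracketed_dates` and the sample identifiers are hypothetical and not taken from the repository.

```python
import re

# Matches an opening or closing square bracket; a clean date should contain neither.
BRACKET = re.compile(r"[\[\]]")

def find_bracketed_dates(records):
    """Return the (identifier, date) pairs whose date contains a square bracket."""
    return [(ident, date) for ident, date in records if BRACKET.search(date)]

# Hypothetical sample data, for illustration only.
sample_dates = [("rec-1", "[196"), ("rec-2", "1987"), ("rec-3", "[200")]
flagged = find_bracketed_dates(sample_dates)
print(flagged)
```

Each flagged record would still need a manual look to determine the correct date.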
264 (publisher):
[x] 11 titles have “unknown” as publisher and also for degree name and degree grantor. Here are the first parts of the titles:
Theoretical Considerations For Biological Control: (I notice incidentally that this first one may be a duplicate)
Relationality, Reciprocity and the Nature of Self:
Union and Communion:
Posttraumatic Growth and Spirituality:
Modelling Future Impacts of Climate Change and Harvest on the Reproductive Success of Female Polar Bears (Ursus maritimus)
Wolf movement within and beyond the territory boundary
The arrival and establishment of non-indigenous species:
Linear features impact predator-prey encounters:
Modeling group formation and activity patterns in self-organizing communities of organisms
Edmonton Social Planning Council:
How Academic Librarians use Evidence in their Decision Making:
502 (degree info)
[x] Degree information to be corrected:
One thesis has degree name “Sara Victoria Weselake” – the title is: The role of the Prader-Willi syndrome obesity protein, MAGEL2 in the proper functioning of circadian rhythm
2 theses have a discipline as degree name:
Risk construction at a public hearing (has “Organizational Analysis” as degree name)
Women's gendered experiences of rapid resource development in the Canadian North (has “Rural Sociology” as degree name)
Language (change not required for harvest)
[ ] As I scan the data I notice a small number of theses that do not have the language of publication recorded. We can accept the data like this since they will likely be hard to fix. They will be loaded without a language of publication in the MARC record.
Character issues (change not required for harvest)
[ ] I see a very small number of character issues in the abstracts. We see character issues in the data for every university. The issues are hard to fix and so we overlook such problems. When I search “{dollar}” in the .mrc file I get 1171 occurrences in roughly 200 records. It is a very small number. When I search “superscript” I get 56 occurrences in roughly 33 records. It is a very small number. When I search “�” there are too many results to return. But scanning the data it does not seem to be a significant problem.
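If we want our own counts, the searches described above can be approximated in a script. This is a minimal sketch, assuming the abstracts are already available as plain strings. The marker strings are the ones named in the email as they appear in the .mrc file; they may look different in our source metadata. The function name `count_markers` is hypothetical.

```python
# Marker strings named in the email; "\ufffd" is the Unicode replacement character.
MARKERS = ("{dollar}", "superscript", "\ufffd")

def count_markers(abstracts):
    """Return {marker: (total occurrences, number of abstracts affected)}."""
    result = {}
    for marker in MARKERS:
        counts = [text.count(marker) for text in abstracts]
        result[marker] = (sum(counts), sum(1 for c in counts if c > 0))
    return result

# Hypothetical sample abstracts, for illustration only.
sample_abstracts = [
    "Cost is {dollar}5 and {dollar}6",
    "x superscript 2",
    "clean text",
    "bad \ufffd char",
]
marker_counts = count_markers(sample_abstracts)
print(marker_counts)
```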
Abstracts (change not required for harvest)
[ ] Only about 12,436 of the records have abstracts – but that is fine.
Duplicates (change not required for harvest)
[ ] De-duplicating on the title in MarcEdit identifies 128 duplicates. This is a very small number. It does not have to be addressed. We normally ignore duplicates – they normally end up on the same MARC record in OCLC – and at that point we get error reports alerting us to those situations and we delete one of them.
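Should we decide to review the duplicates ourselves, grouping records by title is straightforward to script. This is a minimal sketch, assuming (identifier, title) pairs have already been extracted. The normalization used here (case-folding and collapsing whitespace) is an assumption and may not match how LAC's tool compares titles, so the counts could differ from the 128 reported. The function name `duplicate_titles` is hypothetical.

```python
from collections import defaultdict

def duplicate_titles(records):
    """Group identifiers by normalized title; keep only titles seen more than once."""
    groups = defaultdict(list)
    for ident, title in records:
        # Assumed normalization: ignore case and runs of whitespace.
        key = " ".join(title.casefold().split())
        groups[key].append(ident)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

# Hypothetical sample data, for illustration only.
sample_titles = [
    ("rec-1", "Union and Communion:"),
    ("rec-2", "union  and communion:"),
    ("rec-3", "Wolf movement within and beyond the territory boundary"),
]
dupes = duplicate_titles(sample_titles)
print(dupes)
```

Matching on title alone can flag distinct theses that happen to share a title, so each group should be checked before anything is removed.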