Open dannymandel opened 3 months ago
It looks like the problem is this code:
def _add_responsibilities_to_container(self,
                                       rec: dict,
                                       responsibility_key_solr: str,
                                       responsibility_key: str,
                                       container: dict):
    responsibilities = rec.get(responsibility_key_solr, [])
    responsibility_dicts = []
    for responsibility in responsibilities:
        pieces = responsibility.split(":")
        responsibility_dicts.append({METADATA_ROLE: pieces[0], METADATA_NAME: pieces[1]})
    if len(responsibility_dicts) > 0:
        container[responsibility_key] = responsibility_dicts

def _curation_dict(self, rec: dict) -> dict:
    curation_dict: dict = {}
    self._add_to_dict(curation_dict, METADATA_LABEL, rec, SOLR_CURATION_LABEL)
    self._add_to_dict(curation_dict, METADATA_DESCRIPTION, rec, SOLR_CURATION_DESCRIPTION)
    self._add_to_dict(curation_dict, METADATA_CURATION_LOCATION, rec, SOLR_CURATION_LOCATION)
    self._add_responsibilities_to_container(rec, SOLR_CURATION_RESPONSIBILITY, METADATA_RESPONSIBILITY, curation_dict)
    access_constraints = rec.get(SOLR_CURATION_ACCESS_CONSTRAINTS, "").split("|")
    if len(access_constraints) > 0:
        curation_dict[METADATA_ACCESS_CONSTRAINTS] = access_constraints
    return curation_dict
Note that it's trying to split the string value based on a role:value format, and this record doesn't conform to that. It's unclear what we should do in this case.
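To make the failure mode concrete, here is a small sketch of what `split(":")` returns for values that are not in role:value form. The first input is a made-up example, the second is the curation string quoted in this thread, and the helper name is hypothetical. I haven't confirmed which of these the failing record actually hits.

```python
def split_role_and_name(value: str) -> list[str]:
    # Same split the quoted loop performs on each responsibility string.
    return value.split(":")


# Hypothetical value with no colon: only one piece comes back,
# so indexing pieces[1] would raise IndexError.
no_colon = split_role_and_name("USNM")

# Value with a URL: the split happens at the colon in "http:",
# so the "role" becomes "USNM http" rather than a real role.
with_url = split_role_and_name("USNM http://grbio.org/cool/142r-0w94")

print(no_colon)
print(with_url)
```

Either way the result is not a usable role/name pair, which fits the record not conforming to the expected format.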
It looks like the value in the solr index doesn't match the current code. The current code is doing this:
def curation_responsibility(self) -> list[dict[str, str]]:
    curation_str = f"{self.source_record.get('institutionCode')} {self.source_record.get('institutionID')}"
    return [Transformer._responsibility_dict("curator", curation_str)]
And the responsibility dict just does:
@staticmethod
def _responsibility_dict(
    role: str, name: str
):
    return {METADATA_ROLE: role, METADATA_NAME: name}
So in the solr index it should be:
curator: "USNM http://grbio.org/cool/142r-0w94"
So I think we can default the role to curator if it's a naked string.
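A rough sketch of what that default could look like, not a tested patch. Assumptions: the values of `METADATA_ROLE` and `METADATA_NAME` below are placeholders (the real constants live elsewhere in the codebase), `parse_responsibility` is a hypothetical helper name, and a "role" is assumed to be a single word followed by a colon that is not the start of `://`.

```python
import re

# Placeholder values; the real constants are defined elsewhere in the codebase.
METADATA_ROLE = "role"
METADATA_NAME = "name"
DEFAULT_ROLE = "curator"

# Assumption: a role prefix is one word followed by a colon that is not
# part of a URL scheme separator ("://").
_ROLE_PREFIX = re.compile(r"^\s*([A-Za-z][\w-]*)\s*:(?!//)\s*(.*)$")


def parse_responsibility(value: str, default_role: str = DEFAULT_ROLE) -> dict:
    """Return a role/name dict, defaulting the role for naked strings."""
    match = _ROLE_PREFIX.match(value)
    if match and match.group(2):
        return {METADATA_ROLE: match.group(1), METADATA_NAME: match.group(2).strip()}
    # No recognizable role prefix: treat the whole string as the name.
    return {METADATA_ROLE: default_role, METADATA_NAME: value.strip()}


naked = parse_responsibility("USNM http://grbio.org/cool/142r-0w94")
prefixed = parse_responsibility("curator: USNM http://grbio.org/cool/142r-0w94")
print(naked)
print(prefixed)
```

Both calls should produce the same dict with role "curator", and the colon inside the URL no longer gets mistaken for the role separator. Whether the single-word role assumption holds for the other collections would need checking.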
It looks like curationResponsibility is missing from the regenerated solr index on my local Mac. Upon a little digging, it isn't present in the exported .jsonl file on henry.
Record id: http://localhost:8984/solr/isb_core_records/select?indent=true&q.op=OR&q=id%3A%22ark%3A%2F65665%2F300008335-8d74-4c3f-873c-a9d8b4b3d6a8%22&useParams=
curl "https://henry.cyverse.org/smithsonian/sitemaps/sitemap-0.jsonl" | grep -i a9d8b4b3d6a8