Closed by dannymandel 9 months ago
Per discussion with @ekansa, this is expected. @ekansa, if you agree with this assessment could you kindly close the issue? Thank you!
Hm! Interesting, we should have some records from those projects in Open Context.
Here's the base GET request to the Open Context API used by iSamples:
If I add a filter to limit by records in the Giza Botanical Database, the URL would be: https://opencontext.org/query/.json?attributes=iSamples&cat=oc-gen-cat-sample-col%7C%7Coc-gen-cat-bio-subj-ecofact%7C%7Coc-gen-cat-object&cursorMark=%2a&response=metadata,uri-meta&sort=updated--desc,context--asc&type=subjects&rows=100&proj=131-giza-botanical-database
That returns a response with (paged) JSON for 34459 records.
Similarly, for Avkat:
That returns a response to page through 43448 records.
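For anyone reproducing these per-project checks, here is a minimal sketch that builds the filtered query URL from a project slug. The parameters are copied from the Giza URL quoted above; the names `BASE_PARAMS` and `project_query_url` are made up for illustration and are not part of Open Context or isamples_inabox. Note that `urlencode` percent-encodes the commas and the `*`, which decodes to the same query as the hand-written URL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://opencontext.org/query/.json"

# Query parameters copied from the Giza URL quoted in this thread.
BASE_PARAMS = {
    "attributes": "iSamples",
    "cat": "oc-gen-cat-sample-col||oc-gen-cat-bio-subj-ecofact||oc-gen-cat-object",
    "cursorMark": "*",
    "response": "metadata,uri-meta",
    "sort": "updated--desc,context--asc",
    "type": "subjects",
    "rows": 100,
}


def project_query_url(project_slug: str) -> str:
    """Return the base iSamples query URL limited to one project slug."""
    params = dict(BASE_PARAMS, proj=project_slug)
    return f"{BASE_URL}?{urlencode(params)}"


url = project_query_url("117-avkat-archaeological-project")
# Decode the query string again to confirm the project filter round-trips.
print(parse_qs(urlsplit(url).query)["proj"])
```

Swapping in `131-giza-botanical-database` gives the Giza variant of the same request.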
So the Open Context API should be giving records for those projects. I wonder if there's something missing or unexpected in the records for these projects which causes them to be passed over in isamples_inabox?
Thanks @ekansa! It’s possible we have the records but the search is broken. I’ll need to investigate.
I also experimented by asking the API to return some additional information on facet counts:
The result is as expected: there are facets for the Avkat and Giza Botanical projects, each with the expected count.
{
    "id": "https://opencontext.org/query/?attributes=iSamples&cat=oc-gen-cat-sample-col%7C%7Coc-gen-cat-bio-subj-ecofact%7C%7Coc-gen-cat-object&cursorMark=%2A&proj=117-avkat-archaeological-project&response=prop-facet%2Cmetadata%2Curi-meta&rows=100&type=subjects",
    "json": "https://opencontext.org/query/.json?attributes=iSamples&cat=oc-gen-cat-sample-col%7C%7Coc-gen-cat-bio-subj-ecofact%7C%7Coc-gen-cat-object&cursorMark=%2A&proj=117-avkat-archaeological-project&response=prop-facet%2Cmetadata%2Curi-meta&rows=100&type=subjects",
    "rdfs:isDefinedBy": "https://opencontext.org/projects/02b55e8c-e9b1-49e5-8edf-0afeea10e2be",
    "slug": "117-avkat-archaeological-project",
    "label": "Avkat Archaeological Project",
    "count": 43448
},
and
{
    "id": "https://opencontext.org/query/?attributes=iSamples&cat=oc-gen-cat-sample-col%7C%7Coc-gen-cat-bio-subj-ecofact%7C%7Coc-gen-cat-object&cursorMark=%2A&proj=131-giza-botanical-database&response=prop-facet%2Cmetadata%2Curi-meta&rows=100&type=subjects",
    "json": "https://opencontext.org/query/.json?attributes=iSamples&cat=oc-gen-cat-sample-col%7C%7Coc-gen-cat-bio-subj-ecofact%7C%7Coc-gen-cat-object&cursorMark=%2A&proj=131-giza-botanical-database&response=prop-facet%2Cmetadata%2Curi-meta&rows=100&type=subjects",
    "rdfs:isDefinedBy": "https://opencontext.org/projects/10aa84ad-c5de-4e79-89ce-d83b75ed72b5",
    "slug": "131-giza-botanical-database",
    "label": "Giza Botanical Database",
    "count": 34459
},
So we may need to dig into how iSamples processes records from these projects.
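One way to make that comparison concrete is sketched below, assuming we can get per-project counts out of the iSamples index in some form. The two facet entries are the ones quoted above; `counts_by_label` and `underrepresented` are hypothetical helper names, and the indexed counts are an empty placeholder rather than real index output.

```python
# Facet entries as returned by Open Context (trimmed to the fields used here).
API_FACETS = [
    {"slug": "117-avkat-archaeological-project",
     "label": "Avkat Archaeological Project", "count": 43448},
    {"slug": "131-giza-botanical-database",
     "label": "Giza Botanical Database", "count": 34459},
]


def counts_by_label(facet_options):
    """Map each project label to the record count reported by the API."""
    return {opt["label"]: opt["count"] for opt in facet_options}


def underrepresented(api_counts, indexed_counts):
    """Return {label: shortfall} for projects with fewer indexed records."""
    shortfalls = {}
    for label, expected in api_counts.items():
        found = indexed_counts.get(label, 0)
        if found < expected:
            shortfalls[label] = expected - found
    return shortfalls


# Placeholder: stands in for per-project counts read from the iSamples index.
indexed_counts = {}
print(underrepresented(counts_by_label(API_FACETS), indexed_counts))
```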
If I do a keyword search for the data authors, I see their records in iSamples central:
(Giza related):
and (Avkat related):
So it looks like you have the data indexed?
Thanks Eric! That’s super helpful. I suspect something broke as part of our new metadata format and we lost the project info in the iSamples index.
So I think this is the issue:
def produced_by_label(self) -> str:
    return self.source_record.get("project label", Transformer.NOT_PROVIDED)
And when I look at one of the records from "Avkat Archaeological Project", I see this:
"project": {
    "id": "http://opencontext.org/projects/02b55e8c-e9b1-49e5-8edf-0afeea10e2be",
    "label": "Avkat Archaeological Project"
}
So my guess is that the format of the JSON changed but our transformer didn't keep up, which is why we are missing the data in solr. Similarly, I think we need to update this:
def produced_by_description(self) -> str:
    return self.source_record.get("project href", Transformer.NOT_PROVIDED)
to return the id key out of the project dictionary.
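A rough sketch of what the updated lookups could look like, written as standalone functions so it runs outside the transformer class. It assumes every record nests the project the way the Avkat record above does. `NOT_PROVIDED` here is a stand-in value for `Transformer.NOT_PROVIDED`, and `project_field` is a hypothetical helper, not existing isamples_inabox code.

```python
from typing import Any, Dict

# Stand-in for Transformer.NOT_PROVIDED; the real constant's value may differ.
NOT_PROVIDED = "Not Provided"


def project_field(source_record: Dict[str, Any], key: str) -> str:
    """Read one key from the nested "project" dictionary, if present."""
    project = source_record.get("project")
    if not isinstance(project, dict):
        # Record has no nested project (or an unexpected shape).
        return NOT_PROVIDED
    return project.get(key, NOT_PROVIDED)


def produced_by_label(source_record: Dict[str, Any]) -> str:
    # Was: source_record.get("project label", ...)
    return project_field(source_record, "label")


def produced_by_description(source_record: Dict[str, Any]) -> str:
    # Was: source_record.get("project href", ...)
    return project_field(source_record, "id")


record = {
    "project": {
        "id": "http://opencontext.org/projects/02b55e8c-e9b1-49e5-8edf-0afeea10e2be",
        "label": "Avkat Archaeological Project",
    }
}
print(produced_by_label(record))
print(produced_by_description(record))
```

Falling back to the placeholder when the project key is missing keeps the old behaviour for records that lack project info, instead of raising.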
When I cut over to the new solr index, I found that the following two OpenContext projects seem to have disappeared:
It's unclear where they went.