Open dannymandel opened 8 months ago
Debugging in the db:
isb_1=# select * from thing where id='ark:/28722/k2vt23934';
id | tstamp | tcreated | item_type | authority_id | related | log | resolved_url | resolved_status | tresolved | resolve_elapsed |
resolved_content
| resolved_media_type | _id | identifiers | h3
----------------------+------------------------------+------------------------+-----------+--------------+---------+-----+--------------+-----------------+------------------------------+-----------------+------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------+---------------------+---------+--------------------------+-----------------
ark:/28722/k2vt23934 | 2023-11-15 00:58:32.62224+00 | 2023-11-10 18:30:17+00 | sample | OPENCONTEXT | | | | 200 | 2023-11-15 00:58:32.62224+00 | 0 | {"id"
: "http://opencontext.org/subjects/f233b497-13fd-4107-894f-8776f4205d59", "uri": "http://opencontext.org/subjects/f233b497-13fd-4107-894f-8776f4205d59", "href": "https://staging.opencontext.org/subjects/f233b49
7-13fd-4107-894f-8776f4205d59", "icon": "https://staging.opencontext.org/static/oc/icons-v2/object-icon-draft-2.svg", "label": "A080114", "Creator": [{"id": "http://opencontext.org/persons/fc2a17fc-6edb-41d8-89
1e-f231bfd8c23f", "label": "Douglas R Clark"}, {"id": "http://opencontext.org/persons/43220a42-501d-4961-a01c-28d6ec145da7", "label": "Larry G Herr"}], "License": [{"id": "https://creativecommons.org/licenses/b
y/4.0", "label": "Attribution 4.0 International (CC BY 4.0)"}], "context": {"id": "http://opencontext.org/subjects/8c770f83-ca04-b90c-9fa8-dc8548902474", "label": "Asia/Jordan/Tall al-Umayri/Field H/Square 7K22
/Locus H-7K22-064"}, "project": {"id": "http://opencontext.org/projects/2015946b-da6d-4a0f-b676-03bee1efad21", "label": "Madaba Plains Project-`Umayri"}, "updated": "2023-11-10T18:30:17Z", "latitude": 31.868494
612961918, "longitude": 35.887987861855535, "published": "2007-01-01T00:00:00Z", "late bce/ce": -950.0, "citation uri": "https://n2t.net/ark:/28722/k2vt23934", "early bce/ce": -1200.0, "item category": "Object"
, "isam:SamplingSite": {"id": "https://opencontext.org/subjects/38023c67-71a3-44b9-ac47-af7d5672790b", "label": "Tall al-Umayri", "identifier": "https://opencontext.org/subjects/38023c67-71a3-44b9-ac47-af7d5672
790b"}} | application/json | 7766318 | ["ark:/28722/k2vt23934"] | 8f2db3c142c30e9
(1 row)
isb_1=# select resolved_url from thing where id='ark:/28722/k2vt23934';
resolved_url
--------------
(1 row)
isb_1=# select count(*) from thing where resolved_url is NULL;
count
-------
571
(1 row)
It looks like this code is the problem:
# TODO: this is a hack and should ideally be replaced on the OC side.
if next_url is not None:
self.url = next_url.replace("ALL-STANDARD-LD", "iSamples")
else:
self.url = None
In this case, I think we should just keep the current url. The other thing we'll need to double check, however, is to make sure that we properly exit the iterator if we get rid of that condition.
In the meantime, since this field is just for bookkeeping, I think it's safe to set it to a dummy value.
On further analysis, I'm not so sure that that code is problematic. We should have exhausted all the records in the iterator by the time we hit the self.url = None
line.
I've dropped the not null
constraint on the resolved_url
column on iSC. The update on iSC should complete tonight. Note that the database on iSB didn't have the not null
constraint on the column, which is the source of the problem.
I'll leave this open and check tomorrow to ensure the update completes as expected.
I still think there is a bug in the record harvester, however. There are 571
records with no resolved_url
, and look at this:
isb_1=# select max (_id) from thing;
max
---------
7766888
(1 row)
isb_1=# select max(_id) from thing where resolved_url is null;
max
---------
7766888
(1 row)
isb_1=# select min(_id) from thing where resolved_url is null;
min
---------
7766318
(1 row)
7766888 - 7766318 = 570
Certainly feels like the last chunk of records we got just didn't have a resolved_url
associated with them.
Was going through the logs looking at the OpenContext sitemap ingest from iSB to iSC and saw this error: