isamplesorg / isamples_inabox

Provides functionality intermediate to a collection and central

SESAR update date not getting updated when data changes #131

Open dannymandel opened 2 years ago

dannymandel commented 2 years ago

When discussing a recently updated SESAR record, @ramdeensarah said:

There was only one with a date in the far past (we had some in the 1880s, but I think those were legitimate collection dates). https://app.geosamples.org/sample/igsn/ARF0009GX I changed the record on Feb 16, 2022 around 3:30 PM CT, but looking at our system, it did not record the change in the last_update_date, which is still listed as 2014-10-21 12:01:13.679042 (I manually changed it in the system. Not sure if I should have changed that date as well?).

Upon inspection of the record (https://api.geosamples.org/v1/sample/igsn-ev-json-ld/igsn/ARF0009GX), we see these dates:

        "log": [
            {
                "type": "registered",
                "timestamp": "2014-10-21 12:01:13"
            },
            {
                "type": "published",
                "timestamp": "2014-10-09 12:00:00"
            },
            {
                "type": "lastUpdated",
                "timestamp": "2014-10-21 12:01:13"
            }
        ],

even though Sarah made the changes to correct the date. So the lastUpdated date is user editable, but doesn't update by default when a record is modified. This will not work with the iSamples model, which assumes that the last updated date of a record changes whenever its data changes, so that we can pull only incremental diffs of the data.
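To illustrate why a stale lastUpdated value breaks incremental harvesting, here is a minimal sketch. It uses the log entries from the record above; the function names (`last_updated`, `needs_refresh`) and the harvest date are hypothetical and are not taken from the iSamples codebase.

```python
import datetime

# Log entries copied from the ARF0009GX record shown above.
record_log = [
    {"type": "registered", "timestamp": "2014-10-21 12:01:13"},
    {"type": "published", "timestamp": "2014-10-09 12:00:00"},
    {"type": "lastUpdated", "timestamp": "2014-10-21 12:01:13"},
]


def last_updated(log):
    """Return the lastUpdated timestamp from a log list, or None if absent."""
    for entry in log:
        if entry["type"] == "lastUpdated":
            return datetime.datetime.strptime(
                entry["timestamp"], "%Y-%m-%d %H:%M:%S"
            )
    return None


def needs_refresh(log, last_harvest):
    """Hypothetical incremental check: refetch only if updated since last harvest."""
    updated = last_updated(log)
    # Assumption: a record with no lastUpdated entry is refetched to be safe.
    return updated is None or updated > last_harvest


# Hypothetical harvest that ran the day before the Feb 16, 2022 edit.
last_harvest = datetime.datetime(2022, 2, 15)
print(needs_refresh(record_log, last_harvest))  # False: the edit would be missed
```

Because the record still reports 2014-10-21 as its last update, any check of this shape would skip it, regardless of how the comparison is implemented.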

Sarah suggests that SESAR may need to introduce a new machine-controlled timestamp that updates any time a change is made to the record. This new field would be what iSamples consults when determining which records to update.

dannymandel commented 2 years ago

It looks like the date we examine in iSamples is actually the date of the record in the sitemap:

            elif s.type == "urlset":
                for (loc, ts) in iterloc(s_it, self.sitemap_alternate_links):
                    # ts looks like this: 2018-03-27, shockingly dateparser.parse was very slow on these
                    pieces = ts.split("-")
                    ts_datetime = datetime.datetime(year=int(pieces[0]), month=int(pieces[1]), day=int(pieces[2]), tzinfo=None)
                    for r, c in self._cbs:
                        if r.search(loc) and self.start_from is None or ts_datetime >= self.start_from:

So if there were a compelling reason to leave the JSON-LD as-is, we could just ensure that the sitemap has the update timestamp and we would be good to go on the iSamples side.
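For reference, the sitemap date check can be sketched on its own as below. This is a simplified illustration, not the iSamples implementation: `parse_sitemap_date` and `should_harvest` are hypothetical names, the URL pattern and dates in the usage lines are made up, and the sketch assumes the intent is "the URL matches, and the entry is either unfiltered or at least as new as the cutoff".

```python
import datetime
import re


def parse_sitemap_date(ts):
    """Parse a sitemap date such as '2018-03-27' by splitting, as in the excerpt."""
    year, month, day = (int(piece) for piece in ts.split("-"))
    return datetime.datetime(year=year, month=month, day=day, tzinfo=None)


def should_harvest(loc, ts, pattern, start_from=None):
    """Hypothetical filter: loc must match, and be new enough if start_from is set."""
    # The URL match is tested on its own line. Python binds `and` tighter
    # than `or`, so an unparenthesized `a and b or c` groups as `(a and b) or c`.
    if not pattern.search(loc):
        return False
    return start_from is None or parse_sitemap_date(ts) >= start_from


# Usage with hypothetical sitemap dates for the record discussed above.
pattern = re.compile(r"/igsn/")
start = datetime.datetime(2022, 2, 1)
url = "https://app.geosamples.org/sample/igsn/ARF0009GX"
print(should_harvest(url, "2014-10-21", pattern, start))  # False
print(should_harvest(url, "2022-02-16", pattern, start))  # True
```

With a filter of this kind, the record is picked up as soon as the sitemap reports the real modification date, whatever the JSON-LD log says.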

ramdeensarah commented 2 years ago

Curator (administrative) changes to records are typically minimal; however, some automated systems within SESAR also change records. One example is our 'transfer ownership' tool, which allows you to transfer a sample to another individual.

To confirm that this tool does not capture the change in the date fields, I registered a sample and transferred it to another account (https://app.geosamples.org/sample/igsn/IESER0001). I found that this process does not change the dates in the SESAR sample table (these fields are embedded in the landing pages in JSON): publish_date, registration_date, and last_update_date.

This type of administrative data will need to be considered in the iSB design for SESAR.