open-contracting / kingfisher-collect

Downloads OCDS data and stores it on disk
https://kingfisher-collect.readthedocs.io
BSD 3-Clause "New" or "Revised" License

dominican_republic_api: Incorrect id and date, and tenderers is sometimes an object #1049

Closed by jpmckinney 7 months ago

jpmckinney commented 7 months ago

For example, OCID ocds-6550wx-TESORERIA NACIONAL-DAF-CM-2022-0062 has a release in which bids.details is as follows. Note that each tenderers value is a single object, whereas it should be an array:

[
  {
    "id": "DO1.RPL.3330807",
    "date": "2022-12-08 15:45:47.8730413",
    "value": {
      "amount": 196000.006
    },
    "status": "Qualified",
    "tenderers": {
      "id": 86075,
      "name": "Transolucion JR, SRL"
    }
  },
  {
    "id": "DO1.RPL.3342770",
    "date": "2022-12-19 16:00:00.0000000",
    "value": {
      "amount": 179334.001829
    },
    "status": "Qualified",
    "tenderers": {
      "id": 235,
      "name": "Santo Domingo Motors Company, SA"
    }
  },
  {
    "id": "DO1.RPL.3331110",
    "date": "2022-12-08 16:41:41.5945438",
    "value": {
      "amount": 221840
    },
    "status": "Qualified",
    "tenderers": {
      "id": 87229,
      "name": "RIF Investment Group, SRL"
    }
  }
]
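Here is a minimal sketch of how a consumer might detect and repair the tenderers problem. It assumes the right fix is to wrap each object in a one-element list, because in the OCDS bids extension tenderers is an array of organization references. The function names are hypothetical. This is not code from Kingfisher Collect.

```python
from typing import Any


def tenderers_is_object(bid: dict[str, Any]) -> bool:
    """True if a bid's tenderers is a single object rather than an array."""
    return isinstance(bid.get("tenderers"), dict)


def normalize_tenderers(release: dict[str, Any]) -> int:
    """Wrap each object-valued tenderers in a one-element list, in place.

    Returns how many bids were changed, so the caller can count occurrences.
    """
    changed = 0
    for bid in release.get("bids", {}).get("details", []):
        if tenderers_is_object(bid):
            bid["tenderers"] = [bid["tenderers"]]
            changed += 1
    return changed


# Two of the bids above, trimmed to the relevant fields.
release = {
    "bids": {
        "details": [
            {"id": "DO1.RPL.3330807", "tenderers": {"id": 86075, "name": "Transolucion JR, SRL"}},
            {"id": "DO1.RPL.3342770", "tenderers": {"id": 235, "name": "Santo Domingo Motors Company, SA"}},
        ]
    }
}
print(normalize_tenderers(release))  # 2
print(normalize_tenderers(release))  # 0 (already normalized)
```

Returning a count lets the same pass both fix the data and report how often the problem occurs.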

Another issue: all the releases for this OCID have the same id (TESORERIA NACIONAL-DAF-CM-2022-0062-1) and the same date (2022-12-07T18:00:34Z). I don't know whether this issue is universal.

Because the releases share the same date, they can't be reliably ordered, so the compiled release can be incorrect. (The other fields in the releases are not identical, so the order in which they are merged matters.)
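The following deliberately simplified sketch shows why identical dates are a problem. Releases are merged in date order, and later releases override earlier values. It is not OCDS Merge's implementation, which also merges nested objects and arrays. The name naive_compile and the tender status values are hypothetical.

```python
from typing import Any


def naive_compile(releases: list[dict[str, Any]]) -> dict[str, Any]:
    """Simplified compiled release: later releases overwrite earlier top-level fields.

    Real OCDS merging also recurses into objects and arrays; this sketch
    only replaces top-level values.
    """
    compiled: dict[str, Any] = {}
    # String comparison orders these dates correctly because they share one ISO 8601 format.
    # sorted() is stable, so releases with equal dates keep their input order:
    # with tied dates, the "latest" release is simply whichever arrived last.
    for release in sorted(releases, key=lambda r: r["date"]):
        for key, value in release.items():
            if key not in ("id", "date"):
                compiled[key] = value
    return compiled


# Two releases that share an id and date (as in this issue) but differ elsewhere.
a = {"id": "TESORERIA NACIONAL-DAF-CM-2022-0062-1", "date": "2022-12-07T18:00:34Z",
     "tender": {"status": "active"}}
b = {"id": "TESORERIA NACIONAL-DAF-CM-2022-0062-1", "date": "2022-12-07T18:00:34Z",
     "tender": {"status": "complete"}}

print(naive_compile([a, b])["tender"]["status"])  # complete
print(naive_compile([b, a])["tender"]["status"])  # active
```

The same input produces two different compiled releases depending only on the order in which the releases were read.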

jpmckinney commented 7 months ago

In Kingfisher Process, I'll now handle OCDS Merge errors by skipping the affected OCID. It'll log the exception to Sentry and add a note to the collection_note table.
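A minimal sketch of the skip-on-error approach described above. It assumes merging happens one OCID at a time. MergeError, compile_all, demo_merge and the notes list are hypothetical stand-ins for OCDS Merge's errors, the real compile step and the collection_note table. Sentry reporting is replaced by a plain log call.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)

Release = dict[str, Any]


class MergeError(Exception):
    """Hypothetical stand-in for the errors OCDS Merge can raise."""


def compile_all(
    releases_by_ocid: dict[str, list[Release]],
    merge: Callable[[list[Release]], Release],
    notes: list[str],
) -> dict[str, Release]:
    """Compile each OCID's releases, skipping any OCID whose merge fails."""
    compiled: dict[str, Release] = {}
    for ocid, releases in releases_by_ocid.items():
        try:
            compiled[ocid] = merge(releases)
        except MergeError as exc:
            # Kingfisher Process reports this to Sentry; this sketch only logs it.
            logger.exception("Could not merge releases for %s", ocid)
            notes.append(f"Skipped {ocid}: {exc}")
    return compiled


def demo_merge(releases: list[Release]) -> Release:
    # Toy merge that fails if any release lacks a date.
    if any("date" not in release for release in releases):
        raise MergeError("release without date")
    return {"releases_merged": len(releases)}


notes: list[str] = []
result = compile_all(
    {"ocds-a": [{"date": "2022-12-07T18:00:34Z"}], "ocds-b": [{}]},
    demo_merge,
    notes,
)
print(sorted(result))  # ['ocds-a']
print(notes)  # ['Skipped ocds-b: release without date']
```

One bad OCID no longer aborts the whole collection, and the skip is still recorded for later follow-up.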

jpmckinney commented 7 months ago

The tenderers issue seems to occur 110 times: https://open-contracting-partnership.sentry.io/issues/4905536293/

yolile commented 7 months ago

@jpmckinney, are you logging this here because you expect us to resolve it in Collect, or only because we don't have a better place to track it?

jpmckinney commented 7 months ago

I usually log issues, and then think of the solutions later.

jpmckinney commented 7 months ago

So, we can close this issue here.

yolile commented 7 months ago

> I usually log issues, and then think of the solutions later.

Sounds good. I was just wondering whether the issue belongs here or whether, in general, we should have a better place to report this type of issue. I'm thinking of the "Data quality" section of the publication in the Registry: for example, we could comment on the descriptions document and tag me and the data lead for that publisher, so that we can do both: update the Registry and let the partner know.

jpmckinney commented 7 months ago

Hmm, my thinking is to report the issue as close as possible to its source (in this case, the publication). The closest thing is Kingfisher Collect. Depending on the issue, we might fix it in (in order of decreasing proximity):

  1. Kingfisher Collect
  2. Kingfisher Process
  3. Some dependency of Kingfisher Process or Kingfisher Collect (recent examples: ocdskit and ocdsextensionregistry)

Unless the problem is on our side (rare), we should in all cases report it to the partner and, if it isn't resolved soon, update the Registry. That said, comments in Google Docs or rows in Google Sheets are very hard to track, so I prefer GitHub.

jpmckinney commented 7 months ago

Just a note that Kingfisher Process does NOT have a unique index on (collection, ocid, release_id), so the repeated release IDs will not cause any loss of data at that level.
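This small demonstration shows why the missing unique index matters. Two different releases that share (collection, ocid, release_id) are both kept without the index. With the index, the second insert is rejected. The releases table and its data column are a hypothetical, simplified stand-in, not Kingfisher Process's real schema.

```python
import sqlite3

OCID = "ocds-6550wx-TESORERIA NACIONAL-DAF-CM-2022-0062"
RELEASE_ID = "TESORERIA NACIONAL-DAF-CM-2022-0062-1"


def stored_rows(unique_index: bool) -> int:
    """Insert two different releases that share (collection, ocid, release_id).

    Returns how many rows survive in a hypothetical, simplified table.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE releases (collection INTEGER, ocid TEXT, release_id TEXT, data TEXT)")
    if unique_index:
        conn.execute("CREATE UNIQUE INDEX releases_key ON releases (collection, ocid, release_id)")
    for data in ('{"version": 1}', '{"version": 2}'):
        try:
            conn.execute("INSERT INTO releases VALUES (?, ?, ?, ?)", (1, OCID, RELEASE_ID, data))
        except sqlite3.IntegrityError:
            pass  # the unique index rejects the duplicate key, so this release is lost
    count = conn.execute("SELECT COUNT(*) FROM releases").fetchone()[0]
    conn.close()
    return count


print(stored_rows(unique_index=False))  # 2
print(stored_rows(unique_index=True))   # 1
```

Kingfisher Process itself uses PostgreSQL. SQLite is used here only so that the sketch runs without a database server.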

jpmckinney commented 7 months ago

I created the follow-up issue (not prioritized), so this issue can now be closed.

https://github.com/open-contracting/ocds-merge/issues/37