open-contracting / kingfisher-collect

Downloads OCDS data and stores it on disk
https://kingfisher-collect.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Update mexico_plataforma_digital_nacional spider #976

Closed: yolile closed 1 year ago

yolile commented 1 year ago

https://www.plataformadigitalnacional.org/contrataciones used to link to a Google Drive JSON file. Now it points directly to https://datos.gob.mx/busca/dataset/concentrado-de-contrataciones-abiertas-de-la-apf. However, they are now also publishing data from another source: "Secretaría Ejecutiva del Sistema Estatal Anticorrupción de Aguascalientes".

And now, they have an underlying API, for example:

curl -X POST 'https://api.plataformadigitalnacional.org/s6/api/v1/search?supplier_id=SHCP'

This returns the list of releases by publisher, so we could update the spider to use that API endpoint instead of the Google Drive file.
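A minimal sketch, in Python, of how that request could be built per data provider. The function name `build_search_request` is hypothetical, and the sketch only constructs the request without sending it. Whether the endpoint also expects a request body or pagination parameters is not covered here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the curl example in this issue.
SEARCH_URL = "https://api.plataformadigitalnacional.org/s6/api/v1/search"


def build_search_request(supplier_id):
    """Build (but do not send) the POST request for one data provider.

    Hypothetical helper: only the supplier_id query parameter is set,
    mirroring the curl command above.
    """
    url = f"{SEARCH_URL}?{urlencode({'supplier_id': supplier_id})}"
    return Request(url, method="POST")


request = build_search_request("SHCP")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it; that step is left out so the sketch runs without network access.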

The only thing to decide is if we want two spiders, one per publisher (e.g., one for SHCP and another one for Aguascalientes) or just one for Plataforma Digital Nacional as of now.

jpmckinney commented 1 year ago

How does PDN disambiguate the two publications? Is its CompraNet data identical to https://datos.gob.mx/busca/dataset/concentrado-de-contrataciones-abiertas-de-la-apf ?

yolile commented 1 year ago

How does PDN disambiguate the two publications?

In the front end, they have a dropdown where you can select the publisher (or, as they call it, "data provider").

In the API, they have the "supplier_id" query param that accepts SHCP and SESEA_AGS as values.

Is its CompraNet data identical to https://datos.gob.mx/busca/dataset/concentrado-de-contrataciones-abiertas-de-la-apf ?

I don't know, but I can check. Edit: in their FAQ section they say:

Will the PDN generate information? No. The PDN's goal is to create interoperability among the data generated by the obligated entities, and it is a query tool.

So they say they don't generate information, they only create interoperability between the data of the different entities, and the platform is only a "query" tool. So I guess we can assume their CompraNet data is identical to what CompraNet publishes.

jpmckinney commented 1 year ago

Okay, so let's only add Aguascalientes to Collect and the data registry, since we don't have the upstream source (unless we can find out where it is).

We can add their SHCP data to Collect, but I'm not sure who would want it instead of the CompraNet data.

jpmckinney commented 1 year ago

I don't know, but I can check

I'd be curious if the ocid is the same, and if so, what ocid is used for Aguascalientes.

yolile commented 1 year ago

I'd be curious if the ocid is the same, and if so, what ocid is used for Aguascalientes.

For CompraNet, the prefix is the one registered for SHCP (ocds-07smqs), and for Aguascalientes, it is the one registered for Aguascalientes (ocds-ty10ed). We registered that one in the past, so maybe they are sending their data directly to the platform and not publishing it themselves? Aha! In CRM-8104 they said, "we developed the open contracting system for the Plataforma Digital Nacional", so they are indeed only publishing their data through PDN. So, new publisher alert? But who? Aguascalientes, I guess?
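As a rough illustration of how the two publications can be told apart by ocid prefix, here is a small Python sketch. The helper name `publisher_for_ocid` and the sample ocids are hypothetical; only the two prefixes come from this thread. It assumes the usual prefix layout of "ocds-" followed by six characters.

```python
# Prefixes as reported in this thread; the labels are informal.
OCID_PREFIXES = {
    "ocds-07smqs": "SHCP (CompraNet)",
    "ocds-ty10ed": "Aguascalientes",
}

# "ocds-" (5 characters) plus a 6-character code.
PREFIX_LENGTH = 11


def publisher_for_ocid(ocid):
    """Return the label for a known ocid prefix, or None if it is unknown."""
    return OCID_PREFIXES.get(ocid[:PREFIX_LENGTH])


# Hypothetical ocids, for illustration only.
for ocid in ("ocds-07smqs-0001", "ocds-ty10ed-0001", "ocds-000000-0001"):
    print(ocid, "->", publisher_for_ocid(ocid))
```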

jpmckinney commented 1 year ago

Yes, I'd say Aguascalientes. We haven't had a scenario where the publishing platform is independent of the data's author/steward, but I think we want to be tracking the author (despite the field being publisher).

jpmckinney commented 1 year ago

We can maybe add a note to the docs for the CompraNet spider, to indicate that it's also published via PDN. Not sure how useful that fact is for the data registry, but we can also add it to the end of the description.

yolile commented 1 year ago

Ok, so I assume we want one spider for each. And if they add more publishers in the future, we will add new spiders as we did with Digiwhist, right?

jpmckinney commented 1 year ago

Ok, so I assume we want one spider for each.

I meant the existing CompraNet spider (APF) can just have a note added. We don't need a CompraNet spider for PDN.

Ok, so I assume we want one spider for each. And if they add more publishers in the future, we will add new spiders as we did with Digiwhist, right?

Yup! Assuming we don't have access to the original source.