open-contracting / kingfisher-collect

Downloads OCDS data and stores it on disk
https://kingfisher-collect.readthedocs.io
BSD 3-Clause "New" or "Revised" License

peru_compras: update spider to use new endpoint #1120

Open sentry-io[bot] opened 3 weeks ago

sentry-io[bot] commented 3 weeks ago

Hopefully it's temporary, as we need their data :)

Sentry Issue: REGISTRY-KINGFISHER-COLLECT-1

Gave up retrying <GET https://www.catalogos.perucompras.gob.pe/ConsultaOrdenesPub/DescargaJsonOCDS?pAcuerdo=151&pFechaIni=2017-01-01&pFechaFin=2024-11-01> (failed 3 times): 500 Internal Server Error

yolile commented 2 weeks ago

From https://www.catalogos.perucompras.gob.pe/ConsultaOrdenesPub

The new endpoint seems to be a POST to https://www.catalogos.perucompras.gob.pe/ConsultaOrdenesPub/getListaDescargaMasiva with the form data 'Anio=&Mes=' (sent via curl's --data-raw).

Then, download the files listed in the response:

[
  {
    "C_Anio": "2023",
    "CodMes": "01",
    "C_Mes": "Enero",
    "C_Ruta": "contproveedor/DescargaMasiva",
    "C_FileJson": "Datos_Abiertos01022023034435.json",
    "C_FileExcel": "Datos_Abiertos01022023034435.xlsx",
    "C_FileCsv": ""
  },
...
]

The full URL appears to be the blob storage host followed by C_Ruta and the file name, for example: https://saeusceprod01.blob.core.windows.net/contproveedor/DescargaMasiva/Datos_Abiertos01022023034435.json
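
The two steps above could be sketched roughly as follows. This is a standalone illustration using only the standard library, not the spider's actual code. The names `build_json_urls` and `fetch_listing` are hypothetical, and the blob storage base URL is an assumption inferred from the single example URL above.

```python
import json
import urllib.parse
import urllib.request

LIST_URL = "https://www.catalogos.perucompras.gob.pe/ConsultaOrdenesPub/getListaDescargaMasiva"
# Assumption: inferred from the one example URL in this thread.
BLOB_BASE_URL = "https://saeusceprod01.blob.core.windows.net/"


def build_json_urls(entries, base_url=BLOB_BASE_URL):
    """Return the JSON file URL for each listing entry that has a JSON file."""
    urls = []
    for entry in entries:
        filename = entry.get("C_FileJson")
        if not filename:
            # Some fields can be empty strings (like C_FileCsv in the sample).
            continue
        path = f"{entry['C_Ruta'].strip('/')}/{filename}"
        urls.append(urllib.parse.urljoin(base_url, path))
    return urls


def fetch_listing(year="", month=""):
    """POST the form (Anio=&Mes= by default) and return the parsed JSON list."""
    data = urllib.parse.urlencode({"Anio": year, "Mes": month}).encode()
    request = urllib.request.Request(LIST_URL, data=data, method="POST")
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `build_json_urls(fetch_listing())` would then give the list of files to download. In the spider itself, the POST would presumably be a Scrapy form request rather than `urllib`.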

jpmckinney commented 2 weeks ago

Hmm, the original spider was able to get data up to Oct 2024: https://data.open-contracting.org/en/publication/78

~collect/scrapyd/logs/kingfisher/peru_compras/3dd7eab597e411ef82b6a036bccb3328.log

Edit: The issue I originally reported should not have described the source as unavailable, since the crawl did get 84 HTTP 200 responses, along with 34 HTTP 500 responses.

Edit2: I suppose we can revert to the original spider, and keep the new code as a separate peru_compras_bulk spider, with the caveat that the bulk files lag behind the API.