open-contracting / kingfisher-collect

Downloads OCDS data and stores it on disk
https://kingfisher-collect.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Italian Ministry of Infrastructure and Transport #1077

Closed: yolile closed 4 months ago

yolile commented 5 months ago

Links to bulk download and/or API documentation: https://www.serviziocontrattipubblici.it/ocds-ms/swagger-ui.html#/v1.0

Links to bulk download and/or API endpoints: e.g. https://www.serviziocontrattipubblici.it/ocdsReleasePackages-ms/v1.0/ocdsReleasePackages?dataInvioA=2022-12-31&dataInvioDa=2022-01-01&page=1&pageSize=5 (see the API documentation)

Priority: anytime

Data structure: Release packages with small validation issues, e.g. an incorrect URL format and the typo documenttype instead of documentType

Publication format: Release packages

If the publication passes the basic criteria, please follow the steps described in the process note. @allakulov please check

yolile commented 5 months ago

Note that the response does not include the page number, the total count, or a link to the next page. The only way to know whether we have crawled all the data is to keep iterating until the response is a package without the releases key, e.g. https://www.serviziocontrattipubblici.it/ocdsReleasePackages-ms/v1.0/ocdsReleasePackages?dataInvioA=2022-12-31&dataInvioDa=2022-12-01&page=4000&pageSize=20
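A minimal sketch of that stopping rule, not the spider's actual code. The query parameter names come from the example URLs above; `build_url`, `iter_release_packages`, `fetch_page` and `max_pages` are hypothetical names for illustration, and treating an empty `releases` list the same as a missing key is an assumption.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode

BASE_URL = (
    "https://www.serviziocontrattipubblici.it"
    "/ocdsReleasePackages-ms/v1.0/ocdsReleasePackages"
)


def build_url(date_from: str, date_to: str, page: int, page_size: int = 20) -> str:
    """Build a page URL using the query parameters seen in the example URLs."""
    query = urlencode(
        {
            "dataInvioA": date_to,
            "dataInvioDa": date_from,
            "page": page,
            "pageSize": page_size,
        }
    )
    return f"{BASE_URL}?{query}"


def iter_release_packages(
    fetch_page: Callable[[int], dict],
    start_page: int = 1,
    max_pages: int = 100_000,
) -> Iterator[dict]:
    """Yield packages page by page until one has no releases.

    fetch_page is a caller-supplied function (hypothetical) that returns the
    parsed JSON for a page number. max_pages is a safety cap so that a
    misbehaving API cannot cause an endless crawl.
    """
    for page in range(start_page, start_page + max_pages):
        package = fetch_page(page)
        # Assumption: a missing key and an empty list both mean "no more data".
        if not package.get("releases"):
            return
        yield package
```

In practice `fetch_page` would request `build_url(...)` and decode the JSON body; it is injected here so the loop can be exercised without network access.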

jpmckinney commented 4 months ago

We could do an exponential search (similar to Armenia) to find the last page, but it may not be worth it.
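For reference, a generic sketch of an exponential search for the last page. It is not taken from the Armenia spider. `find_last_page` and `has_releases` are hypothetical names, and the sketch assumes that once one page has no releases, every later page has none either.

```python
from typing import Callable, Optional


def find_last_page(
    has_releases: Callable[[int], bool], first_page: int = 1
) -> Optional[int]:
    """Return the last page number with releases, or None if there is none.

    has_releases is a caller-supplied predicate (hypothetical) that requests
    a page and reports whether the package contains releases.
    """
    if not has_releases(first_page):
        return None

    # Phase 1: grow the step exponentially until we overshoot the data.
    low = first_page
    step = 1
    high = low + step
    while has_releases(high):
        low = high
        step *= 2
        high = low + step

    # Phase 2: binary search between the last page known to have data (low)
    # and the first page known to be empty (high).
    while high - low > 1:
        mid = (low + high) // 2
        if has_releases(mid):
            low = mid
        else:
            high = mid
    return low
```

This needs a number of requests logarithmic in the page count to locate the end, after which the pages could be requested concurrently. The simpler alternative is to crawl sequentially until an empty package appears.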