open-contracting / kingfisher-collect

Downloads OCDS data and stores it on disk
https://kingfisher-collect.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Italy: Parse JSON instead of HTML #822

Closed: jpmckinney closed this issue 3 years ago

jpmckinney commented 3 years ago

The page requests this document: https://www.appaltipop.it/_next/data/LxpUO4Pg-S_nnq33fzaED/it/tenders.json

jpmckinney commented 3 years ago

Looking at this URL, I'm not sure if it's permanent, or if it's generated whenever the page is re-built.
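The segment `LxpUO4Pg-S_nnq33fzaED` looks like a Next.js build ID, which would change on every rebuild. If so, a spider could avoid hardcoding it by reading the current ID from the page. Below is a minimal sketch that assumes the site follows the usual Next.js conventions: a `<script id="__NEXT_DATA__">` tag containing a `buildId` field, and data files served at `/_next/data/<buildId>/<page>.json`. The function name `build_data_url` is hypothetical.

```python
import json
import re

# Assumption: Next.js pages embed their data, including "buildId", in this script tag.
NEXT_DATA_RE = re.compile(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL)


def build_data_url(html: str, page: str = "it/tenders",
                   base: str = "https://www.appaltipop.it") -> str:
    """Return the /_next/data/<buildId>/<page>.json URL for the build that served `html`."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        raise ValueError("no __NEXT_DATA__ script found in the page")
    build_id = json.loads(match.group(1))["buildId"]
    return f"{base}/_next/data/{build_id}/{page}.json"


# Trimmed-down example page, not the real site's HTML.
sample = ('<script id="__NEXT_DATA__" type="application/json">'
          '{"buildId": "LxpUO4Pg-S_nnq33fzaED"}</script>')
print(build_data_url(sample))
```

This only helps if the HTML page itself stays at a stable URL; it trades one hardcoded value for one extra request.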

Ravf95 commented 3 years ago

Yes, the data is in this repository, so we can parse tenders.json to get the path for later requests.
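The structure of `tenders.json` isn't described in this thread, so the sketch below avoids guessing key names: it walks the decoded document and collects every string that ends in `.json`, on the assumption that those are the paths to request next. The payload and the helper name `iter_json_paths` are hypothetical.

```python
import json
from typing import Any, Iterator


def iter_json_paths(node: Any) -> Iterator[str]:
    """Yield every string value in a decoded JSON document that ends in '.json'."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_json_paths(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_json_paths(item)
    elif isinstance(node, str) and node.endswith(".json"):
        yield node


# Hypothetical payload: the real tenders.json may be shaped differently.
payload = json.loads(
    '{"pageProps": {"tenders": [{"path": "data/a.json"}, {"path": "data/b.json"}],'
    ' "title": "Gare"}}'
)
print(list(iter_json_paths(payload)))
```

Once the real schema is known, reading the specific key directly would be stricter than this catch-all walk.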

jpmckinney commented 3 years ago

> Looking at this URL, I'm not sure if it's permanent, or if it's generated whenever the page is re-built.

@yolile Isn't this a problem? (They aren't updating the data, so it's not currently a problem.)

yolile commented 3 years ago

It's hard to know how they generate that URL, but it seems to be permanent, as it has stayed the same. However, I just saw on their page that they have an API: we can use https://www.appaltipop.it/api/v1/buyers instead of https://www.appaltipop.it/_next/data/LxpUO4Pg-S_nnq33fzaED/it/tenders.json to be sure we always get the list of files. Another option is to use the GitHub API, similar to #821.
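For the GitHub alternative, the REST API's "get repository content" endpoint (`GET /repos/{owner}/{repo}/contents/{path}`) returns a directory listing whose entries include `type`, `name` and `download_url`. A minimal sketch of turning such a listing into download URLs follows; the owner, repository and path are placeholders, since the thread doesn't name the repository, and the helper names are hypothetical.

```python
from typing import Dict, List
from urllib.parse import quote


def contents_url(owner: str, repo: str, path: str = "") -> str:
    """URL of the GitHub REST API endpoint that lists a repository directory."""
    return f"https://api.github.com/repos/{owner}/{repo}/contents/{quote(path)}"


def json_download_urls(listing: List[Dict]) -> List[str]:
    """Keep the download URLs of the JSON files in a directory listing."""
    return [
        entry["download_url"]
        for entry in listing
        if entry.get("type") == "file" and entry.get("name", "").endswith(".json")
    ]


# Placeholder listing, shaped like a contents API response.
listing = [
    {"type": "file", "name": "a.json",
     "download_url": "https://raw.githubusercontent.com/example-owner/example-repo/main/data/a.json"},
    {"type": "file", "name": "README.md",
     "download_url": "https://raw.githubusercontent.com/example-owner/example-repo/main/data/README.md"},
    {"type": "dir", "name": "old", "download_url": None},
]
print(contents_url("example-owner", "example-repo", "data"))
print(json_download_urls(listing))
```

Unauthenticated requests to the GitHub API are rate-limited, which matters if a directory has to be listed often.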

Ravf95 commented 3 years ago

I think it's a good idea to use only the API endpoint to list the files; that way, we can't download the wrong files.
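Using the API as the only source of URLs means the spider requests exactly what the API lists and nothing else. The sketch below assumes, without having checked, that https://www.appaltipop.it/api/v1/buyers returns a JSON array of objects with a `url` field; both the field name and the helper `urls_from_api` are hypothetical and should be adjusted to the real response.

```python
import json
from typing import List

API_URL = "https://www.appaltipop.it/api/v1/buyers"


def urls_from_api(response_text: str, field: str = "url") -> List[str]:
    """Return the URLs listed by the API, in order, skipping duplicates and blanks.

    Assumption: the response is a JSON array of objects, and `field` names the key
    that holds each file's URL.
    """
    seen = set()
    urls = []
    for entry in json.loads(response_text):
        url = entry.get(field)
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Placeholder response body, not real API output.
sample = '[{"url": "https://example.org/a.json"}, {"url": "https://example.org/a.json"}, {"name": "x"}]'
print(urls_from_api(sample))
```

Against the live API, the body could be fetched with `urllib.request.urlopen(API_URL)` and decoded before being passed in; in the spider itself, the same parsing would go in the callback for the API request.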