open-contracting-archive / kingfisher-vagrant

Abandoned as not kept up-to-date with Kingfisher components
BSD 3-Clause "New" or "Revised" License

Mexico Grupo Aeroporto Source downloads a list of PDFs #118

Closed odscjames closed 6 years ago

odscjames commented 6 years ago

Found during #100

One of the files this source lists is http://datos.gob.mx/adela/api/v1/organizations/gacm/documents

This just has a small list of PDF documents in a "designation_files" and a "memo_files" field. I don't think this is any part of the standard, or anything we can check?

That being the case, we should just ignore this entry by skipping over it completely?

We can add an if statement, though there is the slight problem of what to check. It's hard to know which piece of data is safe to filter on, and without insight into what they might change in the future there isn't much to go on. In the source's gather_all_download_urls function, we could just check:

if resource['url'] != "http://datos.gob.mx/adela/api/v1/organizations/gacm/documents":
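
For illustration, a minimal sketch of how that check might sit inside a URL-gathering step. The helper name gather_download_urls, the SKIPPED_URLS set, and the shape of the resources list (dicts with a 'url' key) are assumptions made for this example, not the actual Kingfisher source code:

```python
# URLs known to list non-OCDS content (here, the PDF document listing).
SKIPPED_URLS = {
    "http://datos.gob.mx/adela/api/v1/organizations/gacm/documents",
}


def gather_download_urls(resources):
    """Return the URLs of all resources that are not on the skip list.

    `resources` is assumed to be a list of dicts, each with a 'url' key.
    """
    return [
        resource["url"]
        for resource in resources
        if resource["url"] not in SKIPPED_URLS
    ]
```

Keeping the skipped URLs in a set rather than in a single comparison means a second non-OCDS entry could be added later without touching the loop.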

timgdavies commented 6 years ago

This is the old scraper for GACM: https://github.com/open-contracting/sample-data/blob/master/real-examples/mexico/grupo-aeroportuario/fetch.py

As I understand, we are looking at a CKAN organization's datasets (e.g.) and looking for any .json files that are in OCDS format.

If a dataset from that CKAN organization does not have any relevant JSON files, we could skip it.
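
A rough sketch of that idea, assuming each dataset is a dict with a "resources" list whose entries carry "url" and "format" keys. The function names are hypothetical, and the check only looks at the declared format and file extension; it does not confirm that a file is actually OCDS:

```python
def json_resource_urls(dataset):
    """Return URLs of resources in a dataset that look like JSON files."""
    urls = []
    for resource in dataset.get("resources", []):
        url = resource.get("url") or ""
        fmt = (resource.get("format") or "").lower()
        # Treat a resource as JSON if its declared format or extension says so.
        if fmt == "json" or url.lower().endswith(".json"):
            urls.append(url)
    return urls


def datasets_worth_fetching(datasets):
    """Keep only the datasets that have at least one JSON-looking resource."""
    return [dataset for dataset in datasets if json_resource_urls(dataset)]
```

Whether a given file really is OCDS would still need to be checked after download.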

odscjames commented 6 years ago

Had a chat and clarified that this is not part of the OCDS standard. One pull request to ignore this file coming right up ...