Closed jpmckinney closed 2 years ago
From #655
Noting that we could rename all the digiwhist spiders as part of this.
@yolile My proposed guidance:
Lowercase and join the components below with underscores. Replace any spaces with underscores.
For a jurisdiction-specific publication:
- Country name. Do not use acronyms, like "uk". If in doubt, follow `ISO 3166-1 <https://en.wikipedia.org/wiki/ISO_3166-1>`__. For example: Kyrgyzstan, not Kyrgyz Republic. For a non-country like the European Union, use the relevant geography, like "europe".
- Subdivision name. Do not use acronyms, like "nsw". Omit the subdivision type, like "state", unless it is typically included, like in Nigeria. If in doubt, follow `ISO 3166-2 <https://en.wikipedia.org/wiki/ISO_3166-2>`__.
- System name, if needed. Acronyms are allowed, like "agetic".
- Publisher name, if needed. Required if the publisher is not a government.
- Disambiguator, if needed. For example: "historical".
- Access method, if needed: "bulk" or "api".
- OCDS format, if needed: "releases", "records", "release packages" or "record packages".
For a multi-jurisdiction publication:
- Organization name
- Disambiguator
If you create a new base class, omit the components that are not shared, and add "base" to the end. For example, the ``afghanistan_packages_base.py`` file contains the base class for the ``afghanistan_record_packages`` and ``afghanistan_release_packages`` spiders.
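As a rough illustration of the guidance above, the joining rule could be sketched in Python as below. The `spider_name` helper is hypothetical and not part of Kingfisher Collect. It only lowercases the components, replaces spaces with underscores and joins whatever components are present; it does not detect acronyms or check names against ISO 3166.

```python
def spider_name(*components):
    """Join the non-empty name components into a spider name (hypothetical helper)."""
    parts = []
    for component in components:
        if not component:
            continue  # Components marked "if needed" are skipped when absent.
        # Lowercase, and replace each run of whitespace with one underscore.
        parts.append("_".join(component.lower().split()))
    return "_".join(parts)


# Country + subdivision
print(spider_name("Australia", "New South Wales"))
# Country + subdivision + system + OCDS format (unused components passed as None)
print(spider_name("Nigeria", "Kaduna State", "Budeshi", None, None, "releases"))
# Base class: shared components only, with "base" at the end
print(spider_name("Afghanistan", "packages", "base"))
```

The three calls correspond to names that appear in this issue: ``australia_new_south_wales``, ``nigeria_kaduna_state_budeshi_releases`` and ``afghanistan_packages_base``.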
Based on the output of scrapy list, this means we'd need to change:
Before | After |
---|---|
[X] australia_nsw | australia_new_south_wales |
[X] colombia | colombia_api |
[X] digiwhist_* | *_digiwhist |
[X] dominican_republic | dominican_republic_bulk |
[X] nigeria_cross_river_base | nigeria_cross_river_state_base |
[X] nigeria_cross_river_releases | nigeria_cross_river_state_releases |
[X] nigeria_cross_river_records | nigeria_cross_river_state_records |
[X] nigeria_kaduna_state_base | nigeria_kaduna_state_budeshi_base |
[X] nigeria_kaduna_state_records | nigeria_kaduna_state_budeshi_records |
[X] nigeria_kaduna_state_releases | nigeria_kaduna_state_budeshi_releases |
[X] portugal | portugal_bulk |
[X] uk_contracts_finder | united_kingdom_contracts_finder |
[X] uk_fts | united_kingdom_fts |
[X] uk_fts_test | united_kingdom_fts_test |
[X] mexico_infoem | mexico_mexico_infoem |
Do you agree? If so I can make the change now, and update publications in the registry.
We then need to inform the helpdesk, and CDS for when they create new spiders.
Sounds good and consistent to me. Could you also update the documentation to include this convention as part of your changes? E.g. at https://kingfisher-collect.readthedocs.io/en/latest/contributing/index.html#write-a-spider
We will also need to run the updatedocs command and add united_kingdom here: https://github.com/open-contracting/kingfisher-collect/blob/2e3161e86ad24aa30ebf19edcda56f8cf457972c/kingfisher_scrapy/commands/updatedocs.py#L20
Should georgia_opendata be renamed to georgia_bulk?
And georgia_records and georgia_releases to georgia_api_records, georgia_api_releases?
And similarly honduras_portal_records and honduras_portal_releases to honduras_portal_api_records and honduras_portal_api_releases?
And nepal_portal to nepal_ppip?
And nigeria_portal to nigeria_nocopo?
And openopps to ?
And I guess we should rename chile_compras_ to just chile_, and similar for peru, from peru_compras to peru or to peru_peru_compras
And pakistan_ppra_releases to pakistan_ppra_api
And uganda_releases to uganda
And I guess moldova_old is an exception
Yes, the above is RST so I can paste it in easily :) I've prepared updatedocs locally as well. I'll make a PR.
georgia_opendata is a different website/data source than georgia_records and georgia_releases, so I think they are fine as-is.
There is typically only one bulk format, even if there are two API formats. If we do this, we'd need to also change chile_compra and pakistan_ppra. I guess it is more consistent, and it's just 4 more spiders. What do you think?
Noting that I should check whether any logic depends on the spider name in the data registry (maybe the wiper or exporter?).
Update:
keep_all_data is checked. (Also, of course, it uses the spider name to schedule crawls.)
keep_all_data is checked.
keep_all_data for all jobs, and run the cbom command.
Related: https://github.com/open-contracting/data-registry/issues/154
@jpmckinney oops, I was still updating my comment; see the updated list of possible changes now.
georgia_opendata is a different website/data source than georgia_records and georgia_releases, so I think they are fine as-is.
That is the same for portugal too (a website for the bulk and another one for the api)
Suggestion | Comment |
---|---|
georgia_opendata to georgia_bulk, georgia_records to georgia_api_records, georgia_releases to georgia_api_releases | https://odapi.spa.ge and http://opendata.spa.ge are distinct implementations per CRM-7092. They aren't access methods to the same implementation. georgia_opendata isn't used in the registry and might be deleted eventually. With Portugal, they seem to be the same implementation, even if the websites are different. |
honduras_portal_records to honduras_portal_api_records, honduras_portal_releases to honduras_portal_api_releases | OK, and same for chile_compra_releases and chile_compra_records |
nepal_portal to nepal_ppip | Wouldn't it be nepal_ppmo? Or we can go with nepal if we're renaming it either way. |
nigeria_portal to nigeria_nocopo | No change. "portal" is already in "Nigeria Open Contracting Portal". |
openopps to ? | No change. Follows the rule for "multi-jurisdiction publication". |
chilecompra to chile_, peru_compras to peru or peru_peru_compras | "compra" and "compras" aren't required for disambiguation, but we don't need to be minimal. I think the repetition of "peru" is too weird. I'll add "If a component repeats another, you can omit or abbreviate the component." |
pakistan_ppra_releases to pakistan_ppra_api | OK |
uganda_releases to uganda | "releases" is not required for disambiguation, but we don't need to be minimal. Records endpoints are documented, but they don't work. |
moldova_old is an exception | "old" is a disambiguator. That said, we can maybe rename moldova to moldova_mtender. In any case, moldova_old isn't used in the registry and might be deleted eventually. |
Wouldn't it be nepal_ppmo?
The site's domain (http://ppip.gov.np/) is ppip for "Public Procurement Transparency Initiative in Nepal", although the publisher is PPMO, so either way is fine for me.
chilecompra to chile_, peru_compras to peru or peru_peru_compras
Thinking again, maybe it is better to just leave them as they are, as I know that another national level publisher is thinking of implementing OCDS in Peru, and in Chile too.
PPTIN is used on the website. I don’t know why PPIP is in the URL. It’s not defined anywhere.
This probably won't happen before the launch of the registry. Ideally, what we should do before launch is https://github.com/open-contracting/data-registry/issues/154. That way, changing spider names later will not break data URLs for users; we will just need to update all publications to use the new spider name. (We can freeze the publications before deploying Kingfisher Collect, so that none of them tries to collect data from a non-existent spider.)
Especially once the registry is deployed, it will be difficult to change things.