Open yolile opened 2 years ago
@yolile Can you share an example with Sabah, in case she hasn't seen a good one before?
I have some examples in Spanish but not so many in English. But maybe the UK one is a good one https://www.gov.uk/government/publications/open-contracting and Zambia https://www.zppa.org.zm/ocds-publication-policy
In Spanish some examples are:
Another one in English http://dppib-crsgov.org/publicationpolicy.html
All of INAI's publications include a good publication policy. More examples: http://ceaipsinaloa.ddns.net:4000/contratacionesabiertas/politicadepublicacion https://dashboard.infocdmx.org.mx/contratacionesabiertas/politicadepublicacion etc.
And this is another example from Honduras https://portalunico.iaip.gob.hn/datosabierto/docs/Pol%C3%ADtica%20de%20publicaci%C3%B3n%20-%20IAIP%20Datos%20Abiertos%20OCDS.pdf
+1 from Sabah via Slack
We'd need to update update_collection_metadata to either:
publication_policy field. Manually set the publication policy for existing collections.
(2) means that, if at a later date we decide to add another metadata field from Pelican, we won't need to do any manual work.
Let's do (2) then. How should we name this new field? metadata, pelican_response, pelican_metadata, other?
For the record, this is a sample snippet of what Pelican currently returns:
{
  "url": "The URL where the data can be downloaded isn't presently available.",
  "publisher": "Instituto Duranguense de Acceso a la Información Pública y de Protección de Datos Personales",
  "extensions": [
  ],
  "ocid_prefix": "ocds-ywf11i",
  "data_license": "https://datos.gob.mx/libreusomx",
  "published_to": "2021-11-03 19.51.47",
  "published_from": "2021-08-04 18.06.41",
  "publication_policy": null
}
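As a rough illustration of reading such a response, here is a minimal Python sketch. The date format string is an assumption inferred only from the two timestamps in the sample above, and `parse_pelican_datetime` is a hypothetical helper, not something that exists in Pelican or the data registry.

```python
import json
from datetime import datetime

# Sample response copied from the snippet above (extensions shown empty, as there).
SAMPLE = """
{
  "url": "The URL where the data can be downloaded isn't presently available.",
  "publisher": "Instituto Duranguense de Acceso a la Información Pública y de Protección de Datos Personales",
  "extensions": [],
  "ocid_prefix": "ocds-ywf11i",
  "data_license": "https://datos.gob.mx/libreusomx",
  "published_to": "2021-11-03 19.51.47",
  "published_from": "2021-08-04 18.06.41",
  "publication_policy": null
}
"""

# Assumption: timestamps use dots between hours, minutes and seconds,
# as in the two values of the sample.
PELICAN_DATE_FORMAT = "%Y-%m-%d %H.%M.%S"


def parse_pelican_datetime(value):
    """Hypothetical helper: return a datetime, or None if the value is missing."""
    if not value:
        return None
    return datetime.strptime(value, PELICAN_DATE_FORMAT)


metadata = json.loads(SAMPLE)
published_from = parse_pelican_datetime(metadata["published_from"])
published_to = parse_pelican_datetime(metadata["published_to"])
# JSON null becomes None, i.e. no publication policy was extracted.
has_policy = metadata["publication_policy"] is not None
```

Storing the whole parsed dict, rather than individual keys, is what option (2) amounts to: any key Pelican adds later would be kept without a schema change.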
Note that the only fields that we are missing from Pelican are extensions and publication_policy, but I'm not sure if we will add more fields to Pelican itself in the future. I'm happy with (2), but I'm not sure if we should remove all the other existing columns (ocid_prefix, date_from, date_to, license) and use the new JSON column instead, to be consistent.
And, we want the publication_policy to be editable, right? From https://github.com/open-contracting/data-registry/issues/256:
I think this field is currently a property of the job (which makes sense). For the override, we would put it on the publication itself (and we'll need to remember to change the end date periodically).
Should we also put this field (metadata or publication_policy) in the publication/collection form?
Let's call it extracted_metadata.
I'm not sure if we should remove all the other existing columns (ocid_prefix, date_from, date_to, license)
In #256 we want date_to/date_from to be overridable. So we can keep those. I think OCID prefix will always be correct, so we can remove that one.
Similarly, some publishers don't put a license in the package metadata, but they do include one in their docs, so we can leave license_custom to be overridden. (We can rename it to license and set db_column="license_custom", so that at least within the code the name of the field is consistent.)
Should we also put this field (metadata or publication_policy) in the publication/collection form?
Let's save extracted_metadata for what's automatically extracted (read only), and then we can add publication_policy for the override. That way, the editable field will render automatically in the Django admin (not sure if there are good packages for editing structured JSON fields). We can display the extracted_metadata similar to how the Job context is rendered, so that admins can see whether they want to override something.
We can add a method to the model that returns the original metadata with any overrides applied, as a new dict. That way, view code can just call instance.metadata.license and render the license, without worrying about whether it is the original value or not.
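A minimal sketch of that idea in plain Python, outside Django so it runs on its own. The `Collection` class, the `OVERRIDE_KEYS` mapping and the choice of "empty string means no override" are all assumptions for illustration; only the names extracted_metadata, license and publication_policy come from this thread.

```python
from dataclasses import dataclass, field

# Hypothetical mapping: override attribute on the model -> key in extracted_metadata.
OVERRIDE_KEYS = {
    "license": "data_license",
    "publication_policy": "publication_policy",
}


@dataclass
class Collection:
    # Read-only: whatever was automatically extracted.
    extracted_metadata: dict = field(default_factory=dict)
    # Editable overrides. Assumption: an empty string means "no override".
    license: str = ""
    publication_policy: str = ""

    @property
    def metadata(self) -> dict:
        """Return a new dict: the extracted metadata with any overrides applied."""
        merged = dict(self.extracted_metadata)  # shallow copy; the original is untouched
        for attribute, key in OVERRIDE_KEYS.items():
            value = getattr(self, attribute)
            if value:
                merged[key] = value
        return merged
```

In this sketch the result is a plain dict, so a caller would read `instance.metadata["data_license"]`. The point is only that callers get one merged view and never need to check whether a value was overridden.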
In https://github.com/open-contracting/data-registry/issues/256 we want date_to/date_from to be overridable. So we can keep those
Hmm, actually, I think we may want the publication policy to be overridable too. I can't remember a specific case, but I'm pretty sure there are cases where a publication policy exists but is not referenced in the package metadata.
Yes, we also want publication_policy to be overridable.
and then we can add publication_policy for the override
Oh, true, sorry, I commented before finishing reading 😸 All good then, I will implement https://github.com/open-contracting/data-registry/issues/192#issuecomment-1326694504
Should we implement https://github.com/open-contracting/data-registry/issues/291#issuecomment-2045960744 instead of still using Pelican's metadata? Or should we mark this as blocked until #291 is done?
We can do this without doing #291. Doing this is very easy. Doing #291 is much more work.
Edit: Ah, you mean doing the following to resolve this issue:
From Pelican we get field counts and also some collection metadata. We can get the latter via an HTTP request to Kingfisher Process in the Process task's get_status method (once is_last_completed is true): https://github.com/open-contracting/kingfisher-process/issues/421
Sure, we can get the data from Process instead.
Some publication policies are good and contain valuable information for data users. It would be helpful to include a "Publication Policy" field on the dataset site. We can auto-populate the field with the Pelican data, or set it manually if the publisher does not correctly include the publication policy link in their JSON.
@sabahfromlondon what do you think?