open-contracting / data-registry

BSD 3-Clause "New" or "Revised" License

Replace Pelican with Cardinal in pipeline (and make detailed coverage available) #291

Open jpmckinney opened 1 year ago

jpmckinney commented 1 year ago

We are only using Pelican for field coverage, for which Cardinal is much faster.

We can store the output as part of the job, and make it available as part of the API in #268. We can also consider designing a report for the dataset's page, where a user can opt to view the detailed coverage.
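To make "field coverage" concrete, here is a minimal sketch of the kind of per-path counts that could be stored with the job. It is an illustration only, not Cardinal's implementation: the `count_fields` helper, the path notation, and the sample releases are assumptions made for this example, and Cardinal's actual output format may differ.

```python
from collections import Counter


def count_fields(value, path="", counts=None):
    """Count non-empty leaf values by path (simplified, hypothetical helper)."""
    if counts is None:
        counts = Counter()
    if isinstance(value, dict):
        # Descend into objects, extending the path with each key.
        for key, child in value.items():
            count_fields(child, f"{path}/{key}", counts)
    elif isinstance(value, list):
        # Mark array membership with "[]" and visit every item.
        for item in value:
            count_fields(item, f"{path}[]", counts)
    elif value not in (None, ""):
        # Only non-empty scalar values count toward coverage.
        counts[path] += 1
    return counts


# Made-up sample data, for illustration only.
releases = [
    {"ocid": "ocds-1", "tender": {"id": "t1", "title": ""}},
    {"ocid": "ocds-2", "tender": {"id": "t2", "title": "Road works"}},
]

coverage = Counter()
for release in releases:
    count_fields(release, counts=coverage)
```

A mapping like `coverage` is small enough to serialize as JSON alongside the job, which is the shape a detailed coverage report on the dataset's page would need.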

We can then also use the output to either report:

jpmckinney commented 1 year ago

Edit: Moved tangential comment to #292

jpmckinney commented 7 months ago

From Pelican we get field counts and also some collection metadata. We can get the latter via an HTTP request to Kingfisher Process in the Process task's get_status method (once is_last_completed is true): https://github.com/open-contracting/kingfisher-process/issues/421
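A rough sketch of that flow, assuming the endpoint requested in the linked issue exists. The URL pattern, the `get_collection_metadata` helper, and its parameters are hypothetical, and this is not the actual `get_status` code.

```python
import json
from urllib.request import urlopen

# Hypothetical URL pattern: the real endpoint depends on how
# kingfisher-process#421 is resolved.
METADATA_URL = "http://localhost:8000/api/collections/{collection_id}/metadata/"


def fetch_json(url, timeout=10):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def get_collection_metadata(collection_id, is_last_completed, fetch=fetch_json):
    """Return collection metadata once processing is complete, else None."""
    if not is_last_completed:
        # Metadata is only requested after the last collection has completed.
        return None
    return fetch(METADATA_URL.format(collection_id=collection_id))
```

Passing `fetch` as a parameter keeps the HTTP call swappable, so the check on `is_last_completed` can be tested without a running Kingfisher Process instance.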

jpmckinney commented 6 months ago

sentry-io[bot] commented 1 month ago

Sentry Issue: REGISTRY-PELICAN-FRONTEND-B

jpmckinney commented 1 month ago

I linked a Sentry issue in which a Pelican API request is quite slow (20 seconds) on some collections.