open-contracting / cove-ocds

OCDS Data Review Tool
https://ocds-data-review-tool.readthedocs.io

lib-cove-2 and lib-cove-web-2 notes #203

Open jpmckinney opened 8 months ago

jpmckinney commented 8 months ago

From 2023-05-19 email from Duncan.

We've started work on these new libraries in order to add new features in a non-backwards compatible way and to make other improvements. They aren't rewrites as such (they reuse old code) and we'll continue supporting and developing the 'old' libraries for as long as needed so there's no pressure or rush to update.

Some highlights from the new versions:

  • Support for uploading multiple files at once (used for GeoJSON files in OFDS CoVE)
  • Use of a message queue to provide a better experience for users that upload large files and to give more control over server load. This also sets us up to look at cloud platform providers in the future to provide 'burst' processing capacity rather than having a large server that sits idle most of the time.
  • The output of processing is cached by a background worker for fast page loads, but mechanisms exist to invalidate the cache so that new versions of the software can reprocess old items if needed.
  • Refactoring of code that only applies to specific standards into libraries for those standards.
  • Cleaner output from lib-cove-[STANDARD] libraries so the output can be used in multiple tools with each tool being able to tailor UI as needed. This should also help I18N.
  • Clear pipeline architecture: define tasks for each CoVE, e.g. GeoJSON conversion for OFDS and sample mode for BODS.

Other CoVE updates

  • Caching, so the results page can be shared (or reloaded) without redoing computation
  • Only show the original file links to admin users, to prevent a malicious user from using CoVE to host a virus

jpmckinney commented 1 month ago

From 2024-09-28 email from Michael, about adding "queue" functionality without migrating to lib-cove*-2 packages.

The commit adding the "queue" functionality is https://github.com/ThreeSixtyGiving/dataquality/commit/92a3a16fa059c240731bef09383e89b48320e517 [which includes refactoring/tidying].

It uses the fact that lib-cove-web has a small database and that calling the explore page with a parameter starts the processing by looking up the parameters in the database.

So in 360 cove what happens is:

  • [360 Cove] Form POST.
  • [lib-cove-web] As normal, the input view creates a database entry for the input data (and downloads the data if needed). That view then redirects to the 'explore' named URL (data.get_absolute_url()).
  • [360 Cove] Re-implement the 'explore' named URL's view as a new separate view (cove_360.views.data_loading, name='explore', template: data_loading.html).
  • [360 Cove] In the 'data_loading.html' template, there is some JS that does a GET to 'results', which is the old explore page (thus triggering the real processing, as the explore page used to).
  • [360 Cove] 'data_loading.html' polls a new JSON endpoint, which polls the database to get the status of the processing, and redirects to the 'results' page (the old explore page) when it's complete (showing a spinner in the meantime).

I also added a cache in the mix so that revisiting the results page doesn't need to re-run the processing, which would be annoying for people if they're sharing the results link etc.
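
The flow described above can be simulated with the standard library alone. The sketch below is not the 360 CoVE code and does not use Django: a dict stands in for the small lib-cove-web database, a thread stands in for the browser's GET to 'results', and a loop stands in for the JS polling the JSON endpoint. All function names (`create_entry`, `results_view`, `status_view`, `loading_page`) and the status values are assumptions for illustration.

```python
import threading
import time

# Stand-in for the small lib-cove-web database: maps a data ID to its state.
_db = {}
_lock = threading.Lock()


def create_entry(data_id):
    """Input view: record the submission before redirecting to 'explore'."""
    with _lock:
        _db[data_id] = {"status": "pending", "results": None}


def results_view(data_id):
    """The old explore page: runs the real processing and stores the output."""
    with _lock:
        entry = _db[data_id]
        if entry["status"] == "complete":
            return entry["results"]  # cached, so no reprocessing
        entry["status"] = "processing"
    time.sleep(0.1)  # placeholder for the slow checks
    results = {"data_id": data_id, "valid": True}
    with _lock:
        _db[data_id].update(status="complete", results=results)
    return results


def status_view(data_id):
    """JSON endpoint polled by the loading page."""
    with _lock:
        return {"status": _db[data_id]["status"]}


def loading_page(data_id, poll_interval=0.02):
    """Mimics the JS in data_loading.html: trigger, poll, then redirect."""
    # The GET to 'results' that kicks off processing in the background.
    threading.Thread(target=results_view, args=(data_id,)).start()
    while status_view(data_id)["status"] != "complete":
        time.sleep(poll_interval)  # the spinner is shown in the meantime
    # The "redirect" to the results page, now served from the stored output.
    return results_view(data_id)
```

Because the final call to `results_view` happens only after the status is complete, it returns the stored results without rerunning the processing, which mirrors the caching behaviour mentioned above.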

This has been in place for 3 years now(!), so it was one of those things that accidentally stuck. However, as of last week we've started on a program of improving 360 Cove, and one of the tasks I am going to look at is making this flow less convoluted. My first port of call will be looking at the new async API in Django to tidy this up. It should be pretty easy, as the current cove has all the pieces needed for this and we aren't anywhere near a scale where we need more complicated queue management.

jpmckinney commented 1 month ago

See also: