devsjc closed 3 months ago
Initial modification made in https://github.com/openclimatefix/nwp-consumer/pull/134. For now this does not change the business language of the consumer, for ease of testing and migration. However, I do think the business language (and the methods reflecting it) of the `NWPConsumer` class should pivot to the `SingleInitTime` variants, since this seems to be the primary use case of the consumer.
The initial spec of the consumer required it to download multiple days of data as a first-class citizen. These datasets would contain many init times, each with its own set of files. To make that process as quick as possible, I built the consumer to parallelise with dask across the desired init times, so that each was processed in parallel.
However, real-world usage of the consumer has shown that we very rarely, if ever, download and convert multiple init times at once. Instead, we download one init time at a time and iterate through them. In addition, sources such as ICON have a great many files per init time, and as it stands the downloads of those files gain no benefit from parallel computing.
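To make the limitation concrete, here is a minimal sketch of the original scheme: work is parallelised across init times, while the files inside each init time are fetched one after another. It is not the consumer's actual code. It uses the standard library's `ThreadPoolExecutor` as a dependency-free stand-in for dask, and `FILES_BY_INIT_TIME`, `download_file`, `download_init_time` and `download_many` are hypothetical names invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical listing of files per init time (illustrative data only).
FILES_BY_INIT_TIME = {
    "2024-01-01T00": ["a.grib", "b.grib", "c.grib"],
    "2024-01-01T06": ["d.grib", "e.grib"],
}


def download_file(name: str) -> str:
    """Placeholder for a real download; returns a fake local path."""
    return f"/tmp/{name}"


def download_init_time(init_time: str) -> list[str]:
    # Files within one init time are fetched sequentially.
    return [download_file(f) for f in FILES_BY_INIT_TIME[init_time]]


def download_many(init_times: list[str]) -> dict[str, list[str]]:
    # Parallelism exists only across init times (stand-in for dask).
    with ThreadPoolExecutor() as pool:
        results = pool.map(download_init_time, init_times)
        return dict(zip(init_times, results))
```

With a single init time in the request, `download_many` has exactly one task to schedule, so every file in that init time is still downloaded serially.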
As such, I propose a refactor of the business logic of the consumer: to make `DownloadSingleInitTime` and `ConvertSingleInitTime` the new first-class citizens (business use cases) of the service. This then enables the service to use dask to parallelise within each init time, speeding up the consumer in the way it is most regularly used.
Before (verticality indicates parallel):
After:
`IT2` is shown here for clarity but, as mentioned, the most common use case for the consumer is just a single init time, which is why the second option is the preferred choice.
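A rough sketch of the proposed shape, under stated assumptions: the single-init-time operation is the unit of business logic, parallelism happens across the files inside it, and multiple init times are handled by plain iteration. Again, `ThreadPoolExecutor` stands in for dask to keep the example dependency-free, and `fetch`, `download_single_init_time` and `download_init_times` are hypothetical names, not the consumer's real API.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(name: str) -> str:
    """Placeholder for a real file download; returns a fake local path."""
    return f"/tmp/{name}"


def download_single_init_time(files: list[str], max_workers: int = 8) -> list[str]:
    # Parallelise across the files of ONE init time (stand-in for dask).
    # pool.map preserves input order in its results.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, files))


def download_init_times(files_by_init_time: dict[str, list[str]]) -> dict[str, list[str]]:
    # Multiple init times are processed one after another, each using
    # the parallel single-init-time path above.
    return {
        init_time: download_single_init_time(files)
        for init_time, files in files_by_init_time.items()
    }
```

For a source with many files per init time, the common single-init-time call now does its downloads concurrently, while the multi-init-time case simply loops over it.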