google / weather-tools

Tools to make weather data accessible and useful.
https://weather-tools.readthedocs.io/
Apache License 2.0
208 stars 39 forks source link

Strengthen feature collection ingestion logic in weather-mv #385

Open deepgabani8 opened 1 year ago

deepgabani8 commented 1 year ago

The story / Current behavior In the last step of weather-mv pipeline, the ingestion transform, each worker is ingesting its particular feature-collection. Our logic ensures that there is a room for a feature collection ingestion task in the earth engine task list & for that, each worker is checking if the task list has a room for its particular feature collection, and it is checking continuously, at a certain interval. This is the point, where we can optimize.

Suggested changes In the ingestion transform, when it's feature collections, use flatMap to list all the feature collections coming from the above transform. Then pass this list as a single item to a ParDo function (meaning: run just a single worker) and this worker is now responsible to ingest all the feature collections. It does the same thing that current logic does. The benefit is, instead of all the workers checking for a room in the EE task list, only one worker is checking. Additionally, we can make the waiting time even more, and when it finds room, it creates tasks at once. For instance, the worker checks at one minute, and if it finds 5 vacancy, it creates 5 tasks at once, now that it has access to the entire feature collection list.

Benefits