Now that I am more seriously thinking about scaling this recipe up, I wanted to start a conversation about the performance issues I foresee.
Currently we generate a dictionary of recipes in serial via a for loop. Each recipe makes an API query and then generates some keyword arguments that need to be passed to the recipe generation.
If we want to scale this up to thousands of recipes or more, we should spend some time understanding which parts of this machinery take the most time, and how we might avoid those bottlenecks.
Some preliminary thoughts:
Can we create the recipes in parallel or asynchronously?
Is there a way to 'batch' requests to the ESGF API?
Or could we work out some sort of 'cache' for recipes, so that only the ones that have changed get rebuilt and need to send an API request? (I believe this is not trivially possible, since we need to build the recipe first before we can compare hashes @cisaacstern ?)
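To make the first idea concrete, here is a minimal sketch of what parallelizing the generation step could look like. The names `query_esgf` and `make_recipe` are hypothetical stand-ins for our actual query and recipe-building functions, not the real API; the point is just that since each recipe is dominated by a network round-trip, a thread pool can overlap the waits:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the ESGF API query (the slow, I/O-bound step).
def query_esgf(instance_id):
    return {"instance_id": instance_id, "urls": [f"https://example.org/{instance_id}.nc"]}

# Hypothetical stand-in for building a single recipe from the query result.
def make_recipe(instance_id):
    kwargs = query_esgf(instance_id)
    return {"recipe_for": instance_id, **kwargs}

instance_ids = [f"CMIP6.model-{i}" for i in range(8)]

# Current approach: one query at a time in a serial loop.
recipes_serial = {iid: make_recipe(iid) for iid in instance_ids}

# Threaded alternative: overlap the API round-trips with a thread pool.
# pool.map preserves input order, so zip pairs ids with their recipes.
with ThreadPoolExecutor(max_workers=8) as pool:
    recipes = dict(zip(instance_ids, pool.map(make_recipe, instance_ids)))

# Both approaches produce the same dictionary of recipes.
assert recipes == recipes_serial
```

An `asyncio`-based version with an async HTTP client would achieve the same overlap; threads are just the smallest change to the existing serial loop, since `make_recipe` stays an ordinary synchronous function.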
I intend this to be a loose conversation for now, so if anyone has ideas, please feel free to discuss.