open-austin / azure-indigent-defense

Host config & add session caching, client caching #17

Closed zimzoom closed 1 year ago

zimzoom commented 1 year ago

This fixes some performance issues. The fixes were necessary in order to run the "big" (5 year) scrape job.

1 - Session and client caching: Check whether the host has already initialized a session, container client, or Cosmos DB client, and reuse it if it exists.

2 - host.json changes: Turned off dynamic scaling, which also means that the host actually uses the batchSize and newBatchThreshold settings. Both have been turned way down to avoid overheating our little Consumption Plan hosts.
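
For anyone reading along, the caching idea in item 1 can be sketched roughly like this. It is a minimal illustration, not the code in this PR: `get_or_create`, `_CACHE`, and the stand-in factory are hypothetical names, and the real change would pass in whatever builds the session, container client, or Cosmos DB client.

```python
from typing import Any, Callable, Dict

# Module-level state lives as long as the host process does, so repeated
# invocations handled by the same process can share what is stored here.
# (Hypothetical name; not taken from the PR.)
_CACHE: Dict[str, Any] = {}


def get_or_create(key: str, factory: Callable[[], Any]) -> Any:
    """Return the cached object for `key`, building it once if missing."""
    if key not in _CACHE:
        _CACHE[key] = factory()
    return _CACHE[key]


# Stand-in for an expensive constructor such as opening a session.
calls = {"count": 0}


def make_fake_session() -> dict:
    calls["count"] += 1
    return {"id": calls["count"]}


first = get_or_create("session", make_fake_session)
second = get_or_create("session", make_fake_session)
```

Here `first` and `second` are the same object and the factory runs only once, which is the saving the caching is after.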

Big shout-out to @normaljosh for also finding and adding a bugfix in the blob parser, which was stripping out ANY characters from the file name that matched "html", not just the start or end "html" as previously intended!
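
One common way this kind of bug shows up in Python is `str.strip("html")`, which treats its argument as a set of characters rather than a literal substring. Whether that is exactly what the blob parser did is an assumption here, and `strip_affix` is a hypothetical helper rather than the fix that was merged, but the sketch shows the difference:

```python
def strip_affix(name: str, affix: str = "html") -> str:
    """Remove a literal `affix` from the start and/or end of `name` only."""
    if affix and name.startswith(affix):
        name = name[len(affix):]
    if affix and name.endswith(affix):
        name = name[:-len(affix)]
    return name


# str.strip removes any leading/trailing characters found in the set
# {"h", "t", "m", "l"}, so the "th" at the start of the name is lost too.
naive = "the_case.html".strip("html")

# The helper only removes the literal "html" at either end.
fixed = strip_affix("the_case.html")
```

With this input, `naive` is `"e_case."` while `fixed` is `"the_case."`.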