Closed ribalba closed 3 weeks ago
How many data rows are typically in a frame that is sent to the API?
Does it maybe make sense to only send one data row per API request and then even skip the time_stamping data field?
That depends on the configuration. Let's assume 5-second sampling and an upload every 5 minutes, which is 60 values. I don't think individual API requests make sense for this case, but individual inserts might. I will need to do some timing checks.
I see.
The way to go seems to be a copy to a temporary table and then doing the conflict checking internally in the DB.
From my experience, single inserts are painfully slow with PostgreSQL and not recommended. Over the network they become unbearable, which might become relevant at a later stage if DB and API are no longer colocated.
Source for temp table code: https://stackoverflow.com/questions/73200153/how-to-ignore-duplicate-keys-using-the-psycopg2-copy-from-command-copying-csv-f
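For reference, the temp-table approach could look roughly like this sketch. It assumes a `measurements` table with a unique constraint on `(run_id, metric, time)`; all table and column names here are illustrative, not the actual schema.

```python
import io

# Move rows from the temp table into the real one; the unique constraint
# silently drops any duplicates via ON CONFLICT DO NOTHING.
DEDUP_INSERT_SQL = """
    INSERT INTO measurements (run_id, metric, time, value)
    SELECT run_id, metric, time, value FROM measurements_tmp
    ON CONFLICT (run_id, metric, time) DO NOTHING
"""

def bulk_insert_deduplicated(conn, rows):
    """COPY rows into a temp table, then transfer them, skipping duplicates."""
    buf = io.StringIO()
    for run_id, metric, time, value in rows:
        buf.write(f"{run_id}\t{metric}\t{time}\t{value}\n")
    buf.seek(0)
    with conn.cursor() as cur:
        # ON COMMIT DROP cleans the temp table up automatically
        cur.execute("""
            CREATE TEMP TABLE measurements_tmp
            (LIKE measurements INCLUDING DEFAULTS) ON COMMIT DROP
        """)
        cur.copy_from(buf, 'measurements_tmp',
                      columns=('run_id', 'metric', 'time', 'value'))
        cur.execute(DEDUP_INSERT_SQL)
    conn.commit()
```

The COPY itself cannot skip duplicates, which is why the detour through the temp table is needed at all.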
This is done now I think?
The problem is that the function currently takes 11 seconds to complete, which triggers a timeout. We need to discuss whether we should either 1) go back to row inserts or 2) make the deduplication async.
Initially we did a check on every insert, but then moved to a bulk insert with a duplicate check after the insert. This was faster in the benchmarks where we added loads of keys at once, but in real life we only ever add a key or two, so I propose going back to the on-insert check.
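The proposed on-insert check could be sketched like this, assuming the same hypothetical `measurements` schema with a unique constraint on `(run_id, metric, time)`: each row is inserted individually and the constraint rejects duplicates, so no separate post-insert scan is needed.

```python
# Per-row insert; ON CONFLICT DO NOTHING makes duplicate keys a no-op,
# and RETURNING lets the caller see whether the row was actually inserted.
INSERT_ONE_SQL = """
    INSERT INTO measurements (run_id, metric, time, value)
    VALUES (%s, %s, %s, %s)
    ON CONFLICT (run_id, metric, time) DO NOTHING
    RETURNING run_id
"""

def insert_row(conn, run_id, metric, time, value):
    """Insert one row; returns True if inserted, False if it was a duplicate."""
    with conn.cursor() as cur:
        cur.execute(INSERT_ONE_SQL, (run_id, metric, time, value))
        inserted = cur.fetchone() is not None
    conn.commit()
    return inserted
```

Since we usually only add one or two keys at a time, the per-row round-trip cost should stay negligible here, unlike in the bulk benchmarks.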
See discussion here: https://github.com/green-coding-solutions/green-metrics-tool/pull/676#discussion_r1492409659