justinclift opened this issue 2 weeks ago
As part of post-migration testing of things in the new data centre, I'm running some sanity tests to verify things are working ok. However, it appears the API download() function doesn't work properly with larger sized database files. That's happening multiple times in a row with databases of non-trivial size.

Live databases are just as bad, timing out after about 20 seconds:
```
$ curl -F apikey="MY_API_KEY" -F dbowner="justinclift" -F dbname="National_Statistics_Postcode_Lookup_UK-August_2015.sqlite" -o stuff.db https://api.dbhub.io/v1/download
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   454    0     0  100   454      0     11  0:00:41  0:00:40  0:00:01     0
curl: (56) OpenSSL SSL_read: OpenSSL/3.0.11: error:0A000119:SSL routines::decryption failed or bad record mac, errno 0

$ curl -F apikey="MY_API_KEY" -F dbowner="justinclift" -F dbname="National_Statistics_Postcode_Lookup_UK-August_2015.sqlite" -o stuff.db https://api.dbhub.io/v1/download
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   454    0     0  100   454      0     13  0:00:34  0:00:33  0:00:01     0
curl: (56) OpenSSL SSL_read: OpenSSL/3.0.11: error:0A000119:SSL routines::decryption failed or bad record mac, errno 0
```
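Next time I poke at this, a timestamped verbose run should show exactly when the connection drops. Same request as above, just with curl's trace options turned on:

```
$ curl -v --trace-time -F apikey="MY_API_KEY" -F dbowner="justinclift" \
       -F dbname="National_Statistics_Postcode_Lookup_UK-August_2015.sqlite" \
       -o stuff.db https://api.dbhub.io/v1/download
```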
Small databases work fine though:
```
$ curl -F apikey="MY_API_KEY" -F dbowner="justinclift" -F dbname="Join Testing-live.sqlite" -o stuff.db https://api.dbhub.io/v1/download
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 16805  100 16384  100   421  24693    634 --:--:-- --:--:-- --:--:-- 25308

$ ls -la stuff.db
-rw-r--r-- 1 jc jc 16384 Jun 18 12:20 stuff.db

$ file stuff.db
stuff.db: SQLite 3.x database, last written using SQLite version 3046000, file counter 1, database pages 4, cookie 0x15, schema 4, UTF-8, version-valid-for 1
```
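It might also be worth bisecting on file size to find roughly where downloads start breaking. A rough sketch (the middle database name is a hypothetical placeholder; any set of owned databases of increasing size would do):

```
# Try progressively larger databases and note which ones fail.
for db in "Join Testing-live.sqlite" \
          "Medium_Sized_Test.sqlite" \
          "National_Statistics_Postcode_Lookup_UK-August_2015.sqlite"; do
    echo "=== ${db} ==="
    curl -sS -F apikey="MY_API_KEY" -F dbowner="justinclift" \
         -F dbname="${db}" -o /dev/null \
         https://api.dbhub.io/v1/download && echo ok || echo FAILED
done
```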
The backend isn't showing the problem with any useful detail, and it's happening on both the old and new API servers(!). None of MinIO, Memcached, nor PostgreSQL are displaying any kind of error either.

Note that it doesn't appear to be a lack of resources, at least not CPU (plentiful), disk I/O (plentiful), nor memory. I tripled the RAM given to the API VM under test (from 8GB to 24GB) and the failure happens in exactly the same spot.

After the migration is completed we'd better take a look at this.
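One guess for whoever picks this up: OpenSSL's error:0A000119 (decryption failed or bad record mac) part way through a long transfer often just means the TLS stream was cut off mid-record, so anything between the client and the API (reverse proxy, load balancer) that enforces a response timeout in the 30-40 second range would be worth ruling out. A quick, purely speculative check, assuming nginx is in the path (the config location is an assumption too):

```
# Look for proxy timeouts shorter than a large download takes to transfer.
$ grep -Rn "timeout" /etc/nginx/ 2>/dev/null
```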