justinclift opened this issue 2 weeks ago
As part of post-migration testing of things in the new data centre, I'm running some sanity tests to verify things are working ok. However, it appears the API download() function doesn't work properly with larger sized database files. That's happening multiple times in a row with databases of non-trivial size.

Live databases are just as bad, timing out after about 20 seconds:
```
$ curl -F apikey="MY_API_KEY" -F dbowner="justinclift" -F dbname="National_Statistics_Postcode_Lookup_UK-August_2015.sqlite" -o stuff.db https://api.dbhub.io/v1/download
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   454    0     0  100   454      0     11  0:00:41  0:00:40  0:00:01     0
curl: (56) OpenSSL SSL_read: OpenSSL/3.0.11: error:0A000119:SSL routines::decryption failed or bad record mac, errno 0

$ curl -F apikey="MY_API_KEY" -F dbowner="justinclift" -F dbname="National_Statistics_Postcode_Lookup_UK-August_2015.sqlite" -o stuff.db https://api.dbhub.io/v1/download
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   454    0     0  100   454      0     13  0:00:34  0:00:33  0:00:01     0
curl: (56) OpenSSL SSL_read: OpenSSL/3.0.11: error:0A000119:SSL routines::decryption failed or bad record mac, errno 0
```
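Next time I poke at this, a timestamped verbose run should show exactly when the connection drops. Same request as above, just with curl's trace options turned on:

```
$ curl -v --trace-time -F apikey="MY_API_KEY" -F dbowner="justinclift" \
       -F dbname="National_Statistics_Postcode_Lookup_UK-August_2015.sqlite" \
       -o stuff.db https://api.dbhub.io/v1/download
```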
Small databases work fine though:
```
$ curl -F apikey="MY_API_KEY" -F dbowner="justinclift" -F dbname="Join Testing-live.sqlite" -o stuff.db https://api.dbhub.io/v1/download
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 16805  100 16384  100   421  24693    634 --:--:-- --:--:-- --:--:-- 25308

$ ls -la stuff.db
-rw-r--r-- 1 jc jc 16384 Jun 18 12:20 stuff.db

$ file stuff.db
stuff.db: SQLite 3.x database, last written using SQLite version 3046000, file counter 1, database pages 4, cookie 0x15, schema 4, UTF-8, version-valid-for 1
```
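It might also be worth bisecting on file size to find roughly where downloads start breaking. A rough sketch (the middle database name is a hypothetical placeholder; any set of owned databases of increasing size would do):

```
# Try progressively larger databases and note which ones fail.
for db in "Join Testing-live.sqlite" \
          "Medium_Sized_Test.sqlite" \
          "National_Statistics_Postcode_Lookup_UK-August_2015.sqlite"; do
    echo "=== ${db} ==="
    curl -sS -F apikey="MY_API_KEY" -F dbowner="justinclift" \
         -F dbname="${db}" -o /dev/null \
         https://api.dbhub.io/v1/download && echo ok || echo FAILED
done
```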
The backend isn't showing the problem with any useful detail, and it's happening on both the old and new API servers(!). None of MinIO, Memcached, nor PostgreSQL are displaying any kind of error either.

Note that it doesn't appear to be a lack of resources, at least not CPU (plentiful), disk I/O (plentiful), nor memory. I tripled the RAM given to the API VM under test (from 8GB to 24GB) and the failure happens in exactly the same spot.

After the migration is completed we'd better take a look at this.
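One guess for whoever picks this up: OpenSSL's error:0A000119 (decryption failed or bad record mac) part way through a long transfer often just means the TLS stream was cut off mid-record, so anything between the client and the API (reverse proxy, load balancer) that enforces a response timeout in the 30-40 second range would be worth ruling out. A quick, purely speculative check, assuming nginx is in the path (the config location is an assumption too):

```
# Look for proxy timeouts shorter than a large download takes to transfer.
$ grep -Rn "timeout" /etc/nginx/ 2>/dev/null
```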