zhaofengli / attic

Multi-tenant Nix Binary Cache
https://docs.attic.rs
Other
1.02k stars 77 forks source link

Database error: Failed to acquire connection from pool #24

Open kubukoz opened 1 year ago

kubukoz commented 1 year ago

Hi! Thank you for working on this. I was giving Attic a try but on some uploads (notably, this one is several hundred MBs) it's failing to push.

I'm getting this error:

Feb 09 17:05:16 ip-10-193-26-190 atticd[787]: 2023-02-09T17:05:16.952504Z ERROR attic_server::error: Database error: Failed to acquire connection from pool

I'm using the default db, which I assume is a local sqlite... so it seems like a weird error to see. It's pretty reproducible.

My config (mostly the defaults from the docs, really):

{
  services.atticd = {
    enable = true;
    credentialsFile = "/etc/nixos/attic-creds.env";

      settings = {
      listen = "[::]:8080";

      # Data chunking
      #
      # Warning: If you change any of the values here, it will be
      # difficult to reuse existing chunks for newly-uploaded NARs
      # since the cutpoints will be different. As a result, the
      # deduplication ratio will suffer for a while after the change.
      chunking = {
        # The minimum NAR size to trigger chunking
        #
        # If 0, chunking is disabled entirely for newly-uploaded NARs.
        # If 1, all NARs are chunked.
        nar-size-threshold = 64 * 1024; # 64 KiB

        # The preferred minimum size of a chunk, in bytes
        min-size = 16 * 1024; # 16 KiB

        # The preferred average size of a chunk, in bytes
        avg-size = 64 * 1024; # 64 KiB

        # The preferred maximum size of a chunk, in bytes
        max-size = 256 * 1024; # 256 KiB
      };
    };
  };

Client error:

✅ zlaczlhsx8s4d54ksgpszxwnpqqqf7hn-corretto17-17.0.5.8.1 (1.51 MiB/s, 99.7% deduplicated)
❌ v9mvqn3sa39pjdcszi19c8hpp7wlizca-source: InternalServerError: The server encountered an internal error or misconfiguration.
Error: InternalServerError: The server encountered an internal error or misconfiguration.
Error: Process completed with exit code 123.

Full logs of the run:

Feb 09 16:47:59 localhost atticd[787]: Attic Server 0.1.0 (release)
Feb 09 16:47:59 localhost atticd[787]: Running migrations...
Feb 09 16:47:59 localhost atticd[787]: Starting API server...
Feb 09 16:47:59 localhost atticd[787]: Listening on [::]:8080...
Feb 09 17:05:16 ip-10-193-26-190 atticd[787]: 2023-02-09T17:05:16.952504Z ERROR attic_server::error: Database error: Failed to acquire connection from pool
Feb 09 17:05:16 ip-10-193-26-190 atticd[787]:    0: tokio::task::runtime.spawn
Feb 09 17:05:16 ip-10-193-26-190 atticd[787]:            with kind=task task.name= task.id=45603 loc.file="server/src/api/v1/upload_path.rs" loc.line=406 loc.col=13
Feb 09 17:05:16 ip-10-193-26-190 atticd[787]:              at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/tokio-1.23.0/src/util/trace.rs:16
Feb 09 17:05:16 ip-10-193-26-190 atticd[787]:    1: attic_server::api::v1::upload_path::upload_path
Feb 09 17:05:16 ip-10-193-26-190 atticd[787]:              at server/src/api/v1/upload_path.rs:115
Feb 09 17:05:16 ip-10-193-26-190 atticd[787]:    2: tower_http::trace::make_span::request
Feb 09 17:05:16 ip-10-193-26-190 atticd[787]:            with method=PUT uri=/_api/v1/upload-path version=HTTP/1.1
Feb 09 17:05:16 ip-10-193-26-190 atticd[787]:              at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/tower-http-0.3.5/src/trace/make_span.rs:116
Feb 09 17:05:16 ip-10-193-26-190 atticd[787]:    3: tokio::task::runtime.spawn
Feb 09 17:05:16 ip-10-193-26-190 atticd[787]:            with kind=task task.name= task.id=8720 loc.file="/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/hyper-0.14.23/src/common/exec.rs" loc.line=49 loc.col=21
Feb 09 17:05:16 ip-10-193-26-190 atticd[787]:              at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/tokio-1.23.0/src/util/trace.rs:16
Feb 09 17:05:16 ip-10-193-26-190 atticd[787]: 2023-02-09T17:05:16.953005Z ERROR tower_http::trace::on_failure: response failed classification=Status code: 500 Internal Server Error latency=257570 ms
kubukoz commented 1 year ago

Forgot to post the actual question :D

Do you have any hints as to why this could be happening or what I can do to alleviate the issue?

zhaofengli commented 1 year ago

That's interesting. I just tried a 8GB NAR and it uploaded fine. How busy was the server at the time?

kubukoz commented 1 year ago

Just handling this one client. It was 10 jobs, though...

It seems like after some retries we got it to pass.

gador commented 1 year ago

I got the same errors with sqlite:

2023-02-12T20:10:00.892876Z ERROR attic_server::error: Database error: Failed to acquire connection from pool
   0: tokio::task::runtime.spawn
           with kind=task task.name= task.id=2362343 loc.file="server/src/api/v1/upload_path.rs" loc.line=406 loc.col=13                                                                                                  at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/tokio-1.23.0/src/util/trace.rs:16
   1: attic_server::api::v1::upload_path::upload_path
             at server/src/api/v1/upload_path.rs:115
   2: tower_http::trace::make_span::request
           with method=PUT uri=/_api/v1/upload-path version=HTTP/1.0
             at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/tower-http-0.3.5/src/trace/make_span.rs:116
   3: tokio::task::runtime.spawn                                                                                                                                                                                        with kind=task task.name= task.id=2348982 loc.file="/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/hyper-0.14.23/src/common/exec.rs" loc.line=49 loc.col=21
             at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/tokio-1.23.0/src/util/trace.rs:16
2023-02-12T20:10:00.892951Z ERROR tower_http::trace::on_failure: response failed classification=Status code: 500 Internal Server Error latency=451119 ms

I see a 100% usage of my SSD (where the sqlite is saved on, not where the data is on) when uploading. Uploads take a long time. But AFAIS the network part is short. The network usage spikes and then idles, whereas the SSD is under load. Some of the (bigger) uploads fail with the above error.

I switched to postgres and the error disappears.

I guess the sqlite database is not suitable for more than just a few tests..

NULLx76 commented 1 year ago

I encountered the same errors, similarly switching to postgres seems to fix it

Jun 05 14:10:35 attic atticd[2788]: 2023-06-05T12:10:35.369673Z ERROR attic_server::error: Database error: Failed to acquire connection from pool
Jun 05 14:10:35 attic atticd[2788]:    0: tokio::task::runtime.spawn
Jun 05 14:10:35 attic atticd[2788]:            with kind=task task.name= task.id=35784 loc.file="server/src/api/v1/upload_path.rs" loc.line=408 loc.col=13
Jun 05 14:10:35 attic atticd[2788]:              at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/tokio-1.28.2/src/util/trace.rs:16
Jun 05 14:10:35 attic atticd[2788]:    1: attic_server::api::v1::upload_path::upload_path
Jun 05 14:10:35 attic atticd[2788]:              at server/src/api/v1/upload_path.rs:117
Jun 05 14:10:35 attic atticd[2788]:    2: tower_http::trace::make_span::request
Jun 05 14:10:35 attic atticd[2788]:            with method=PUT uri=/_api/v1/upload-path version=HTTP/1.1
Jun 05 14:10:35 attic atticd[2788]:              at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/tower-http-0.4.0/src/trace/make_span.rs:116
Jun 05 14:10:35 attic atticd[2788]:    3: tokio::task::runtime.spawn
Jun 05 14:10:35 attic atticd[2788]:            with kind=task task.name= task.id=14446 loc.file="/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/hyper-0.14.26>
Jun 05 14:10:35 attic atticd[2788]:              at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/tokio-1.28.2/src/util/trace.rs:16
Jun 05 14:10:35 attic atticd[2788]: 2023-06-05T12:10:35.370482Z ERROR tower_http::trace::on_failure: response failed classification=Status code: 500 Internal Server Error latency=413940 ms
Jun 05 14:13:44 attic atticd[2788]: 2023-06-05T12:13:44.420429Z ERROR attic_server::error: Database error: Failed to acquire connection from pool
Jun 05 14:13:44 attic atticd[2788]:    0: tokio::task::runtime.spawn
Jun 05 14:13:44 attic atticd[2788]:            with kind=task task.name= task.id=35771 loc.file="server/src/api/v1/upload_path.rs" loc.line=408 loc.col=13
Jun 05 14:13:44 attic atticd[2788]:              at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/tokio-1.28.2/src/util/trace.rs:16
Jun 05 14:13:44 attic atticd[2788]:    1: attic_server::api::v1::upload_path::upload_path
Jun 05 14:13:44 attic atticd[2788]:              at server/src/api/v1/upload_path.rs:117
Jun 05 14:13:44 attic atticd[2788]:    2: tower_http::trace::make_span::request
Jun 05 14:13:44 attic atticd[2788]:            with method=PUT uri=/_api/v1/upload-path version=HTTP/1.1
Jun 05 14:13:44 attic atticd[2788]:              at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/tower-http-0.4.0/src/trace/make_span.rs:116
Jun 05 14:13:44 attic atticd[2788]:    3: tokio::task::runtime.spawn
Jun 05 14:13:44 attic atticd[2788]:            with kind=task task.name= task.id=30749 loc.file="/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/hyper-0.14.26>
Jun 05 14:13:44 attic atticd[2788]:              at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/tokio-1.28.2/src/util/trace.rs:16
Jun 05 14:13:44 attic atticd[2788]: 2023-06-05T12:13:44.420936Z ERROR tower_http::trace::on_failure: response failed classification=Status code: 500 Internal Server Error latency=339846 ms
Jun 05 14:30:40 attic atticd[2788]: 2023-06-05T12:30:40.187559Z ERROR attic_server::error: Database error: Failed to acquire connection from pool
Jun 05 14:30:40 attic atticd[2788]:    0: tokio::task::runtime.spawn
Jun 05 14:30:40 attic atticd[2788]:            with kind=task task.name= task.id=76255 loc.file="server/src/api/v1/upload_path.rs" loc.line=408 loc.col=13
Jun 05 14:30:40 attic atticd[2788]:              at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/tokio-1.28.2/src/util/trace.rs:16
Jun 05 14:30:40 attic atticd[2788]:    1: attic_server::api::v1::upload_path::upload_path
Jun 05 14:30:40 attic atticd[2788]:              at server/src/api/v1/upload_path.rs:117
Jun 05 14:30:40 attic atticd[2788]:    2: tower_http::trace::make_span::request
Jun 05 14:30:40 attic atticd[2788]:            with method=PUT uri=/_api/v1/upload-path version=HTTP/1.1
Jun 05 14:30:40 attic atticd[2788]:              at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/tower-http-0.4.0/src/trace/make_span.rs:116
Jun 05 14:30:40 attic atticd[2788]:    3: tokio::task::runtime.spawn
Jun 05 14:30:40 attic atticd[2788]:            with kind=task task.name= task.id=61183 loc.file="/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/hyper-0.14.26>
Jun 05 14:30:40 attic atticd[2788]:              at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/tokio-1.28.2/src/util/trace.rs:16
Jun 05 14:30:40 attic atticd[2788]: 2023-06-05T12:30:40.187967Z ERROR tower_http::trace::on_failure: response failed classification=Status code: 500 Internal Server Error latency=304195 ms
voidus commented 1 year ago

Had the same error with the db in an sshfs mounted folder, moving it out of there fixed it for me

JulienMalka commented 11 months ago

I have the same problem

Nebucatnetzer commented 6 months ago

I'm seeing the same problem, mostly when pushing the Nvidia drivers which is about 500MB in size. The server is hosted on a Raspberry Pi 4 with Postgres for the DB and the actual files are stored on an NFS share. Sure it has to work on it but it's not like it's 100% pinned like it is compiling something.

nixos via ❄️  impure (nixosbuildshell-env) on  master ❯ attic push -j 1 prod /run/current-system
⚙️ Pushing 3 paths to "prod" on "2li" (505 already cached, 2424 in upstream)...
❌ dnzqdd04mk1i4c3lfq1g2dx20mp5cb84-nvidia-x11-545.29.02-6.1.90: error sending request for url (https://cache.zweili.org/_api/v1/upload-path): connection error: unexpected end of file
✅ p3ac2311nbr988jh054vg9xl6i9adjww-SideQuest-usr-target (2.88 MiB/s, 100.0% deduplicated)
✅ 041mqxjdqcv9iflbc94gk12xmp2g7zq6-nvidia-x11-545.29.02-6.1.90-bin (13.65 MiB/s, 100.0% deduplicated)
Error: error sending request for url (https://cache.zweili.org/_api/v1/upload-path): connection error: unexpected end of file
2024-05-09T18:19:53.515913Z ERROR attic_server::error: Database error: Failed to acquire connection from pool: Connection pool timed out
May 09 20:19:53 management atticd[1081181]:    0: tokio::task::runtime.spawn
May 09 20:19:53 management atticd[1081181]:            with kind=task task.name= task.id=947551 loc.file="server/src/api/v1/upload_path.rs" loc.line=408 loc.col=13
May 09 20:19:53 management atticd[1081181]:              at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/tokio-1.34.0/src/util/trace.rs:17
May 09 20:19:53 management atticd[1081181]:    1: attic_server::api::v1::upload_path::upload_path
May 09 20:19:53 management atticd[1081181]:              at server/src/api/v1/upload_path.rs:117
May 09 20:19:53 management atticd[1081181]:    2: tower_http::trace::make_span::request
May 09 20:19:53 management atticd[1081181]:            with method=PUT uri=/_api/v1/upload-path version=HTTP/1.1
May 09 20:19:53 management atticd[1081181]:              at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/tower-http-0.4.4/src/trace/make_span.rs:109
May 09 20:19:53 management atticd[1081181]:    3: tokio::task::runtime.spawn
May 09 20:19:53 management atticd[1081181]:            with kind=task task.name= task.id=943769 loc.file="/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/hyper-0.14.27/src/common/exec.rs" loc.line=49 loc.col=21
May 09 20:19:53 management atticd[1081181]:              at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/tokio-1.34.0/src/util/trace.rs:17
May 09 20:19:53 management atticd[1081181]: 2024-05-09T18:19:53.516222Z ERROR tower_http::trace::on_failure: response failed classification=Status code: 500 Internal Server Error latency=45110 ms
Nebucatnetzer commented 5 months ago

The related logs for postgres I'm seeing:

ay 15 21:46:45 management postgres[2518741]: [2518741] ERROR:  insert or update on table "chunkref" violates foreign key constraint "fk_chunkref_nar"
May 15 21:46:45 management postgres[2518741]: [2518741] DETAIL:  Key (nar_id)=(7524) is not present in table "nar".
May 15 21:46:45 management postgres[2518741]: [2518741] STATEMENT:  INSERT INTO "chunkref" ("nar_id", "seq", "chunk_id", "chunk_hash", "compression") VALUES ($1, $2, $3, $4, $5) RETURNING "id"
May 15 21:46:45 management postgres[2518782]: [2518782] ERROR:  insert or update on table "chunkref" violates foreign key constraint "fk_chunkref_nar"
May 15 21:46:45 management postgres[2518782]: [2518782] DETAIL:  Key (nar_id)=(7524) is not present in table "nar".
May 15 21:46:45 management postgres[2518782]: [2518782] STATEMENT:  INSERT INTO "chunkref" ("nar_id", "seq", "chunk_id", "chunk_hash", "compression") VALUES ($1, $2, $3, $4, $5) RETURNING "id"
May 15 21:46:45 management postgres[2518746]: [2518746] ERROR:  insert or update on table "chunkref" violates foreign key constraint "fk_chunkref_nar"
May 15 21:46:45 management postgres[2518746]: [2518746] DETAIL:  Key (nar_id)=(7524) is not present in table "nar".
May 15 21:46:45 management postgres[2518746]: [2518746] STATEMENT:  INSERT INTO "chunkref" ("nar_id", "seq", "chunk_id", "chunk_hash", "compression") VALUES ($1, $2, $3, $4, $5) RETURNING "id"