imdrasil / jennifer.cr

Crystal ORM using ActiveRecord pattern with flexible query DSL
MIT License
418 stars 55 forks source link

Slow performance when creating new records #432

Open fdocr opened 1 year ago

fdocr commented 1 year ago

I've been running some tests where I insert new records of a Turn model on an endpoint and I'm seeing degraded performance. I'm opening this issue where I would hope maybe there's a way to work around this, or perhaps an opportunity to identify and fix this problem for the shard. I implemented the same DB logic with amberframework/granite to compare and the performance is much better there. I prefer jennifer overall for a couple of reasons, so it would be great to find a solution for this issue.

class Turn < Jennifer::Model::Base
  with_timestamps

  mapping(
    id: Primary64,
    game_id: String,
    snake_id: String,
    context: String,
    path: String,
    dead: Bool,
    created_at: Time?,
    updated_at: Time?,
  )
end

I added Opentelemetry tracing for the slow endpoint and looks like this:

post "/move" do |env|
  OpenTelemetry.trace "move-trace" do |span|
    span["debug-attr"] = "move-trace"
    context = BattleSnake::Context.from_json(env.params.json.to_json)

    OpenTelemetry.trace "context persist" do |span|
      span["debug-attr"] = "persist"
      Turn.create(
          game_id: context.game.id,
          snake_id: context.you.id,
          context: env.params.json.to_json,
          path: env.request.path,
          dead: false
        )
    end

    OpenTelemetry.trace "context persist" do |span|
      span["debug-attr"] = "move-logic"

      res = CUSTOM_LOGIC_CALL
      res.to_json
    end
  end
end

The records are being persisted correctly and I'm seeing this DEBUG log from the database

2023-02-24T18:50:33.310554Z DEBUG - db: 747.958 μs INSERT INTO "turns"("game_id", "snake_id", "context", "path", "dead", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5, $6, $7)  RETURNING "id" | ["86494598-a6ee-4317-933d-3fb81e2f4cbc", "5d58224d-cc49-4ba9-b65f-cfec2edb32f4", "small", "/end", false, 2023-02-24 18:50:33.309634000 UTC, 2023-02-24 18:50:33.309634000 UTC]
2023-02-24T18:50:33.310561Z DEBUG - db: 0.458 μs COMMIT

This tells me the DB roundtrip is quick (within < 1ms) but response times are coming back in around 1,000 ms. With tracing I can see these results:

Screen Shot 2023-02-24 at 12 53 03

A simple endpoint with a straightforward query like Turn.all.to_a is much quicker (~6ms responses with ~2.5 ms DB log times). So based on all of this my assumption is that there is some overhead in the model that's being created. I'd love to help further identify this, but I'd need help understanding how can I go about this (open for pairing or running more/different tests if needed).

crimson-knight commented 1 year ago

I find the debug logger gives a considerable performance hit, first try turning that down to a warning level or error level and re-run.

The biggest performance hits in cases like this are usually when generating and serving data to a console. The output is handy, but comes at a performance cost similar to what you're seeing.

I just did a test with Jennifer and Kemal where I made 2,000,000 db entries, exported to a file and saved to disk and that took under 36 seconds on my M1 Pro.

fdocr commented 1 year ago

Confirmed the log output wasn't causing the issue. Some more of my tracing also confirms that when separated (Turn.create vs .build and then .save) it's the .save and not .build that's taking up a long time.

I realized this is a connection pool size issue. I found this comment on Jennifer::Adapter::Base that could be what I'm experiencing. The tests I'm running are quick requests coming in by a CLI tool and there's commonly 2 in parallel at a time. Adding conf.pool_size = (ENV["DB_POOL"] ||= "5").to_i to the database configuration solved my problem.

Since I also had a granite implementation here's the performance comparison between them on a similar test for both of them.

Jennifer

Screen Shot 2023-02-25 at 04 56 40

Granite

Screen Shot 2023-02-25 at 04 59 50

Turns out Jennifer performs better than Granite 😄

I wonder if it's worth changing the default connection pool size from 1 to something like ActiveRecord's default of 5 (I believe that's right but might be something different). Regardless of this, I guess it's important to always configure it via ENV var for web server configurability in prod environments anyways.

crimson-knight commented 1 year ago

I'll go ahead and make some updates to the docs to clarify the connection pool. I've also run into some challenges with the config initializer and ensuring it's using the test db when running crystal spec so I think I'll add in the niceties I've come up with for that

crimson-knight commented 1 year ago

@fdocr I made some changes to the docs to help others avoid the performance bottleneck you experienced. Check out #430

fdocr commented 1 year ago

That's awesome. Thank you @crimson-knight

imdrasil commented 1 year ago

Ok, it sounds that technically you don't have performance issue anymore but the docs should be updated to make it clear you need to configure connection pool size (@crimson-knight thank you for this). Initially connection pool size was changed to 1 to make it easier to debug potential people issues and make it more reliable (previously I had some issues with concurrent requests). Maybe the time has come and it worth increasing it to something bigger (like 5). What do you think @crimson-knight @cyangle @fdocr

crimson-knight commented 1 year ago

Let's change the default pool size to 5, that makes sense to me.

Let's also add a note that if they need to debug potential concurrent request problems to adjust the pool size.