rails / solid_queue

Database-backed Active Job backend
MIT License
1.95k stars 130 forks source link

Solid Queue Update to 1.0.0 #377

Open rctneil opened 1 month ago

rctneil commented 1 month ago

Hi,

In an existing app i'm updating to version 1.0.0 and it's sooo confusing. I notice that you now recommend a separate database for queue tables. This is fine but the documentation for updating doesn't make any sense.

It doesn't explain whether I need a separate data in development. It doesn't explain how to get that etc.

I've set my DB config to be:

default: &default adapter: postgresql encoding: unicode pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %> <% if ENV["DB_HOST"] %> host: <%= ENV["DB_HOST"] %> username: postgres password: postgres <% end %>

development: <<: *default database: family_media_development

test: <<: *default database: family_media_test

production: primary: &primary_production <<: default url: <%= ENV["DATABASE_URL"] %> queue: <<: primary_production url: <%= ENV["QUEUE_DATABASE_URL"] %> migrations_paths: db/queue_migrate

I've created a queue DB as configured above but when I run db:prepare it does nothing. I've connected to the queue DB in TablePlus and it's empty.

The documentation makes absolutely no mention of what migrations_paths: db/queue_migrate does. I don't have anything that exists in my app as db/queue_migrate.

Do I have to make a separate DB in development too? I'd like both to work in the same way.

I'm very confused. The app i've updated is now failing to deploy with the error:

-----> Running Rails db:migrate -----> /home/deploy/.asdf/bin/asdf exec bundle exec rake db:migrate rake aborted! StandardError: An error has occurred, this and all later migrations canceled: (StandardError)

PG::UndefinedTable: ERROR: relation "solid_queue_processes" does not exist LINE 10: WHERE a.attrelid = '"solid_queue_processes"'::regclass ^

Sorry for the two issues created today. These are both separate apps and solid_queue is causing issues across both of them in different ways!

rosa commented 1 month ago

From what version are you updating?

This is fine but the documentation for updating doesn't make any sense.

What documentation are you referring to? The one under upgrading? What part doesn't make any sense? Could you be more specific?

The reason the docs don't mention development is that we generally don't recommend running solid queue in development because Rails already provides a perfect adapter for that, the async adapter. If you want to run it in development, you just need to configure it like production. In your DB configuration, there's no separate DB for Solid Queue in development. Just follow the same instructions here to add the queue DB to your DB config in dev.

rctneil commented 1 month ago

I was updating from 0.3.0, I went straight to 1.0.0, then saw that I had to go to 0.6.0 first so I went back, did that and then back to 1.0.0.

Things are all a bit confused. right now.

Please explain this line: migrations_paths: db/queue_migrate

I'll remove solid queue from development if that's not typically the correct solution. I didn't realise this. It may be good to put these small bits of knowledge that us less knowledgeable about background job processing may not realise that's plainly obvious to you.

Any idea how to fix the production issue I am hitting?

rctneil commented 1 month ago

@rosa Any suggestions for this?

On deploy I just keep getting:

-----> Running Rails db:migrate -----> /home/deploy/.asdf/bin/asdf exec bundle exec rake db:migrate rake aborted! StandardError: An error has occurred, this and all later migrations canceled: (StandardError)

PG::UndefinedTable: ERROR: relation "solid_queue_processes" does not exist LINE 10: WHERE a.attrelid = '"solid_queue_processes"'::regclass ^

rosa commented 1 month ago

@rctneil did you run db:prepare in production?

rctneil commented 1 month ago

@rosa Yes, that's where the error was caused.

rosa commented 1 month ago

🤔 Hmmm... but you're running bundle exec rake db:migrate above. Have you run db:prepare?

rctneil commented 1 month ago

I deployed, then got a DB error which prompted me to go and run db:prepare, then came back and deployed again.

Yes, I did run db:prepare.

rctneil commented 1 month ago

@rosa I've been thinking about this and this is just a guess about what may be happening:

I deploy using Hatchbox.io. Hatchbox runs the migrations as part of this. If I deploy, it tries to run them and fails. the deploy is rolled back, I try to run db:prepare but there's nothing to prepare as the deploy failed.

Does this sound feasible?

How can I get around this?

k0va1 commented 1 month ago

Hi! I got the same error but using solid queue in development env. Looks like db:migrate doesn't actually migrate

rosa commented 1 month ago

How can I get around this?

Could you just do the same like for any other migration change that needs two step? You deploy first the DB changes (in this case the schema), and the database config changes, but without connecting to this new DB yet, then db:prepare, and then deploy the change where you tell Solid Queue to connect to the new DB.

Michael-Tix commented 1 month ago

I had a similar issue. Didn't have enough bandwidth to track down the problem but here is the migration that causes the deployment to fail

2024-10-17 16:22:31] I, [2024-10-17T17:22:30.665094 #13]  INFO -- : Migrating to MakeNameNotNull (20241017154832)
[2024-10-17 16:22:31] == 20241017154832 MakeNameNotNull: migrating ==================================
[2024-10-17 16:22:31] D, [2024-10-17T17:22:30.706504 #13] DEBUG -- :   TRANSACTION (1.8ms)  BEGIN
[2024-10-17 16:22:31] D, [2024-10-17T17:22:30.745620 #13] DEBUG -- :   SolidQueue::Process Load (2.1ms)  SELECT "solid_queue_processes".* FROM "solid_queue_processes" WHERE "solid_queue_processes"."name" IS NULL ORDER BY "solid_queue_processes"."id" ASC LIMIT $1  [["LIMIT", 1000]]
[2024-10-17 16:22:31] -- change_column(:solid_queue_processes, :name, :string, {:null=>false})
[2024-10-17 16:22:31] D, [2024-10-17T17:22:30.751894 #13] DEBUG -- :    (3.5ms)  ALTER TABLE "solid_queue_processes" ALTER COLUMN "name" TYPE character varying, ALTER COLUMN "name" SET NOT NULL
[2024-10-17 16:22:31] D, [2024-10-17T17:22:30.755876 #13] DEBUG -- :   TRANSACTION (2.7ms)  ROLLBACK
[2024-10-17 16:22:31] D, [2024

Now I don't know why this change_column operation failed but when I tried this migration on my personal machine it was successful. The only difference between my personal machine and the production server where this failed is that I use Postgres 14 locally but we use Postgres 16 in production. The root cause exception seems to be ActiveRecord::StatementInvalid: PG::InFailedSqlTransaction

What I've done to get around this is to just replace the change_column statement with raw sql that does the same thing.

ApplicationRecord.connection.execute("ALTER TABLE solid_queue_processes ALTER COLUMN name SET NOT NULL")

@rosa I wonder if this information is enough to help figure out what is going on

rosa commented 1 month ago

Hey @SageOfTixPaths, thanks for writing this up; it's very strange indeed 😕 Did you copy the whole log? I can't see the error raised there. Did it happen after the transaction was rolled back? My only guess there would be that you have processes with a NULL name so you couldn't actually make the column NOT NULL, but then these processes were gone when you ran the raw SQL, by chance? I'm not sure.

In any case, this seems different from the original issue, where the migration couldn't be run because the table didn't exist in the first place.

Michael-Tix commented 1 month ago

Did you copy the whole log?

Unfortunately, I couldn't retrieve the full logs but the transaction rollback happened because of a ActiveRecord::StatementInvalid exception raise when we try to change the column to NOT NULL. I ran this migration multiple times to confirm that it wasn't a one time thing and I even ran a query to check if any SolidQueue::Process#name was null, that query came back empty. It does look Postgres had issue with the generated query and not the state of the table.

In any case, this seems different from the original issue, where the migration couldn't be run because the table didn't exist in the first place.

Yes you are right. I just read through the thread again and realised it is most likely unrelated. I think I saw something about the migrations being rolled back and I assumed it was a similar issue

schappim commented 1 month ago

More data points for debugging...

I am in a similar boat to the folks above. I'm using Rails Jumpstart Pro by @excid3 and I've done a:

When I try to run I get the error:

11:58:37 worker | /Users/admin/.rvm/gems/ruby-3.3.5@search_littlebird/gems/activerecord-8.0.0.beta1/lib/active_record/connection_adapters/postgresql/database_statements.rb:160:in `exec': PG::UndefinedTable: ERROR:  relation "solid_queue_processes" does not exist (ActiveRecord::StatementInvalid)
11:58:37 worker | LINE 10:  WHERE a.attrelid = '"solid_queue_processes"'::regclass
11:58:37 worker |                              ^
11:58:37 worker |
....
11:58:37 worker | /Users/admin/.rvm/gems/ruby-3.3.5@search_littlebird/gems/activerecord-8.0.0.beta1/lib/active_record/connection_adapters/postgresql/database_statements.rb:160:in `exec': ERROR:  relation "solid_queue_processes" does not exist (PG::UndefinedTable)
11:58:37 worker | LINE 10:  WHERE a.attrelid = '"solid_queue_processes"'::regclass
admin@studio search_littlebird % bundle exec rails db:drop;
Dropped database 'search_littlebird_development'
Dropped database 'search_littlebird_development_cache'
Dropped database 'search_littlebird_development_queue'
Dropped database 'search_littlebird_development_cable'
Dropped database 'search_littlebird_test'
admin@studio search_littlebird % bundle exec rails db:prepare;
Created database 'search_littlebird_development'
Created database 'search_littlebird_development_cache'
Created database 'search_littlebird_development_queue'
Created database 'search_littlebird_development_cable'
Created database 'search_littlebird_test'
admin@studio search_littlebird % bundle exec rails db:create;
Database 'search_littlebird_development' already exists
Database 'search_littlebird_development_cache' already exists
Database 'search_littlebird_development_queue' already exists
Database 'search_littlebird_development_cable' already exists
Database 'search_littlebird_test' already exists
admin@studio search_littlebird % bundle exec rails db:migrate;

I am running on:

admin@studio search_littlebird % bundle exec rails -v
Rails 8.0.0.beta1

admin@studio search_littlebird % gem list | grep -i solid
solid_cable (3.0.2)
solid_cache (1.0.6)
solid_queue (1.0.0)

My database.yml (from Jumpstart) is as follows:

# PostgreSQL. Versions 9.1 and up are supported.
#
# Install the pg driver:
#   gem install pg
# On OS X with Homebrew:
#   gem install pg -- --with-pg-config=/usr/local/bin/pg_config
# On Windows:
#   gem install pg
#       Choose the win32 build.
#       Install PostgreSQL and put its /bin directory on your path.
#
# Configure Using Gemfile
# gem 'pg'
#
default: &default
  adapter: postgresql
  encoding: unicode
  # For details on connection pooling, see Rails configuration guide
  # http://guides.rubyonrails.org/configuring.html#database-pooling
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  <% if ENV["DB_HOST"] %>
  host: <%= ENV["DB_HOST"] %>
  username: postgres
  password: postgres
  <% end %>

development:
  primary:
    <<: *default
    database: search_littlebird_development

    # The specified database role being used to connect to postgres.
    # To create additional roles in postgres see `$ createuser --help`.
    # When left blank, postgres will use the default role. This is
    # the same name as the operating system user that initialized the database.
    #username: jumpstart

    # The password associated with the postgres role (username).
    #password:

    # Connect on a TCP socket. Omitted by default since the client uses a
    # domain socket that doesn't need configuration. Windows does not have
    # domain sockets, so uncomment these lines.
    #host: localhost

    # The TCP port the server listens on. Defaults to 5432.
    # If your server runs on a different port number, change accordingly.
    #port: 5432

    # Schema search path. The server defaults to $user,public
    #schema_search_path: myapp,sharedapp,public

    # Minimum log levels, in increasing order:
    #   debug5, debug4, debug3, debug2, debug1,
    #   log, notice, warning, error, fatal, and panic
    # Defaults to warning.
    #min_messages: notice

  cache:
    <<: *default
    database: search_littlebird_development_cache
    migrations_paths: db/cache_migrate
  queue:
    <<: *default`
    database: search_littlebird_development_queue
    migrations_paths: db/queue_migrate
  cable:
    <<: *default
    database: search_littlebird_development_cable
    migrations_paths: db/cable_migrate

# Warning: The database defined as "test" will be erased and
# re-generated from your development database when you run "rake".
# Do not set this db to the same as development or production.
test:
  <<: *default
  database: search_littlebird_test

# As with config/credentials.yml, you never want to store sensitive information,
# like your database password, in your source code. If your source code is
# ever seen by anyone, they now have access to your database.
#
# Instead, provide the password or a full connection URL as an environment
# variable when you boot the app. For example:
#
#   DATABASE_URL="postgres://myuser:mypass@localhost/somedatabase"
#
# If the connection URL is provided in the special DATABASE_URL environment
# variable, Rails will automatically merge its configuration values on top of
# the values provided in this file. Alternatively, you can specify a connection
# URL environment variable explicitly:
#
#   production:
#     url: <%= ENV["MY_APP_DATABASE_URL"] %>
#
# Read https://guides.rubyonrails.org/configuring.html#configuring-a-database
# for a full overview on how database connection configuration can be specified.
#
production:
  primary: &primary_production
    <<: *default
    database: search_littlebird_production
    username: jumpstart
    password: <%= ENV["POSTGRES_PASSWORD"] %>

  cache:
    <<: *primary_production
    database: search_littlebird_production_cache
    migrations_paths: db/cache_migrate
    # url: <%= ENV["CACHE_DATABASE_URL"] %>
  queue:
    <<: *primary_production
    database: search_littlebird_production_queue
    migrations_paths: db/queue_migrate
    # url: <%= ENV["QUEUE_DATABASE_URL"] %>
  cable:
    <<: *primary_production
    database: search_littlebird_production_cable
    migrations_paths: db/cable_migrate
    # url: <%= ENV["CABLE_DATABASE_URL"] %>

The solid queue db looks pretty empty:

image

I note that: queue_schema.rb is pretty empty:

# This file is auto-generated from the current state of the database. Instead
# of editing this file, please use the migrations feature of Active Record to
# incrementally modify your database, and then regenerate this schema definition.
#
# This file is the source Rails uses to define your schema when running `bin/rails
# db:schema:load`. When creating a new database, `bin/rails db:schema:load` tends to
# be faster and is potentially less error prone than running all of your
# migrations from scratch. Old migrations may fail to apply correctly if those
# migrations use external dependencies or application code.
#
# It's strongly recommended that you check this file into your version control system.

ActiveRecord::Schema[8.0].define(version: 0) do
  # These are extensions that must be enabled in order to support this database
  enable_extension "pg_catalog.plpgsql"

end

Running bundle exec rails solid_queue:install populates the file w/:

ActiveRecord::Schema[7.1].define(version: 1) do
  create_table "solid_queue_blocked_executions", force: :cascade do |t|
    t.bigint "job_id", null: false
    t.string "queue_name", null: false
    t.integer "priority", default: 0, null: false
    t.string "concurrency_key", null: false
    t.datetime "expires_at", null: false
    t.datetime "created_at", null: false
    t.index [ "concurrency_key", "priority", "job_id" ], name: "index_solid_queue_blocked_executions_for_release"
    t.index [ "expires_at", "concurrency_key" ], name: "index_solid_queue_blocked_executions_for_maintenance"
    t.index [ "job_id" ], name: "index_solid_queue_blocked_executions_on_job_id", unique: true
  end

  create_table "solid_queue_claimed_executions", force: :cascade do |t|
    t.bigint "job_id", null: false
    t.bigint "process_id"
    t.datetime "created_at", null: false
    t.index [ "job_id" ], name: "index_solid_queue_claimed_executions_on_job_id", unique: true
    t.index [ "process_id", "job_id" ], name: "index_solid_queue_claimed_executions_on_process_id_and_job_id"
  end

  create_table "solid_queue_failed_executions", force: :cascade do |t|
    t.bigint "job_id", null: false
    t.text "error"
    t.datetime "created_at", null: false
    t.index [ "job_id" ], name: "index_solid_queue_failed_executions_on_job_id", unique: true
  end

  create_table "solid_queue_jobs", force: :cascade do |t|
    t.string "queue_name", null: false
    t.string "class_name", null: false
    t.text "arguments"
    t.integer "priority", default: 0, null: false
    t.string "active_job_id"
    t.datetime "scheduled_at"
    t.datetime "finished_at"
    t.string "concurrency_key"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
    t.index [ "active_job_id" ], name: "index_solid_queue_jobs_on_active_job_id"
    t.index [ "class_name" ], name: "index_solid_queue_jobs_on_class_name"
    t.index [ "finished_at" ], name: "index_solid_queue_jobs_on_finished_at"
    t.index [ "queue_name", "finished_at" ], name: "index_solid_queue_jobs_for_filtering"
    t.index [ "scheduled_at", "finished_at" ], name: "index_solid_queue_jobs_for_alerting"
  end

  create_table "solid_queue_pauses", force: :cascade do |t|
    t.string "queue_name", null: false
    t.datetime "created_at", null: false
    t.index [ "queue_name" ], name: "index_solid_queue_pauses_on_queue_name", unique: true
  end
... etc etc

...however running bundle exec rails db:migrate; nukes the the file back to:

# This file is auto-generated from the current state of the database. Instead
# of editing this file, please use the migrations feature of Active Record to
# incrementally modify your database, and then regenerate this schema definition.
#
# This file is the source Rails uses to define your schema when running `bin/rails
# db:schema:load`. When creating a new database, `bin/rails db:schema:load` tends to
# be faster and is potentially less error prone than running all of your
# migrations from scratch. Old migrations may fail to apply correctly if those
# migrations use external dependencies or application code.
#
# It's strongly recommended that you check this file into your version control system.

ActiveRecord::Schema[8.0].define(version: 0) do
  # These are extensions that must be enabled in order to support this database
  enable_extension "pg_catalog.plpgsql"

end
rosa commented 1 month ago

@schappim, you're running into this Rails bug: https://github.com/rails/rails/issues/52829. You need to restore the schema and load the DB with db:prepare instead of running db:migrate, which triggers the Rails bug.

excid3 commented 1 month ago

I tested the other day with this PR and it seems to fix it: https://github.com/rails/rails/pull/53320

rosa commented 1 month ago

Oh nice! I hadn't seen that PR. Thanks a lot for sharing @excid3! 🙏

danielpuglisi commented 6 days ago

I just migrated from 0.3 to 1.0 and ran in the same issues.

The new migrations weren't added with solid_queue:install so downgraded to 0.6 and ran the bin/rails solid_queue:install:migrations task to generate them.

Deployment failed due to what @Michael-Tix reported above.

Using ApplicationRecord.connection.execute("ALTER TABLE solid_queue_processes ALTER COLUMN name SET NOT NULL") instead of change_column fixed that as well.