Closed ruslandoga closed 1 month ago
Unfortunately, this might not work as ClickhouseRepo
is most likely not running when this migration is executed. The migration script uses both, Plausible.Repo
and Plausible.ClickhouseRepo
.
% MIX_ENV=ce_dev m ecto.migrate
11:21:09.474 [info] == Running 20240528115149 Plausible.Repo.Migrations.MigrateSiteImports.up/0 forward
Backfilling legacy site import across 0 sites (DRY RUN: false)...
Finished backfilling sites.
11:21:09.499 [debug] QUERY OK source="sites" db=0.2ms
SELECT s0."id", s0."domain", s0."timezone", s0."public", s0."locked", s0."stats_start_date", s0."native_stats_start_at", s0."allowed_event_props", s0."conversions_enabled", s0."props_enabled", s0."funnels_enabled", s0."ingest_rate_limit_scale_seconds", s0."ingest_rate_limit_threshold", s0."domain_changed_from", s0."domain_changed_at", s0."imported_data", s0."inserted_at", s0."updated_at", s0."id" FROM "sites" AS s0 WHERE (s0."id" = ANY($1)) [[2, 1]]
Adjusting end dates of 2 site imports (DRY RUN: false)...
Adjusting end date for site import 2 (1/2) (site ID 1, start date: 2020-05-31, end date: 2022-06-01)
** (RuntimeError) could not lookup Ecto repo Plausible.ClickhouseRepo because it was not started or it does not exist
(ecto 3.11.2) lib/ecto/repo/registry.ex:22: Ecto.Repo.Registry.lookup/1
(ecto 3.11.2) lib/ecto/repo/supervisor.ex:160: Ecto.Repo.Supervisor.tuplet/2
(plausible 0.0.1) lib/plausible/clickhouse_repo.ex:2: Plausible.ClickhouseRepo.all/2
(plausible 0.0.1) lib/plausible/data_migration/site_imports.ex:154: Plausible.DataMigration.SiteImports.imported_stats_end_date/2
(plausible 0.0.1) lib/plausible/data_migration/site_imports.ex:92: anonymous fn/4 in Plausible.DataMigration.SiteImports.adjust_site_import_end_dates/2
(elixir 1.16.0) lib/enum.ex:2528: Enum."-reduce/3-lists^foldl/2-0-"/3
(plausible 0.0.1) lib/plausible/data_migration/site_imports.ex:80: Plausible.DataMigration.SiteImports.adjust_site_import_end_dates/2
(plausible 0.0.1) lib/plausible/data_migration/site_imports.ex:45: Plausible.DataMigration.SiteImports.run/1
I have used https://hexdocs.pm/ecto_sql/Ecto.Migrator.html#with_repo/3 wrapper function to ensure ClickhouseRepo
is running when data migration is executed:
% MIX_ENV=ce_dev m ecto.migrate
11:28:21.138 [info] == Running 20240528115149 Plausible.Repo.Migrations.MigrateSiteImports.up/0 forward
Backfilling legacy site import across 0 sites (DRY RUN: false)...
Finished backfilling sites.
11:28:21.180 [debug] QUERY OK source="sites" db=0.2ms
SELECT s0."id", s0."domain", s0."timezone", s0."public", s0."locked", s0."stats_start_date", s0."native_stats_start_at", s0."allowed_event_props", s0."conversions_enabled", s0."props_enabled", s0."funnels_enabled", s0."ingest_rate_limit_scale_seconds", s0."ingest_rate_limit_threshold", s0."domain_changed_from", s0."domain_changed_at", s0."imported_data", s0."inserted_at", s0."updated_at", s0."id" FROM "sites" AS s0 WHERE (s0."id" = ANY($1)) [[2, 1]]
Adjusting end dates of 2 site imports (DRY RUN: false)...
Adjusting end date for site import 2 (1/2) (site ID 1, start date: 2020-05-31, end date: 2022-06-01)
End date of site import 2 (site ID 1) is left unadjusted.
Adjusting end date for site import 3 (2/2) (site ID 2, start date: 2020-05-31, end date: 2024-05-13)
End date of site import 3 (site ID 2) is left unadjusted.
Finished adjusting end dates of site imports.
Finished
11:28:21.306 [info] == Migrated 20240528115149 in 0.1s
11:28:21.326 [info] Migrations already up
Can the script switch to using Plausible.IngestRepo?
@ruslandoga Wouldn't Plausible.Repo
be missing from running apps then?
Why would it be missing?
@ruslandoga That's why it's failing here in the first place, to my understanding. Looking at https://github.com/elixir-ecto/ecto_sql/blob/v3.11.2/lib/mix/tasks/ecto.migrate.ex#L151-L158 - the migration task only starts a particular repo for the duration of executing the migrations, not the whole app (or other repos, in parallel). I think the same holds for migrator in releases.
This fails with migrator run via release because ce?()
expands Mix.env()
which is in turn used in runtime context here:
$ ./migrate.sh
Loading plausible..
Starting dependencies..
Starting repos..
Running migrations for Elixir.Plausible.Repo
** (UndefinedFunctionError) function Mix.env/0 is undefined (module Mix is not available)
Mix.env()
(plausible 0.0.1) expanding macro: Plausible.ce?/0
lib/plausible-0.0.1/priv/repo/migrations/20240528115149_migrate_site_imports.exs:6: Plausible.Repo.Migrations.MigrateSiteImports.up/0
(elixir 1.16.0) expanding macro: Kernel.if/2
nofile:1: (file)
I think it's better to add an explicit migration step to upgrade instructions instead, like:
docker compose exec plausible bin/plausible rpc "Plausible.DataMigration.SiteImports.run(dry_run: false)"
I think it's easier to work around this than to push it down to self-hosters to perform another manual step.
ce?
can be made into a function similar to Plausible.product_name/0
, while on_ce
and others would stay macros.
the migration task only starts a particular repo for the duration of executing the migrations, not the whole app (or other repos, in parallel)
We are not using mix ecto.migrate
: https://github.com/plausible/analytics/blob/master/rel/overlays/migrate.sh#L6
eval
starts the whole app with all the repos.
We are not using mix ecto.migrate:
Anyone trying to setup the application locally (for instance, with MIX_ENV=ce|ce_test mix ecto.setup
) will have this migration fail on them without both repos up.
P.S. The updated ce?()
check works when used at runtime now, confirmed with a test run on staging.
Anyone trying to setup the application locally (for instance, with MIX_ENV=ce|ce_test mix ecto.setup) will have this migration fail on them without both repos up.
MIX_ENV=ce|ce_test mix ecto.setup fails already since seeds use funnels :) And it doesn't seem like ClickhouseRepo is called if there are no sites in the DB (the setup use-case). Something feels off to me about this migrator wrapper.
eval
which brings the whole application up beforehandeval starts the whole app with all the repos.
I was wrong, only the selected repos and apps are started right now.
[
{:ecto_sql, ~c"SQL-based adapters for Ecto and database migrations", ~c"3.11.1"},
{:ch, ~c"HTTP ClickHouse driver for Elixir", ~c"0.2.5"},
{:mint, ~c"Small and composable HTTP client.", ~c"1.6.0"},
{:hpax, ~c"Implementation of the HPACK protocol (RFC 7541) for Elixir", ~c"0.2.0"},
{:castore, ~c"Up-to-date CA certificate store.", ~c"1.0.7"},
{:ecto, ~c"A toolkit for data mapping and language integrated query for Elixir", ~c"3.11.2"},
{:eex, ~c"eex", ~c"1.16.0"},
{:postgrex, ~c"PostgreSQL driver for Elixir", ~c"0.17.5"},
{:jason, ~c"A blazing fast JSON parser and generator in pure Elixir.\n", ~c"1.4.1"},
{:decimal, ~c"Arbitrary precision decimal arithmetic.", ~c"2.1.1"},
{:db_connection, ~c"Database connection behaviour for database transactions and connection pooling\n", ~c"2.6.0"},
{:telemetry, ~c"Dynamic dispatching library for metrics and instrumentations", ~c"1.2.1"},
{:ssl, ~c"Erlang/OTP SSL application", ~c"11.1"},
{:public_key, ~c"Public key infrastructure", ~c"1.15"},
{:asn1, ~c"The Erlang ASN1 compiler version 5.2.1", ~c"5.2.1"},
{:crypto, ~c"CRYPTO", ~c"5.4"},
{:logger, ~c"logger", ~c"1.16.0"},
{:elixir, ~c"elixir", ~c"1.16.0"},
{:compiler, ~c"ERTS CXC 138 10", ~c"8.4.1"},
{:stdlib, ~c"ERTS CXC 138 10", ~c"5.2"},
{:kernel, ~c"ERTS CXC 138 10", ~c"9.2"}
]
So this migration still needs a workaround to either:
use the already started repo (Plausible.IngestRepo instead of Plausible.ClickhouseRepo)
pre-start Plausible.ClickhouseRepo in prepare
(this could help automatically avoid similar problem in the future)
defp prepare do
# ...
Enum.each([Plausible.ClickhouseRepo | repos()], & &1.start_link(pool_size: 2))
end
I've opened a PR with the latter: https://github.com/plausible/analytics/pull/4155
Changes
This PR makes site_imports be automatically populated from sites.imported_data on startup for CE builds. Right now the self-hosters need to do it manually. See https://github.com/plausible/analytics/discussions/4125#discussioncomment-9580118
Tests
Changelog
Documentation
Dark mode