timescale / timescaledb-extras

Helper functions and procedures for timescale

migrate_to_hypertable() silently hangs #29

Open chris-morris-h2o opened 1 year ago

chris-morris-h2o commented 1 year ago

Timescale Version: 2.7.2
Postgres Version: 14.5

Issue:

I have a regular table that is ~170 gigs in size. I followed the directions in migrate.md:

CREATE TABLE sqlth_1_data_new (LIKE sqlth_1_data INCLUDING DEFAULTS INCLUDING CONSTRAINTS EXCLUDING INDEXES);

SELECT create_hypertable('sqlth_1_data_new', 't_stamp', chunk_time_interval => 86400000);

I then called

CALL migrate_to_hypertable('sqlth_1_data','sqlth_1_data_new');

Whenever I check the migration log, it tends to stop at around ~9 records not migrated and ~150,000 records migrated. That is far too few rows for a 170 GB table. If I stop the call and start it again, it finds additional records (over 1 million) and begins migrating them. I also attempted the multithreaded options:

CALL migrate_to_hypertable('sqlth_1_data','sqlth_1_data_new', '4 hours'::interval,'tagid',4,1);
CALL migrate_to_hypertable('sqlth_1_data','sqlth_1_data_new', '4 hours'::interval,'tagid',4,2);
CALL migrate_to_hypertable('sqlth_1_data','sqlth_1_data_new', '4 hours'::interval,'tagid',4,3);
CALL migrate_to_hypertable('sqlth_1_data','sqlth_1_data_new', '4 hours'::interval,'tagid',4,4);

This would migrate the same amount and then hang again, around 39 records left. If I stopped it and restarted it, it would not find additional records unless I changed the interval. I have tried different intervals (1, 4, 12, and 24 hours); they don't stop at the same unmigrated-record count. Unlike the single-threaded option, the multithreaded option never seems to find additional records.

If I run the single-threaded option, let it hang, stop it, run it a second time, let it run for a bit so it finds the unmigrated records, stop it again, and then start the multithreaded option, the multithreaded option begins migrating the unmigrated records that the single-threaded run found. However, at some point in the process (and it is different every time) the multithreaded migration stops working again, and interrupting and restarting it does not make it continue. If I run the single-threaded option it begins working again, but killing it and switching back to the multithreaded option does not make the migration continue.

I checked pg_stat_activity to see if I could spot anything, but nothing really stands out to me. I'm not sure what the problem could be. Any ideas?

All rows are in database eftestrestore (datid 2.82e+08); the key pg_stat_activity columns:

| pid | usename | application_name | backend_start | wait_event | state | backend_xid | backend_xmin | query |
|-----|---------|------------------|---------------|------------|-------|-------------|--------------|-------|
| 54991 | postgres | TimescaleDB Background Worker Scheduler | 2022-09-13 15:01:06.245292+00 | Extension: Extension | idle | | | |
| 361230 | postgres | psql | 2022-09-15 12:14:24.759636+00 | | active | 18700543 | 18700543 | CALL migrate_to_hypertable('sqlth_1_data','sqlth_1_data_new', '12 hours'::interval,'tagid',4,1); |
| 361496 | wbsadmin | psql | 2022-09-15 12:15:56.418579+00 | Lock: relation | active | 18700547 | 18700543 | CALL migrate_to_hypertable('sqlth_1_data','sqlth_1_data_new', '12 hours'::interval,'tagid',4,2); |
| 370071 | wbsadmin | psql | 2022-09-15 13:20:10.451209+00 | Lock: relation | active | 18700550 | 18700543 | CALL migrate_to_hypertable('sqlth_1_data','sqlth_1_data_new', '12 hours'::interval,'tagid',4,3); |
| 361688 | wbsadmin | psql | 2022-09-15 12:16:33.271994+00 | Lock: relation | active | 18700553 | 18700543 | CALL migrate_to_hypertable('sqlth_1_data','sqlth_1_data_new', '12 hours'::interval,'tagid',4,4); |
| 361921 | ignhist | pgAdmin 4 - DB:eftestrestore | 2022-09-15 12:18:41.512397+00 | Client: ClientRead | idle | | | (pgAdmin catalog query, reproduced below) |
| 361963 | ignhist | pgAdmin 4 - CONN:7595495 | 2022-09-15 12:19:04.856964+00 | | active | | 18700543 | select * from pg_stat_activity where datname = 'eftestrestore' |

The idle pgAdmin session's last query was its own catalog browser query:

SELECT rel.oid, rel.relname AS name,
  (SELECT count(*) FROM pg_catalog.pg_trigger WHERE tgrelid = rel.oid AND tgisinternal = FALSE) AS triggercount,
  (SELECT count(*) FROM pg_catalog.pg_trigger WHERE tgrelid = rel.oid AND tgisinternal = FALSE AND tgenabled = 'O') AS has_enable_triggers,
  (CASE WHEN rel.relkind = 'p' THEN true ELSE false END) AS is_partitioned,
  (SELECT count(1) FROM pg_catalog.pg_inherits WHERE inhrelid = rel.oid LIMIT 1) AS is_inherits,
  (SELECT count(1) FROM pg_catalog.pg_inherits WHERE inhparent = rel.oid LIMIT 1) AS is_inherited
FROM pg_catalog.pg_class rel
WHERE rel.relkind IN ('r','s','t','p') AND rel.relnamespace = 2200::oid
  AND NOT rel.relispartition
ORDER BY rel.relname;
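If it helps anyone hitting the same thing: pg_blocking_pids(), built into Postgres since 9.6 and nothing TimescaleDB-specific, shows which session each "Lock: relation" waiter above is actually stuck behind:

-- sessions waiting on a lock, and the pids blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       wait_event,
       state,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE datname = 'eftestrestore'
  AND cardinality(pg_blocking_pids(pid)) > 0;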
chris-morris-h2o commented 1 year ago

I changed my interval to 5 minutes and the process has been running without hanging for quite a while; it has now found 4,271,354 unmigrated records.
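For reference, that count can be cross-checked with plain row counts, independent of whatever the migration log reports (slow on a table this size, but unambiguous):

-- rows still to migrate = source count minus destination count
SELECT (SELECT count(*) FROM sqlth_1_data)
     - (SELECT count(*) FROM sqlth_1_data_new) AS rows_remaining;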

Is there a way to calculate the optimal interval for processing?
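One rough way to approximate it (a sketch, not anything from migrate.md; it assumes fairly uniform row density, an arbitrary 100,000-row batch target, and that t_stamp is milliseconds since epoch as in the create_hypertable call above):

-- average ms per row (from the planner's row estimate) times the target batch size;
-- run ANALYZE sqlth_1_data first so reltuples is populated
SELECT round(
         (SELECT max(t_stamp) - min(t_stamp) FROM sqlth_1_data)::numeric
       / (SELECT greatest(reltuples, 1) FROM pg_class WHERE relname = 'sqlth_1_data')
       * 100000) AS interval_ms_for_100k_rows;
-- e.g. a result of ~300000 would suggest a '5 minutes'::interval batch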

haydenflinner commented 1 year ago

@chris-morris-h2o did you find an answer to this? I'm wondering about the same thing, both for this migrate.sql and for hypertable chunk intervals in general; it would be nice if there were a way to estimate at runtime what a good batch or chunk size would be, since my input data can vary dramatically in density. I'm thinking of using migrate_to_hypertable in combination with staging tables for backfilling, rather than hammering the main hypertable with naive inserts from already-sorted staging tables.
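For example, a quick density histogram over the source (or staging) table would show how uneven the data is before picking a batch size; a minimal sketch, borrowing the table and ms-epoch t_stamp column from the example above (86400000 ms = 1 day):

-- rows per day, densest days first; size batches for the worst case
SELECT (t_stamp / 86400000) * 86400000 AS day_bucket_ms,
       count(*) AS rows_in_day
FROM sqlth_1_data
GROUP BY 1
ORDER BY rows_in_day DESC
LIMIT 10;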