timescale / promscale

[DEPRECATED] Promscale is a unified metric and trace observability backend for Prometheus, Jaeger and OpenTelemetry built on PostgreSQL and TimescaleDB.
https://www.timescale.com/promscale
Apache License 2.0

Prom-migrator process terminates without notice after 2 hours. #638

Closed Nathapat-Boss closed 3 years ago

Nathapat-Boss commented 3 years ago

Hi Promscale team,

I'm currently using prom-migrator to migrate data from TimescaleDB 1.7 to TimescaleDB 2.2.1, following this diagram:

[Screenshot: migration diagram]

I ran prom-migrator to migrate data for a 7-day time range. At first it seemed to run fine, but after 2 hours I found that the prom-migrator process had terminated without notice (the log stopped being written). I have no idea what causes this; please help investigate.

Here are the parameters:

prom-migrator \
  -mint=1619802000 \
  -maxt=1620406800 \
  -read-url=http://pg_prometheus_adapter_url:9201/read \
  -write-url=http://promscale_url:9201/write \
  -progress-metric-url=http://promscale_url:9201/read \
  -concurrent-pulls=4 \
  -concurrent-push=16 \
  -max-read-size=2GB >> prom-migrator-$timestamp 2>> prom-activity.log
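As a quick sanity check on the time range, the -mint and -maxt values in the command above are Unix timestamps in seconds. The sketch below only does the arithmetic to confirm that they span the 7 days mentioned; the variable names are illustrative and not part of prom-migrator.

```shell
#!/bin/sh
# Values copied from the -mint and -maxt flags (Unix timestamps in seconds).
mint=1619802000
maxt=1620406800

# Length of the migration window in whole days (86400 seconds per day).
range_days=$(( (maxt - mint) / 86400 ))
echo "range: ${range_days} days"
```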

Activity logs:

prom-activity.log prom-migrator-2021-05-25_10:16:09.log prom-migrator-2021-05-24_22:15:39.log prom-migrator-2021-05-24_19:48:20.log

CPU & memory usage:

TimescaleDB 1.7

[Screenshots: Screen Shot 2564-05-25 at 13 45 36, Screen Shot 2564-05-25 at 12 57 58]

TimescaleDB 2.2.1

[Screenshots: Screen Shot 2564-05-25 at 13 30 20, Screen Shot 2564-05-25 at 13 01 44]

Request:

Please add a parameter to change the timeout; I noticed that the default is 5 minutes. Please also add a retry mechanism for when a timeout occurs.

If you could provide an admin dashboard for Promscale or prom-migrator to monitor the migration progress, that would be great.

Thanks in advance,

Nathapat Kherlao ( Boss)

cevian commented 3 years ago

@Harkishen-Singh please take a look

Nathapat-Boss commented 3 years ago

I have found the root cause of the termination after 2 hours. I ran prom-migrator as a foreground process in an SSH session to the remote storage, and that SSH session terminates after 2 hours without interaction.

The fix is to run it as a background process with nohup prom-migrator &.
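A minimal sketch of that workaround, assuming a POSIX shell on the machine where the migrator runs. `run_detached` and the log file name are hypothetical, introduced only for illustration; the migrator flags are left out and would be the same as in the original command.

```shell
#!/bin/sh
# run_detached LOGFILE CMD [ARGS...]
# Starts CMD under nohup so it ignores the hangup signal sent when the
# SSH session ends, appends stdout and stderr to LOGFILE, detaches stdin,
# and prints the PID of the background process.
run_detached() {
    logfile=$1
    shift
    nohup "$@" >>"$logfile" 2>&1 </dev/null &
    echo $!
}

# Hypothetical usage (flags as in the original command):
#   pid=$(run_detached prom-migrator.log prom-migrator <flags>)
#   echo "migrator running as PID $pid"
```

Because output goes to the log file rather than the terminal, progress can still be followed from a later SSH session by reading that file.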

Once it got past the 2-hour barrier, I found another error caused by a timeout:

level=error ts=2021-05-25T23:34:59.683Z caller=main.go:143 msg="running writer: remote-write run: Post \"http://10.87.23.xxx:9201/write\": context deadline exceeded"

level=error ts=2021-05-26T05:43:37.984Z caller=main.go:137 msg="running reader: remote-run run: executing client-read: read-channels: error sending request: Post \"http://10.87.23.xxx:9201/read\": context deadline exceeded"

Harkishen-Singh commented 3 years ago

Hey @Nathapat-Boss, I think this is a timeout issue. PR #646 will add support for custom timeouts and retry behaviour.
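For readers who want retry behaviour from the outside in the meantime, one generic option is to wrap a command in a loop. The sketch below is not how PR #646 works; `retry` is a hypothetical helper, and whether re-running prom-migrator after a failed run is safe for a given setup is an assumption that would need to be verified.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it exits with status 0 or MAX_ATTEMPTS is reached.
# Returns 0 on success, otherwise the exit status of the last attempt.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        status=$?
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after ${attempt} attempts (last status ${status})" >&2
            return "$status"
        fi
        echo "attempt ${attempt} failed (status ${status}); retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Hypothetical usage: up to 3 attempts, 60 seconds apart.
#   retry 3 60 prom-migrator <flags>
```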

BTW, if I recall correctly from our conversation on the channel, the migration succeeded. If so, you can consider closing this issue unless there is some other problem.

Nathapat-Boss commented 3 years ago

I have resolved this case. The cause was that I was running the migrator as a foreground process in an SSH session to the remote server. By default the SSH session terminates after 2 hours, so the migration process was closed along with it. Thank you for your support; I'm looking forward to using the new version of the migrator. @Harkishen-Singh