ytyou / ticktock

TickTockDB is an OpenTSDB-like time series database, with much better performance.
GNU General Public License v3.0

TickTock just shutting down #32

Closed Soren-klars closed 1 year ago

Soren-klars commented 1 year ago

Hi there, I'm in the middle of developing a little electricity dashboard, and while I'm just manually running a few requests from Python against TickTock, the DB keeps shutting down. There is no load on the system, and there is enough CPU, RAM, disk, etc. Stored data is minimal (> 1000 entries). The log only shows entries like this:

2023-01-20 23:50:46.964 [INFO] [tcp_2_task_2] Interrupted (11), shutting down...
2023-01-20 23:50:46.965 [INFO] [tcp_listener_0] listener 0 stopped.
2023-01-20 23:50:46.965 [INFO] [tcp_listener_2] TCP listener 2 stopped.
2023-01-20 23:50:46.965 [INFO] [tcp_listener_1] TCP listener 1 stopped.
2023-01-20 23:50:47.242 [INFO] [tcp_listener_0] listener 0 stopped.
2023-01-20 23:50:47.242 [INFO] [tcp_listener_1] TCP listener 1 stopped.
2023-01-20 23:50:47.243 [INFO] [tcp_listener_2] TCP listener 2 stopped.
2023-01-20 23:50:47.979 [INFO] [main] Start shutdown process...
2023-01-20 23:50:48.643 [INFO] [timer] Timer stopped
2023-01-20 23:50:49.310 [INFO] [main] QueryExecutor::shutdown complete
2023-01-20 23:50:49.314 [INFO] [main] Tsdb::shutdown complete
2023-01-20 23:50:49.314 [INFO] [main] Shutdown process complete

I'm running it on 32-bit Debian on an Orange PI PC. Can't the TickTock server restart itself or the affected component when an issue happens, instead of just going down? Is this kind of instability known? What would be a good way to increase availability?

Thanks in advance, Soren

ticktock-log-extract.log

ylin30 commented 1 year ago

@Soren-klars Sorry to hear about the crash. We are still in beta and currently fully focused on perf. There is definitely a lot of stability work still to be done.

I haven't had a chance to run on an Orange PI, but I looked at your log and noticed that you are using a pretty old version, 0.3.10. Would you mind trying the latest version, 0.10.2, which already resolves lots of corner cases that caused TT to crash? One thing to mention is that you will have to start from scratch by removing all the data files and append logs from 0.3.10, since 0.10.* is a major design change and uses a different data format than previous versions.

TT doesn't restart itself after a crash. You can run TT as a systemd service (managed with systemctl) to achieve that.
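A service manager such as systemd (for example with `Restart=on-failure`) is the usual way to get restart-on-crash behaviour. Purely as an illustration of what such a supervisor does, here is a minimal Python sketch of a restart loop. The `supervise` helper, its parameters, and the `bin/tt -c conf/tt.conf` command line in the comment are assumptions made for this example, not something confirmed in the thread. Note also that the log above shows TT shutting down in an orderly way after signal 11, so whether it then exits with a non-zero status is not confirmed here; if it exits with 0, this loop would not restart it.

```python
import subprocess
import time


def supervise(cmd, max_restarts=5, delay_s=2.0,
              run=subprocess.call, sleep=time.sleep):
    """Run cmd and restart it after every abnormal exit.

    Returns the number of restarts once cmd exits with status 0.
    Raises RuntimeError if cmd fails more than max_restarts times.
    """
    restarts = 0
    while True:
        code = run(cmd)  # blocks until the process exits
        if code == 0:
            return restarts  # clean shutdown, nothing to do
        if restarts >= max_restarts:
            raise RuntimeError(f"{cmd!r} kept failing (last exit code {code})")
        restarts += 1
        sleep(delay_s)  # brief pause so a crash loop does not spin


# Hypothetical usage; the binary and config paths are assumptions:
# supervise(["bin/tt", "-c", "conf/tt.conf"])
```

The `run` and `sleep` parameters exist only so the loop can be exercised without starting a real process; in normal use the defaults apply.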

We may need to find an Orange PI to fully test TT.

Thanks for making TT better.

Soren-klars commented 1 year ago

Thanks for the quick response. I have now installed it from the latest sources, so let's see. I will try to re-import the old data with a script via HTTP requests. Thanks guys for all your work, it looks very promising... I'm happy to test more if you want.
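A re-import script of the kind mentioned here could look roughly like the sketch below. It assumes that TickTock accepts OpenTSDB-style `put` lines on an HTTP endpoint at `http://localhost:6182/api/put`; the URL, port, batch size, and all helper names are assumptions for illustration and should be checked against the TickTock documentation for the installed version.

```python
import urllib.request


def to_put_line(metric, timestamp, value, tags):
    """Format one data point as an OpenTSDB-style 'put' line."""
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_str}".rstrip()


def batches(lines, size):
    """Yield newline-terminated request bodies of at most `size` lines."""
    for i in range(0, len(lines), size):
        yield "\n".join(lines[i:i + size]) + "\n"


def reimport(points, url="http://localhost:6182/api/put", batch_size=500):
    """POST (metric, timestamp, value, tags) tuples in batches.

    The default url is an assumption, not taken from this thread.
    """
    lines = [to_put_line(*p) for p in points]
    for body in batches(lines, batch_size):
        req = urllib.request.Request(url, data=body.encode("utf-8"),
                                     method="POST")
        with urllib.request.urlopen(req, timeout=10) as resp:
            resp.read()  # drain the response; HTTP errors raise HTTPError
```

Sending points in batches keeps the number of requests small, which matters on low-powered boards like the one described above.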

ylin30 commented 1 year ago

@Soren-klars We just found another bug in 0.10.2. It leads to incorrect floating-point values, though it does not cause crashes. Please use the v2 compressor instead of the default v3 compressor. Just add this line to your config and restart from scratch.

tsdb.compressor.version = 2

Thanks.

jens-ylja commented 1 year ago

I have the same signal 11 (SIGSEGV) crash from time to time. It always happened when querying (I'm using Grafana - query is quite simple). If it happened once, it re-happens when running the same query again. I always had to throw away the whole database after such an error because it seemed not to self heal - which indeed is annoying.

I have already spent some hours on it but unfortunately haven't been able to narrow it down to a reproducible scenario yet; I'm still working on this. My current working hypothesis is that it's related to metrics having the same name but different cardinality of the tag lists.
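To make the hypothesis concrete: "same name but different cardinality of the tag lists" would mean, for example, one `power` data point written with a single tag and another `power` data point written with two. The sketch below scans OpenTSDB-style `put` lines and reports metrics that appear with more than one tag count. The line format and the helper names are assumptions for illustration; this only helps find candidate metrics for a repro and says nothing about whether they actually trigger the crash.

```python
from collections import defaultdict


def tag_counts_by_metric(put_lines):
    """Map each metric name to the set of tag-list lengths seen for it."""
    counts = defaultdict(set)
    for line in put_lines:
        parts = line.split()
        # Expected shape: put <metric> <timestamp> <value> [tag=value ...]
        if len(parts) < 4 or parts[0] != "put":
            continue  # skip anything that is not a put line
        counts[parts[1]].add(len(parts[4:]))
    return counts


def mixed_cardinality_metrics(put_lines):
    """Return metrics written with more than one distinct tag count."""
    return {metric: sorted(sizes)
            for metric, sizes in tag_counts_by_metric(put_lines).items()
            if len(sizes) > 1}


sample = [
    "put power 1674255046 12.5 host=opi",
    "put power 1674255047 12.7 host=opi phase=l1",
    "put temp 1674255046 20.5 host=opi",
]
print(mixed_cardinality_metrics(sample))
```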

I'm running version 0.11.4 compiled from sources on my own. My hardware: ARM32 (ODROID HC1 with Ubuntu Linux - 5.4.227-248 #1 SMP PREEMPT Thu Dec 15 11:36:07 EST 2022 armv7l armv7l armv7l GNU/Linux)

ytyou commented 1 year ago

Next time it happens, do you mind sending us the content of the whole data directory? Plus the config file. Thanks!

jens-ylja commented 1 year ago

Yes, I'm working on it. Unfortunately, the last few times it happened my database held about 80-100 megabytes (~5 weeks of collected data).

I hope to get it "a little" smaller :)

ylin30 commented 1 year ago

@jens-ylja It looks to me like your crash (caused by a query) is different from the crash in this thread, which was caused by compaction and is already fixed in 0.10.4-beta. Please feel free to open a new issue once you have a repro.

Unfortunately I don't have your hardware (ODROID HC1) for a repro. I tried 0.11.5 on a Raspberry Pi Zero W (ARM32) and it works with Grafana. I hope you can share the data, config, and queries with us when you get a chance to repro it.

ytyou commented 1 year ago

Also, consider moving to the latest version (v0.11.6).

jens-ylja commented 1 year ago

@ylin30 - OK, I'll create a new issue.