mosajjal / dnsmonster

Passive DNS Capture and Monitoring Toolkit
https://dnsmonster.dev
GNU General Public License v3.0
314 stars 53 forks source link

Clickhouse schema weirdness #63

Open mzealey opened 1 year ago

mzealey commented 1 year ago

I've noticed that on a number of the CH tables you are including timestamp in the order by. Rather than doing this you should probably have a truncated timestamp such as by minute (or at least truncated to per-second) otherwise there's not much point in the MV's compared to just sampling from the raw table itself.

Additionally there are a number of times when you sum() a value such as DoBit which is a UInt8 in the primary table. It would be better to cast those to UInt64 and then sum that to avoid overflows.

mosajjal commented 1 year ago

a lot of these tables have changed over time and they're overdue for a revamp and cleanup. happy to accept a PR for these :)

RootLUG commented 1 year ago

Probably related to the issue I'm getting. I just installed dnsmonster + clickhouse in a docker container on a separate host and created all tables according to the schema in this repo.

Now when running dnsmonster with output to clickhouse enabled I'm getting:

server:~# dnsmonster --config /etc/dnsmonster/dnsmonster.ini
INFO[2023-06-03T19:45:09+01:00] Starting DNStap capture
INFO[2023-06-03T19:45:09+01:00] listening on DNStap socket unix:///tmp/dnstap.sock
INFO[2023-06-03T19:45:09+01:00] socket exists, will try to overwrite the socket
INFO[2023-06-03T19:45:09+01:00] Creating the dispatch Channel
INFO[2023-06-03T19:45:09+01:00] Creating Clickhouse Output Channel
INFO[2023-06-03T19:45:09+01:00] Creating Elastic Output Channel
INFO[2023-06-03T19:45:09+01:00] skipping skipDomains refresh since it's not provided
INFO[2023-06-03T19:45:09+01:00] skipping allowDomains refresh since it's not provided
INFO[2023-06-03T19:45:09+01:00] Creating handler #0
INFO[2023-06-03T19:45:09+01:00] Creating handler #1
INFO[2023-06-03T19:45:09+01:00] Elasticsearch returned with code 200 and version 8.6.2
INFO[0004] /tmp/dnstap.sock: accepted connection 1
WARN[2023-06-03T19:45:19+01:00] failed to convert metrics to JSON.
2023-06-03T19:45:19+01:00 metrics:
INFO[2023-06-03T19:45:19+01:00] ipv4 flushed: 0, closed: 0
INFO[2023-06-03T19:45:19+01:00] ipv6 flushed: 0, closed: 0
panic: runtime error: index out of range [0] with length 0

goroutine 11 [running]:
github.com/ClickHouse/clickhouse-go/v2/lib/column.IPv6ToBytes({0x0?, 0x10?, 0xc000122030?})
    /root/go/pkg/mod/github.com/!click!house/clickhouse-go/v2@v2.5.0/lib/column/ipv6.go:247 +0x2cf
github.com/ClickHouse/clickhouse-go/v2/lib/column.(*IPv6).AppendRow(0xc000040810, {0x122eee0?, 0x22e1300?})
    /root/go/pkg/mod/github.com/!click!house/clickhouse-go/v2@v2.5.0/lib/column/ipv6.go:215 +0x345
github.com/ClickHouse/clickhouse-go/v2/lib/proto.(*Block).Append(0xc000583d00, {0xc0002b4c60?, 0x11, 0x10?})
    /root/go/pkg/mod/github.com/!click!house/clickhouse-go/v2@v2.5.0/lib/proto/block.go:62 +0x1a7
github.com/ClickHouse/clickhouse-go/v2.(*batch).Append(0xc000098280, {0xc0002b4c60?, 0x0?, 0x1c90b58?})
    /root/go/pkg/mod/github.com/!click!house/clickhouse-go/v2@v2.5.0/conn_batch.go:122 +0x48
github.com/mosajjal/dnsmonster/internal/output.clickhouseConfig.clickhouseOutputWorker({{0xc0005f51c0, 0x1, 0x1}, {0xc000510cd5, 0x7}, {0xc0001656d5, 0x20}, {0xc000510cf5, 0xa}, 0x0, ...}, ...)
    /opt/dnsmonster/internal/output/clickhouse.go:198 +0xdb7
github.com/mosajjal/dnsmonster/internal/output.clickhouseConfig.Output.func1()
    /opt/dnsmonster/internal/output/clickhouse.go:148 +0x58
golang.org/x/sync/errgroup.(*Group).Go.func1()
    /root/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
    /root/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:72 +0xa5

Not much information in the output but my guess is the schema is problematic? When I run the SELECT DISTINCT Question example that was commented below the schema for tables it was complaining about the timestamp column not existing/defined. Is there any more up to date schema or setup I can use to deploy? I'm not that familiar with clickouse as I'm now just trying to switch from ES to clickhouse due to high load/data.