Hbase Data DownSampling

pinpoint-apm / pinpoint

APM (Application Performance Management) tool for large-scale distributed systems.

https://pinpoint-apm.gitbook.io/

Apache License 2.0


Hbase Data DownSampling #11198

Open jaca-p opened 1 month ago

jaca-p commented 1 month ago

Hello. I would like to use Pinpoint in a production environment and down-sample the data stored in HBase at 5-minute and 30-minute intervals. However, I don't understand how the byte-encoded row keys in HBase are structured.

For example, I expected to identify each agent through the 'i' qualifier of AgentInfo, and then use that value to pick out the agent's rows in AgentStatV2.

However, each row key had a different byte inserted, which makes the behavior difficult to understand.

Can you give me some insight on this?

Do I need to understand hbaseWD to deal with this issue?
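The varying byte described above is consistent with row-key salting, the technique that libraries such as hbaseWD apply, although nothing in this thread confirms the exact layout. The following is a minimal sketch of the idea only: the field widths, the bucket count, the hash function, and every name in it are assumptions made for illustration, not Pinpoint's or hbaseWD's actual code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical salted row key: [1 salt byte][fixed-width agent id][reversed timestamp].
class SaltedRowKeySketch {
    static final int AGENT_ID_LEN = 24; // assumed fixed width, zero padded
    static final int BUCKETS = 32;      // assumed number of salt buckets

    // Key before salting: padded agent id followed by a reversed timestamp,
    // so that newer rows sort first within one agent.
    static byte[] originalKey(String agentId, long timestampMs) {
        byte[] id = agentId.getBytes(StandardCharsets.UTF_8);
        if (id.length > AGENT_ID_LEN) {
            throw new IllegalArgumentException("agentId too long");
        }
        ByteBuffer buf = ByteBuffer.allocate(AGENT_ID_LEN + Long.BYTES);
        buf.put(Arrays.copyOf(id, AGENT_ID_LEN));
        buf.putLong(Long.MAX_VALUE - timestampMs);
        return buf.array();
    }

    // Salt derived from the whole key, so rows of one agent get different prefixes.
    static byte saltFor(byte[] originalKey) {
        return (byte) Math.floorMod(Arrays.hashCode(originalKey), BUCKETS);
    }

    static byte[] salted(byte[] originalKey) {
        byte[] out = new byte[originalKey.length + 1];
        out[0] = saltFor(originalKey);
        System.arraycopy(originalKey, 0, out, 1, originalKey.length);
        return out;
    }

    // Decoding skips the salt byte and reads the fields at their fixed offsets.
    static String agentIdOf(byte[] saltedKey) {
        return new String(saltedKey, 1, AGENT_ID_LEN, StandardCharsets.UTF_8).trim();
    }

    static long timestampOf(byte[] saltedKey) {
        ByteBuffer buf = ByteBuffer.wrap(saltedKey, 1 + AGENT_ID_LEN, Long.BYTES);
        return Long.MAX_VALUE - buf.getLong();
    }

    public static void main(String[] args) {
        byte[] key = salted(originalKey("agent-1", 1_700_000_000_000L));
        System.out.println("salt=" + key[0]
                + " agentId=" + agentIdOf(key)
                + " timestampMs=" + timestampOf(key));
    }
}
```

With a layout like this, reading all rows of one agent means repeating the range scan once per possible salt value and merging the results, because rows that share an agent id no longer share a key prefix.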

minwoo-jung commented 1 month ago

Reading the question, it seems to be about how the row key is organized. At the code level, you can see this in how the DAO classes below generate the row key when creating a Put object. https://github.com/pinpoint-apm/pinpoint/tree/master/collector/src/main/java/com/navercorp/pinpoint/collector/dao/hbase

Additionally, changing the core logic to down-sample is not trivial. If you could also share why you're down-sampling, I'll see if I have any additional feedback.

jaca-p commented 1 month ago

We want to keep the data for a long time and reduce the storage it takes up accordingly.

I thought it would be enough to implement downsampling if I could handle per-agent lookups and timestamps through the row key of each table.

I haven't analyzed the code enough yet. Do you think it's difficult?

In addition, I found that Pinpoint does not provide a way to get data out of HBase, such as exporting it to other databases or through APIs. Is that right?
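Once the agent and the timestamp can be recovered from a row, the aggregation step itself could look like the sketch below. It assumes each raw data point is a (timestamp, value) pair and that averaging is an acceptable way to summarize a bucket; both are assumptions for illustration, and the class and method names are hypothetical rather than part of Pinpoint.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical downsampler: averages raw points into fixed-width time buckets.
class DownsampleSketch {
    static final class Sample {
        final long timestampMs;
        final double value;

        Sample(long timestampMs, double value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    // Start of the bucket that contains the given timestamp.
    static long bucketStart(long timestampMs, long intervalMs) {
        return Math.floorDiv(timestampMs, intervalMs) * intervalMs;
    }

    // Returns one averaged sample per non-empty bucket, ordered by time.
    static List<Sample> downsample(List<Sample> raw, long intervalMs) {
        Map<Long, double[]> acc = new TreeMap<>(); // bucket start -> {sum, count}
        for (Sample s : raw) {
            double[] a = acc.computeIfAbsent(
                    bucketStart(s.timestampMs, intervalMs), k -> new double[2]);
            a[0] += s.value;
            a[1] += 1;
        }
        List<Sample> out = new ArrayList<>();
        for (Map.Entry<Long, double[]> e : acc.entrySet()) {
            out.add(new Sample(e.getKey(), e.getValue()[0] / e.getValue()[1]));
        }
        return out;
    }

    public static void main(String[] args) {
        long fiveMinutes = 5 * 60 * 1000L;
        List<Sample> raw = new ArrayList<>();
        raw.add(new Sample(0L, 1.0));
        raw.add(new Sample(60_000L, 2.0));
        raw.add(new Sample(299_999L, 3.0));
        raw.add(new Sample(300_000L, 10.0));
        for (Sample s : downsample(raw, fiveMinutes)) {
            System.out.println(s.timestampMs + " -> " + s.value);
        }
    }
}
```

The same method covers the 30-minute case by passing a different interval. Metrics for which a mean is misleading, such as peaks or counters, would need a different reducer (max, sum, and so on).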

minwoo-jung commented 1 month ago

Unfortunately, you're right. It's a lot of work to provide a variety of data stores, so we stick to one.