Closed emreyalvac closed 1 year ago
Can you explain a bit more the use case? Is there a standard data format in which the traces will be stored?
Hi @atoulme,
My thinking is that write speed is very important for OpenTelemetry.
Cassandra is an open-source NoSQL data store with a distributed architecture that provides high availability, scalability, and reliability. It is managed by the Apache Software Foundation.
Cassandra is very fast for write operations and well suited to analytics data. It also handles time series data well, so you can calculate throughput, response time, Apdex, and similar metrics from the stored spans.
Cassandra’s three data modeling ‘dogmas’:
Disk space is cheap.
Writes are cheap.
Network communication is expensive.
Example Span data on Cassandra database:
[
{
"traceid": "104077629213055e8523102a57c659cd",
"duration": 75957000,
"events": null,
"links": null,
"parentspanid": "",
"resourceattributes": {
"service.name": "unknown_service:dotnet"
},
"servicename": "unknown_service:dotnet",
"spanattributes": {
"http.flavor": "1.1",
"http.host": "localhost:5000",
"http.method": "GET",
"http.scheme": "http",
"http.status_code": "200",
"http.target": "/swagger/v1/swagger.json",
"http.url": "http://localhost:5000/swagger/v1/swagger.json",
"http.user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"
},
"spanid": "c123d6dae1744ce3",
"spankind": "SPAN_KIND_SERVER",
"spanname": "/swagger/v1/swagger.json",
"statuscode": "STATUS_CODE_UNSET",
"statusmessage": "",
"timestamp": "2023-01-22",
"tracestate": ""
}
]
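For readers who want to work with rows shaped like the example above, here is a minimal Go sketch that decodes that JSON into a struct. The `spanRow` type and `decodeSpans` function are hypothetical names for illustration and are not part of the exporter; only the JSON keys come from the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// spanRow is a hypothetical Go view of one row from the example output.
// Only a subset of the columns is mapped; unmapped keys are ignored.
type spanRow struct {
	TraceID            string            `json:"traceid"`
	SpanID             string            `json:"spanid"`
	ParentSpanID       string            `json:"parentspanid"`
	SpanName           string            `json:"spanname"`
	SpanKind           string            `json:"spankind"`
	ServiceName        string            `json:"servicename"`
	Duration           int64             `json:"duration"`
	StatusCode         string            `json:"statuscode"`
	Timestamp          string            `json:"timestamp"`
	ResourceAttributes map[string]string `json:"resourceattributes"`
	SpanAttributes     map[string]string `json:"spanattributes"`
}

// decodeSpans parses a JSON array of span rows.
func decodeSpans(data []byte) ([]spanRow, error) {
	var rows []spanRow
	if err := json.Unmarshal(data, &rows); err != nil {
		return nil, err
	}
	return rows, nil
}

func main() {
	sample := []byte(`[{"traceid":"104077629213055e8523102a57c659cd","spanid":"c123d6dae1744ce3","duration":75957000,"servicename":"unknown_service:dotnet","spanattributes":{"http.method":"GET","http.status_code":"200"},"timestamp":"2023-01-22"}]`)
	rows, err := decodeSpans(sample)
	if err != nil {
		panic(err)
	}
	for _, r := range rows {
		fmt.Println(r.SpanID, r.ServiceName, r.Duration, r.SpanAttributes["http.method"])
	}
}
```

The unit of `duration` is not stated in the example, so the sketch keeps it as a raw integer rather than converting it.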
So is it stored as a cql table? What is the schema used?
I have found those in your impl:
const (
// language=SQL
createDatabaseSQL = `CREATE KEYSPACE %s with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };`
// language=SQL
createEventTypeSql = `CREATE TYPE IF NOT EXISTS %s.Events (Timestamp Date, Name text, Attributes map<text, text>);`
// language=SQL
createLinksTypeSql = `CREATE TYPE IF NOT EXISTS %s.Links (TraceId text, SpanId text, TraceState text, Attributes map<text, text>);`
// language=SQL
createSpanTableSQL = `CREATE TABLE IF NOT EXISTS %s.%s (TimeStamp DATE,TraceId text, SpanId text, ParentSpanId text, TraceState text, SpanName text, SpanKind text, ServiceName text, ResourceAttributes map<text, text>, SpanAttributes map<text, text>, Duration int,StatusCode text,StatusMessage text, Events frozen<Events>, Links frozen<Links>, PRIMARY KEY (TraceId));`
)
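The `%s` placeholders in those statements are filled with the keyspace and table names at runtime. As a rough sketch of how that rendering might look, the snippet below formats one of the statements and rejects names that are not plain unquoted CQL identifiers. The `renderEventType` function and the `identRe` check are assumptions added for illustration; the quoted implementation may do this differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// Statement copied from the constants quoted above.
const createEventTypeSQL = `CREATE TYPE IF NOT EXISTS %s.Events (Timestamp Date, Name text, Attributes map<text, text>);`

// identRe matches unquoted CQL identifiers: a letter followed by
// letters, digits, or underscores. (Hypothetical safeguard.)
var identRe = regexp.MustCompile(`^[A-Za-z][A-Za-z0-9_]*$`)

// renderEventType fills in the keyspace name after validating it, since
// identifiers cannot be passed as bound parameters.
func renderEventType(keyspace string) (string, error) {
	if !identRe.MatchString(keyspace) {
		return "", fmt.Errorf("invalid keyspace name %q", keyspace)
	}
	return fmt.Sprintf(createEventTypeSQL, keyspace), nil
}

func main() {
	stmt, err := renderEventType("otel")
	if err != nil {
		panic(err)
	}
	fmt.Println(stmt)
}
```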
That is intriguing. I'd like to see if you have considered looking into how to work on this with a cluster (I see replication factor set to 1) and particularly if you have a partition key strategy for this.
Hi @atoulme,
Thanks for your time and review. I appreciate it.
Yes, the data is stored in Cassandra tables. I improved the config structure so that replication and compression can be changed dynamically. I also changed the PRIMARY KEY to SpanId (a single-column PRIMARY KEY also defines the partition key). Maybe we can create a composite partition key from ServiceName and SpanId.
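To make the composite partition key idea concrete, here is a small Go sketch that builds a `CREATE TABLE` statement with the partition key columns wrapped in an inner pair of parentheses, which is how CQL marks a multi-column partition key. The `spanTableDDL` function and the shortened column list are hypothetical and are not the exporter's actual schema.

```go
package main

import (
	"fmt"
	"strings"
)

// spanTableDDL builds a CREATE TABLE statement whose partition key is
// made of the given columns. The column list is deliberately shortened;
// this is an illustration, not the exporter's schema.
func spanTableDDL(keyspace, table string, partitionKeys []string) string {
	return fmt.Sprintf(
		"CREATE TABLE IF NOT EXISTS %s.%s (ServiceName text, SpanId text, TraceId text, SpanName text, PRIMARY KEY ((%s)));",
		keyspace, table, strings.Join(partitionKeys, ", "),
	)
}

func main() {
	// Composite partition key: rows are distributed by the pair
	// (ServiceName, SpanId) rather than by SpanId alone.
	fmt.Println(spanTableDDL("otel", "otel_spans", []string{"ServiceName", "SpanId"}))
}
```

With a composite partition key, reads have to supply every partition key column, so the choice depends on which queries matter most.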
https://cassandra.apache.org/doc/latest/cassandra/operating/compression.html
CREATE KEYSPACE otel WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
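In the above example, we created a keyspace called otel using SimpleStrategy with a replication factor of 3. Data inserted into this keyspace is replicated to three nodes. Note that SimpleStrategy places replicas on consecutive nodes in the ring without regard to rack or datacenter topology; NetworkTopologyStrategy is the rack-aware option.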
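As a sketch of how the replication settings could be turned into the keyspace statement, the Go snippet below formats the class and factor into CQL with plain single quotes. The `keyspaceDDL` function is a hypothetical helper for illustration, not the exporter's code.

```go
package main

import "fmt"

// keyspaceDDL renders a CREATE KEYSPACE statement from a replication
// class and factor. Hypothetical helper; inputs are assumed to be
// validated by the caller.
func keyspaceDDL(keyspace, class string, replicationFactor int) string {
	return fmt.Sprintf(
		"CREATE KEYSPACE %s WITH replication = {'class': '%s', 'replication_factor': %d};",
		keyspace, class, replicationFactor,
	)
}

func main() {
	fmt.Println(keyspaceDDL("otel", "SimpleStrategy", 3))
}
```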
When I run the Cassandra exporter with the following config, the schema looks like this:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  cassandra:
    dsn: 127.0.0.1
    keyspace: "otel"
    trace_table: "otel_spans"
    replication:
      class: "SimpleStrategy"
      replication_factor: 1
    compression:
      algorithm: "ZstdCompressor"
service:
  pipelines:
    traces:
      receivers: [ otlp ]
      exporters: [ cassandra ]
Schema:
otel: schema durable_writes: true replication: {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}
+ object-types
events: object-type
+ object-attributes
timestamp: date
name: text
attributes: map<text, text>
links: object-type
+ object-attributes
traceid: text
spanid: text
tracestate: text
attributes: map<text, text>
+ tables
otel_spans: table compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.ZstdCompressor'}
+ columns
traceid: text
duration: int
events: frozen<events>
links: frozen<links>
parentspanid: text
resourceattributes: map<text, text>
servicename: text
spanattributes: map<text, text>
spanid: text
spankind: text
spanname: text
statuscode: text
statusmessage: text
timestamp: date
tracestate: text
+ keys
primary key: (spanid)
Default config:
{
DSN: "127.0.0.1",
Keyspace: "otel",
TraceTable: "otel_spans",
Replication: Replication{
Class: "SimpleStrategy",
ReplicationFactor: 1,
},
Compression: Compression{
Algorithm: "LZ4Compressor",
},
}
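A minimal Go sketch of how such defaults could be applied to a partially filled config follows. The field names and default values come from the listing above; the type layout and the `withDefaults` function are assumptions for illustration, and the real component may wire defaults differently.

```go
package main

import "fmt"

// Types mirroring the fields shown in the default config listing.
type Replication struct {
	Class             string
	ReplicationFactor int
}

type Compression struct {
	Algorithm string
}

type Config struct {
	DSN         string
	Keyspace    string
	TraceTable  string
	Replication Replication
	Compression Compression
}

// withDefaults fills every unset (zero-valued) field with the default
// from the listing and leaves user-provided values untouched.
func withDefaults(cfg Config) Config {
	if cfg.DSN == "" {
		cfg.DSN = "127.0.0.1"
	}
	if cfg.Keyspace == "" {
		cfg.Keyspace = "otel"
	}
	if cfg.TraceTable == "" {
		cfg.TraceTable = "otel_spans"
	}
	if cfg.Replication.Class == "" {
		cfg.Replication.Class = "SimpleStrategy"
	}
	if cfg.Replication.ReplicationFactor == 0 {
		cfg.Replication.ReplicationFactor = 1
	}
	if cfg.Compression.Algorithm == "" {
		cfg.Compression.Algorithm = "LZ4Compressor"
	}
	return cfg
}

func main() {
	// Only the compression algorithm is overridden here.
	cfg := withDefaults(Config{Compression: Compression{Algorithm: "ZstdCompressor"}})
	fmt.Printf("%+v\n", cfg)
}
```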
That’s great! Please look for a sponsor to land this. I cannot sponsor fwiw. Come to a SIG meeting if possible to present your work.
@atoulme now that you can, would you be interested in sponsoring this component?
Now also supports Logs.
I will sponsor.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
The purpose and use-cases of the new component
The purpose of this exporter is to export traces and logs to a Cassandra database.
I have already started developing this component here: https://github.com/emreyalvac/opentelemetry-collector-contrib/tree/cassandra-exporter-implementation.
https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/18515
Example configuration for the component
Telemetry data types supported
traces, logs
Is this a vendor-specific component?
Sponsor (optional)
No response
Additional context
No response