adcharre opened 5 months ago
I'm interested to learn more. Would this be something you'd be able to checkpoint on?
@atoulme certainly, it's something I'm actively looking into at the moment, so it makes sense to get a second opinion on the best way to implement this and, hopefully, have it accepted. How best to organise this?
For all components, we tend to work with folks through CONTRIBUTING.md. The question I asked you earlier is in earnest - one of the thorny issues around a component reading from a remote source is to have a checkpoint mechanism that allows you to know where you stopped. We can use the storage extension for that purpose.
I am happy to sponsor this component if you'd like to work on it.
Ahh, I understand now! Thank you for the clarification, and yes, that is an issue I have been thinking about - how best to signal that ingest is finished. I'll look into the storage extension and get a PR up with the skeleton of the receiver.
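For reference, wiring the storage extension into a receiver usually looks something like the sketch below. The `file_storage` extension exists in collector-contrib today; the `storage` field on the proposed `awss3` receiver is an assumption about how this component might expose it, not a finalized interface:

```yaml
extensions:
  file_storage:
    # local directory where checkpoint state is persisted
    directory: /var/lib/otelcol/checkpoints

receivers:
  awss3:
    # hypothetical field: points the receiver at a storage extension so it
    # can record the last object/offset it processed and resume after restart
    storage: file_storage

service:
  extensions: [file_storage]
```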
Hi, apologies if I'm hijacking this thread.
Has there been any thought around integrating the S3 receiver with SQS and S3 Event Notifications?
Our use case: we cannot always write directly to an OTel receiver, but we can write to an S3 bucket. We would then like the object event notification to notify SQS, where an OTel collector (or a set of them) would be listening; on notification it would fetch the uploaded file and output it to the OTLP backend store. We could then also retain the source data in S3 and use the current features of this receiver to replay data if required.
An example sender may look something like:
```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins:
            - "http://*"
            - "https://*"

exporters:
  awss3:
    s3uploader:
      region: us-west-2
      s3_bucket: "tempo-traces-bucket"
      s3_prefix: 'metric'
      s3_partition: 'minute'

processors:
  batch:
    send_batch_size: 10000
    timeout: 30s
  resource:
    attributes:
      - key: service.instance.id
        from_attribute: k8s.pod.uid
        action: insert
  memory_limiter:
    check_interval: 5s
    limit_mib: 200

service:
  pipelines:
    traces:
      receivers: [otlp] # added: a pipeline needs at least one receiver
      processors: [memory_limiter, resource, batch]
      exporters: [awss3, spanmetrics]
```
The receiver could possibly look something like:
```yaml
receivers:
  awss3:
    sqs:
      queue_url: "https://sqs.us-west-1.amazonaws.com/<account_id>/queue"

exporters:
  otlp:
    endpoint: 'http://otlp-endpoint:4317'

processors:
  batch:
    send_batch_size: 10000
    timeout: 30s
  memory_limiter:
    check_interval: 5s
    limit_mib: 200

service:
  pipelines:
    traces:
      receivers: [awss3] # added: a pipeline needs at least one receiver
      processors: [memory_limiter, batch]
      exporters: [otlp, spanmetrics]
```
Thoughts?
S3 Event Notifications: https://docs.aws.amazon.com/AmazonS3/latest/userguide/EventNotifications.html
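For context, an S3 event notification delivered through SQS carries the bucket and key of the new object, so a receiver could fetch exactly the file that was uploaded. A trimmed sketch of the SQS message body is shown below (the object key is illustrative; see the linked docs for the full schema):

```json
{
  "Records": [
    {
      "eventSource": "aws:s3",
      "eventName": "ObjectCreated:Put",
      "s3": {
        "bucket": { "name": "tempo-traces-bucket" },
        "object": { "key": "metric/year=2024/month=02/day=01/hour=12/minute=30/example.json" }
      }
    }
  ]
}
```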
It would be nice to be able to have this run continuously instead of specifying start/end times. This would help with shipping traces across clusters/accounts.
```mermaid
flowchart LR
  subgraph env1
    app1 --> env1-collector
    app2 --> env1-collector
  end
  env1-collector --> S3[(S3)]
  subgraph env2
    app3 --> env2-collector
    app4 --> env2-collector
  end
  env2-collector --> S3
  subgraph shared-env
    S3 --> shared-collector
  end
```
Fully agree,
we also have scenarios where a receiver should constantly process new uploads (from the S3 exporter) to an S3 bucket - that is, without specifying a start time and end time, but keeping a checkpoint of where it last stopped reading.
@awesomeinsight / @rhysxevans - I see no reason why the receiver could not be expanded to include the scenario you suggest. At the moment I'm focusing on getting the initial implementation merged which focuses on my main use case of restoring data between a set of dates.
If the receiver were expanded at some point to constantly process new uploads made by the S3 exporter, could it be used to buffer data independently of a file system? The idea would be to have an alternative to https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/exporterhelper#persistent-queue.
The idea is to have a resilient setup of exporters + receivers (with S3 in between as a buffer) that runs stateless, as it would not require any filesystem to buffer data to disk.
Do you think a setup like this would make sense?
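For comparison, the filesystem-based setup this would replace looks roughly like the following today (the `sending_queue.storage` setting is existing exporterhelper configuration; an S3-buffered setup would remove the need for the local directory):

```yaml
extensions:
  file_storage:
    # local disk directory backing the persistent queue
    directory: /var/lib/otelcol/queue

exporters:
  otlp:
    endpoint: http://otlp-endpoint:4317
    sending_queue:
      enabled: true
      storage: file_storage

service:
  extensions: [file_storage]
```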
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
The purpose and use-cases of the new component
The S3 receiver will allow the retrieval and processing of telemetry data previously stored in S3 by the AWS S3 exporter. This makes it possible to retrieve data that was cold-stored in S3 and to investigate issues not reported within the time span for which data is available in our observability service provider.
Example configuration for the component
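A possible configuration, mirroring the S3 exporter's `s3uploader` settings and the start/end time range discussed above (field names here are a proposal for the new component, not a finalized interface):

```yaml
receivers:
  awss3:
    # time range of previously exported data to replay
    starttime: "2024-02-01 12:00"
    endtime: "2024-02-01 13:00"
    s3downloader:
      region: us-west-2
      s3_bucket: "tempo-traces-bucket"
      s3_prefix: "metric"
      s3_partition: "minute"
```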
Telemetry data types supported
Is this a vendor-specific component?
Code Owner(s)
adcharre
Sponsor (optional)
@atoulme
Additional context
No response