vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Officially support `ceph` object storage as possible sink #1498

Open ghost opened 4 years ago

ghost commented 4 years ago

Ceph is an open-source distributed object storage system.

Some users might want to use it as a sink for Vector. I think the easiest way to send data to Ceph from Vector would be to use its S3-compatible API (example) via aws_s3 sink.

I wonder whether it makes sense to add an example of using Ceph with the aws_s3 sink. If so, we would probably also need to add a CI test to ensure that it actually works.

melchiormoulin commented 4 years ago

Hello, we are interested in using Vector with a Ceph sink, but it doesn't seem to work on our side. Do you have a configuration to share? How can I debug this? Here is my post from the chat:

Hello, has anyone tried the AWS S3 sink with Ceph? It doesn't work for me. For example, for the healthcheck, Ceph returns a 404 response code for the HEAD method, while it returns a 200 response code when I use mc ls. Here is the config:

[sinks.ceph]
  # REQUIRED - General
  type = "aws_s3" # must be: "aws_s3"
  inputs = ["syslog"] # example
  bucket = "vector" # example
  compression = "none" # example, enum
  endpoint = "http://my-ceph.com:9000"

  # OPTIONAL - Object Names
  filename_append_uuid = true # default
  filename_extension = "log" # default
  filename_time_format = "%s" # default
  key_prefix = "date=%F/" # default
  # REQUIRED - requests
  encoding = "text" # example, enum

  # OPTIONAL - General
  healthcheck = true # default

I also set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. When I try to send a log, it returns:

Feb 28 16:40:05.185 ERROR sink{name=ceph type=aws_s3}: vector::sinks::util: request failed. error=<?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidArgument</Code><BucketName>http://my-ceph.com:9000</BucketName><RequestId>tx00000000000000c51a948-005e594265-430c8a-myhost-1</RequestId><HostId>myhostid</HostId></Error>

Could you help me with that, please? :-) Have a nice day.

jszwedko commented 4 years ago

Noting another request for this in discord: https://discord.com/channels/742820443487993987/746070604192415834/780897259944411157

mrmassis commented 4 years ago

My config is:

[sinks.s3]
  type = "aws_s3"
  inputs = ["json"]
  bucket = "logaas-tenant-4"
  compression = "gzip"
  endpoint = "XXXXXXX"
  healthcheck = false

jszwedko commented 4 years ago

User noted that fluentd does work with ceph.

https://discord.com/channels/742820443487993987/746070604192415834/781199457072840744

mrmassis commented 3 years ago

Is it possible to define the bucket using a placeholder?

flavico commented 3 years ago

Please also add the possibility of using Ceph as a source.

n0rm4l-real commented 3 years ago

Any update on this? Still getting the same error with 0.16.1:

<?xml version="1.0" encoding="UTF-8"?>
<Error>
    <Code>InvalidArgument</Code>
    <BucketName>log-backup-test</BucketName>
    <RequestId>tx000000000000030db2fa5-006150048e-bd4064a-jpe2b</RequestId>
    <HostId>bd4064a-jpe2b-jp</HostId>
</Error>

Debug:

Sep 26 05:26:38.339 DEBUG sink{component_id=staas component_kind="sink" component_type=aws_s3 component_name=staas}:request{request_id=4}:request:http: vector::internal_events::http_client: HTTP response. status=400 Bad Request version=HTTP/1.1 headers={"content-length": "225", "x-amz-request-id": "tx000000000000030db2fa5-006150048e-bd4064a-jpe2b", "accept-ranges": "bytes", "content-type": "application/xml", "date": "Sun, 26 Sep 2021 05:26:38 GMT"} body=[225 bytes]

jszwedko commented 3 years ago

Hi @n0rm4l-real . We haven't been able to dig into this yet. Are you aware of why Ceph doesn't like the request? That should help expedite.

jothoma1 commented 3 years ago

Hi @jszwedko and all, does Ceph S3 work with the latest version? We are planning to use it. Thanks!

jszwedko commented 3 years ago

@jothoma1 We haven't been able to dig into this yet. Would you want to throw a 👍 on the top-level description? That helps us prioritize based on demand.

jothoma1 commented 2 years ago

Hi @jszwedko, sorry for the delay, I have just done it! I will also try it and give you some more information. Thanks!

JustinMason commented 2 years ago

I'm seeing the same errors using the aws_s3 sink against an S3-compatible API on DigitalOcean. Any suggestions on how to work around it?

 ERROR sink{component_kind="sink" component_id=over_do_s3 component_type=aws_s3 component_name=over_do_s3}:request{request_id=0}: vector::sinks::util::retries: Non-retriable error; dropping the request. error=Request ID: None Body: 
 <?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidArgument</Code><BucketName>do-test-bucket</BucketName><RequestId>tx0000000000000281456a3-00624381d1-2bd72f01-nyc3c</RequestId><HostId>2bd72f01-nyc3c-nyc3-zg03</HostId></Error>

I validated that I can create files with the same credentials by following the examples they provide: https://docs.digitalocean.com/reference/api/spaces-api/

jothoma1 commented 2 years ago

Hello @jszwedko, do you have any information regarding this? Will it be possible? Thanks!

jszwedko commented 2 years ago

Hi @jothoma1 ,

Are there any logs in Vector surrounding that error? I'm curious whether it is just the health check (HeadBucket) that is failing or the PutObject requests as well. You could try with healthcheck.enabled = false.
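
A minimal sketch of that suggestion, for anyone following along. The sink name `ceph`, the input, the bucket, and the endpoint are placeholders borrowed from the config earlier in this thread, not a tested setup, and option names (in particular `encoding.codec`) are assumptions that may differ between Vector versions:

```toml
[sinks.ceph]
type = "aws_s3"
inputs = ["syslog"]                    # placeholder input name
bucket = "vector"                      # placeholder bucket name
endpoint = "http://my-ceph.com:9000"   # placeholder Ceph S3-compatible endpoint
compression = "none"
encoding.codec = "text"                # assumption: older releases used encoding = "text"

# Skip the startup health check (HeadBucket) so that any remaining
# errors can be attributed to the object upload requests.
healthcheck.enabled = false
```

If errors persist with the health check disabled, the problem lies in the upload requests rather than in HeadBucket.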

JustinMason commented 2 years ago

I have tried disabling health checks and still see the issue. I am able to reproduce the same issue using Ceph Nano: https://github.com/ceph/cn

I haven't debugged the Ceph side, but knowing it's easy to reproduce could help.

jszwedko commented 2 years ago

Thanks @JustinMason ! I unfortunately haven't been able to get Ceph Nano running locally yet.

Could you try running the latest nightly build of Vector with VECTOR_LOG set to aws_smithy_http=debug,vector=debug like:

VECTOR_LOG=aws_smithy_http=debug,vector=debug vector --config ...

(you can access the nightly builds via https://vector.dev/download/ by toggling the version)

This should output the AWS SDK requests and responses which would help narrow this down.

JustinMason commented 2 years ago

We found a workaround that should also help identify the source of the bug. The request headers include x-amz-tagging, and this header is sent even if no tags are explicitly configured. If you include any tag, then it works.

   type: aws_s3
   tags: 
      tag1: foobar

@jszwedko
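
For readers using TOML configs like the ones earlier in this thread, the same workaround might look roughly like the sketch below. Only the tag `tag1 = "foobar"` comes from the comment above; the sink name, input, bucket, and endpoint are placeholders, and `encoding.codec` is an assumption that may not match every Vector version:

```toml
[sinks.ceph]
type = "aws_s3"
inputs = ["syslog"]                    # placeholder input name
bucket = "vector"                      # placeholder bucket name
endpoint = "http://my-ceph.com:9000"   # placeholder Ceph S3-compatible endpoint
encoding.codec = "text"                # assumption: older releases used encoding = "text"

# Workaround reported in this thread: configure at least one tag so the
# x-amz-tagging header is not sent empty.
[sinks.ceph.tags]
tag1 = "foobar"
```

The tag itself carries no meaning here; per the report above, any tag is enough to make the request succeed.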

jszwedko commented 2 years ago

Ah, good catch @justinmason ! I opened https://github.com/vectordotdev/vector/pull/12027 to avoid setting this header if there are no tags.

Mihai-CMM commented 7 months ago

Hello, are there any new developments here?