vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0
17.6k stars 1.55k forks

Request: Add google cloud storage source #7501

Open brianpham opened 3 years ago

brianpham commented 3 years ago

As part of replacing Logstash, we need Vector to support Google Cloud Storage as a source, similar to the AWS S3 source (https://vector.dev/docs/reference/configuration/sources/aws_s3/).

We currently do something similar in Logstash:

input {
  google_cloud_storage {
    bucket_id => "my_log_bucket"
    file_matches => ".*\.log"
    tags => ["server"]
    codec => "json"
  }
}

jszwedko commented 3 years ago

Discord request: https://discord.com/channels/742820443487993987/746070591097798688/847095555225944135

maxdialpad commented 1 year ago

Is there anything on the roadmap for this source?

jszwedko commented 1 year ago

Not yet, but we have been experimenting with OpenDAL, which was recently used to add a WebHDFS sink, and does have support for GCS. It could be an avenue to experiment with if anyone wants to take a shot at this.

swgillespie commented 4 months ago

@jszwedko Would you accept a PR that implements this in roughly the same way that the aws_s3 source is implemented, i.e. via event notifications in a PubSub topic?

jszwedko commented 4 months ago

> @jszwedko Would you accept a PR that implements this in roughly the same way that the aws_s3 source is implemented, i.e. via event notifications in a PubSub topic?

Hey! Yes, I think that would make sense as the initial implementation to match the behavior of the aws_s3 source.
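
For anyone picking this up, here is a rough sketch of the filtering step such a notification-driven source would need. It is not Vector code. It assumes each Pub/Sub message carries the standard Cloud Storage notification attributes (`eventType`, `bucketId`, `objectId`), and it borrows the `file_matches` idea from the Logstash config earlier in the thread. The helper name `objects_to_fetch` is hypothetical, and treating the pattern as a full match on the object name is an assumption.

```python
import re

# Event type Cloud Storage emits when an object has been successfully written.
OBJECT_FINALIZE = "OBJECT_FINALIZE"


def objects_to_fetch(messages, file_matches=r".*\.log"):
    """Return (bucket, object) pairs for finalized objects whose name matches.

    `messages` is a list of dicts, each holding the `attributes` map of one
    Pub/Sub notification. Hypothetical helper for illustration only.
    """
    pattern = re.compile(file_matches)
    selected = []
    for message in messages:
        attrs = message.get("attributes", {})
        if attrs.get("eventType") != OBJECT_FINALIZE:
            continue  # skip deletes, archives, and metadata updates
        name = attrs.get("objectId", "")
        # Assumption: the pattern must match the whole object name.
        if pattern.fullmatch(name):
            selected.append((attrs.get("bucketId"), name))
    return selected


# Made-up sample notifications, shaped like Pub/Sub message attributes.
sample = [
    {"attributes": {"eventType": "OBJECT_FINALIZE",
                    "bucketId": "my_log_bucket", "objectId": "app/server.log"}},
    {"attributes": {"eventType": "OBJECT_DELETE",
                    "bucketId": "my_log_bucket", "objectId": "app/old.log"}},
    {"attributes": {"eventType": "OBJECT_FINALIZE",
                    "bucketId": "my_log_bucket", "objectId": "app/readme.txt"}},
]

print(objects_to_fetch(sample))  # [('my_log_bucket', 'app/server.log')]
```

A real source would then download each selected object, decode it, and acknowledge the Pub/Sub message only after the events are delivered, which is roughly how the aws_s3 source treats its SQS notifications.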

Xuanwo commented 19 hours ago

> Not yet, but we have been experimenting with OpenDAL, which was recently used to add a WebHDFS sink, and does have support for GCS. It could be an avenue to experiment with if anyone wants to take a shot at this.

Hi @jszwedko, I'm willing to help implement the GCS source, but I might not have time to complete the full documentation. Would it be acceptable to start with the implementation alone? Taking the WebHDFS sink as an example, that would mean everything under src/sinks/webhdfs but not website/**/webhdfs.