mhite opened this issue 2 years ago
Hey @mhite, this is a totally reasonable addition. I've held off from implementing this myself in the past because I'm not too familiar with Google services, but we already have most of the work from the aws_s3 input ready to port over.
@Jeffail - That's great! Google Cloud can push its logs not just to Pub/Sub (where the existing input seems to work great, btw) but also to GCS as small chunk files in newline-delimited JSON (NDJSON) format. When you don't need that log data immediately and can tolerate a minor delay, GCS is great and should cost a lot less than Pub/Sub. Adding Pub/Sub notifications on object create/finalize to trigger a download by Benthos creates a nice middle ground: not quite streaming, but not a scheduled/cron batch either. (From a cost perspective, yes, you are still creating Pub/Sub messages, but far fewer, since each file created contains a batch of log messages.)
I was briefly playing with the GCS input, which behaves like a batch input, exiting when no more data is left to process. It would be nice to have a "monitor" mode where it runs continuously and detects new objects to download based on Pub/Sub notifications.
GCS can publish notifications to Pub/Sub, triggered by storage object events:
https://cloud.google.com/storage/docs/pubsub-notifications
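For reference, an OBJECT_FINALIZE notification delivered through Pub/Sub carries the event details as message attributes, with the object's metadata as a JSON payload. Roughly (fields abbreviated, values are placeholders):

```json
{
  "attributes": {
    "eventType": "OBJECT_FINALIZE",
    "bucketId": "my-log-bucket",
    "objectId": "logs/2021/01/chunk-0001.json",
    "payloadFormat": "JSON_API_V1"
  },
  "payload": {
    "kind": "storage#object",
    "bucket": "my-log-bucket",
    "name": "logs/2021/01/chunk-0001.json",
    "contentType": "application/json",
    "size": "524288"
  }
}
```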
Might there be a way to approximate similar functionality for GCS, like the SQS feature[1] of the aws_s3 input?
https://www.benthos.dev/docs/components/inputs/aws_s3#sqs
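For comparison, the aws_s3 input can consume object keys from an SQS queue; the Pub/Sub equivalent might look something like this. To be clear, the second block is purely hypothetical, there is no such `pubsub` field on the GCS input today:

```yaml
# Existing pattern: aws_s3 discovers new objects via SQS notifications.
input:
  aws_s3:
    bucket: my-bucket
    sqs:
      url: https://sqs.us-east-1.amazonaws.com/123456789012/my-queue

# Hypothetical GCS analogue (NOT an existing Benthos feature):
# input:
#   gcp_cloud_storage:
#     bucket: my-log-bucket
#     pubsub:
#       project: my-project
#       subscription: my-gcs-notifications
```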
Short of adding new functionality to the GCS input, is there a clever way of parsing the Pub/Sub notifications, extracting the object names encoded in them, and then firing off another pipeline to consume and download? Or is this really better served by an enhancement to the existing GCS input? (I'm thinking it probably is an enhancement, but I am new to Benthos, so perhaps there is an awesome approach I am missing!)
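As a sketch of the workaround idea: consume the notifications with the gcp_pubsub input, pull the bucket and object name out of the payload with Bloblang, and fetch each object through the GCS JSON API with an http_client processor. Authentication (an OAuth bearer token for the request) is hand-waved here, and the project/subscription names are placeholders:

```yaml
# Hypothetical workaround, not a vetted config. Auth for the download
# request is omitted and would need to be added.
input:
  gcp_pubsub:
    project: my-project
    subscription: my-gcs-notifications

pipeline:
  processors:
    # The notification payload is the object metadata JSON; stash the
    # bucket and (URL-escaped) object name as metadata.
    - bloblang: |
        meta bucket = this.bucket
        meta object = this.name.escape_url_query()
        root = ""
    # Download the object body via the GCS JSON API (credentials omitted).
    - http_client:
        url: 'https://storage.googleapis.com/storage/v1/b/${! meta("bucket") }/o/${! meta("object") }?alt=media'
        verb: GET
```

One caveat with this approach: delivery guarantees get murkier than with a native feature, since the Pub/Sub message is acknowledged by the input rather than after the object download succeeds.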
Thanks for all your help,
-M