Closed: arslanm closed this issue 6 years ago.
I'm pretty sure what's happening here is the same kind of race condition that caused the fd leaks in logstash-output-google_bigquery.
This should be fixable by reworking the plugin to use a single queue of files awaiting upload, and only adding a file to that queue if it is actually ready, i.e. the file currently being written to should not be in the queue the way it is today. A worker pool should also help with #5; a sketch of the idea follows.
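A minimal Ruby sketch of that design, not the plugin's actual code: rotation closes a file before enqueueing it, so only finished files ever reach the queue, and a small worker pool drains it. The `upload_object` helper is a hypothetical stand-in for the real GCS upload.

```ruby
require "thread"

# Hypothetical stand-in for the real GCS upload call; sketch only.
def upload_object(path)
  puts "uploading #{path}"
end

upload_queue = Queue.new   # the single queue of files ready to upload

# Rotation closes the active file before enqueueing it, so only finished
# files ever enter the queue; the file currently being written never does.
def rotate(io, queue)
  io.close
  queue << io.path
end

# A small worker pool drains the queue, deleting each file only after its
# upload completes, so disk space is reclaimed promptly and exactly once.
workers = Array.new(2) do
  Thread.new do
    while (path = upload_queue.pop)
      upload_object(path)
      File.delete(path)
    end
  end
end
```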
We're using a slightly modified version of this plugin (3.0.4) in our infrastructure. There are two differences:
The first is just below line 149, where we add a snippet that strips the "@" sign from field names to make BigQuery happy; a sketch of the idea is below.
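The original snippet isn't shown here, but a minimal sketch of that kind of cleanup (the helper name `bigquery_safe` is hypothetical, not from the plugin) could look like:

```ruby
# Strip a leading "@" from logstash field names (e.g. "@timestamp" ->
# "timestamp") before the event is written out, since BigQuery rejects
# column names containing "@".
def bigquery_safe(event_hash)
  event_hash.each_with_object({}) do |(key, value), clean|
    clean[key.to_s.sub(/\A@/, "")] = value
  end
end

bigquery_safe("@timestamp" => "2014-01-01T00:00:00Z", "message" => "hi")
# => { "timestamp" => "2014-01-01T00:00:00Z", "message" => "hi" }
```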
The second difference is the filename format. Instead of building the path the way `get_base_path` does, we put the date before the hostname (see the sketch after the example below).
This makes it possible to use a prefix with a single wildcard on Google GCS to load daily files into Google BigQuery: instead of "prefix_hostname_YYYY-MM-DD", file names follow the pattern "prefix_YYYY-MM-DD_hostname", so one wildcard is enough to load into BQ when there are multiple logstash instances.
(Ex: `prefix_YYYY-MM-DD*` works, whereas `prefix_hostname_YYYY-MM-DD*` does not, since the latter only matches files from a single host.)
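A minimal sketch of the reordered path helper, assuming the name is built from a prefix, the date, and the hostname (the real `get_base_path` in 3.0.4 may differ in its inputs and formatting):

```ruby
require "socket"

# Reordered base-path helper: date before hostname, so a single trailing
# wildcard like "prefix_YYYY-MM-DD*" matches every logstash instance's files.
def get_base_path(prefix, date = Time.now)
  "#{prefix}_#{date.strftime('%Y-%m-%d')}_#{Socket.gethostname}"
end

get_base_path("logs")  # => e.g. "logs_2014-05-12_worker-3"
```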
The instances that run logstash with this plugin eventually run out of disk space. When I run `lsof`, I get this: