sampointer / fluent-plugin-cloudwatch-ingest

Alternative to ryotarai/fluent-plugin-cloudwatch-logs for ingesting AWS Cloudwatch logs via fluentd
GNU General Public License v3.0

Some major fixes... #3

Closed: chaeyk closed this 7 years ago

chaeyk commented 7 years ago
  1. Print boom.inspect instead of boom.

  2. Use next_token and timestamp. A next_token is valid for only 12 hours, so if fluentd is down for more than 12 hours, the API returns an error and this plugin loops infinitely. We should therefore save both next_token and the timestamp to the statefile, and when the token is invalid, query by timestamp instead. Because the statefile's structure has changed, I also added statefile migration functionality.

  3. The statefile should be truncated before saving.

  4. Change the interval policy. I think there are three situations in which an interval is used:

    a. API error
    b. API success, but no data
    c. API success, and some data ingested

    When massive log aggregation is in progress, the interval in case c should be small or zero. In cases a and b, we can use a large interval. So I changed the interval used in case b to api_interval. This breaks compatibility with version 4.0 and lower.
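The three-case policy in point 4 can be sketched as a small lookup. This is a hypothetical illustration, not the plugin's code: the outcome symbols are invented, and using api_interval for the error case (a) is an assumption, since the comment only says a "large interval" is acceptable there. Case b using api_interval and case c using the small interval follow the description above.

```ruby
# Hypothetical sketch of the interval policy from point 4.
# Outcome symbols and the error-case choice are assumptions for illustration.
module IntervalPolicy
  # Returns the number of seconds to sleep before the next polling pass.
  #   :api_error -> case a: back off with the large interval (assumed api_interval)
  #   :no_data   -> case b: nothing new arrived, so use api_interval
  #   :ingested  -> case c: events arrived, so poll again quickly with interval
  def self.sleep_seconds(outcome, interval:, api_interval:)
    case outcome
    when :api_error, :no_data then api_interval
    when :ingested then interval
    else raise ArgumentError, "unknown outcome: #{outcome.inspect}"
    end
  end
end

# Idle stream: wait the long interval before asking again.
puts IntervalPolicy.sleep_seconds(:no_data, interval: 0, api_interval: 120) # prints 120
```

Under this reading, the compatibility break is that an idle stream is now throttled by api_interval rather than interval, so a configuration that relied on a large interval to slow down idle polling would behave differently.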

I'm sorry that my code is not beautiful; I don't know Ruby very well.
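Points 2 and 3 of the comment above (saving next_token plus a timestamp, falling back to the timestamp when the token is rejected, migrating the old statefile, and truncating before saving) can be sketched together. Everything here is hypothetical: the JSON layout, the module and method names, and the old format being a bare token string per stream are assumptions, not the plugin's actual statefile. The parameter names next_token, start_time and start_from_head do exist on the AWS CloudWatch Logs GetLogEvents call; the sketch only builds the options hash and makes no API request.

```ruby
require 'json'

# Hypothetical statefile layout (an assumption, not the plugin's real format):
#   old: { group => { stream => "next_token" } }
#   new: { group => { stream => { "token" => "...", "timestamp" => ms } } }
module StateFile
  # Upgrade old-format entries (bare token strings) to the new shape.
  # Old entries carry no timestamp, so it is left nil.
  def self.migrate(state)
    state.transform_values do |streams|
      streams.transform_values do |entry|
        entry.is_a?(String) ? { 'token' => entry, 'timestamp' => nil } : entry
      end
    end
  end

  # Build GetLogEvents options: prefer the saved token, but fall back to the
  # saved timestamp (of the last ingested event) when the token was rejected.
  # start_time is inclusive, so the event at that exact timestamp may be re-read.
  # start_from_head: true reads forward from the starting point.
  def self.request_params(entry, token_invalid: false)
    params = { start_from_head: true }
    return params if entry.nil?

    if entry['token'] && !token_invalid
      params.merge(next_token: entry['token'])
    elsif entry['timestamp']
      params.merge(start_time: entry['timestamp'])
    else
      params # no usable state: read the stream from the beginning
    end
  end

  # Point 3: overwrite in place, then truncate at the end of the new data so
  # no bytes from a longer previous state are left trailing in the file.
  def self.save(path, state)
    File.open(path, File::RDWR | File::CREAT, 0o644) do |f|
      f.flock(File::LOCK_EX)
      f.rewind
      f.write(JSON.generate(state))
      f.flush
      f.truncate(f.pos)
    end
  end

  # Load and migrate in one step so callers only ever see the new format.
  def self.load(path)
    return {} unless File.exist?(path)

    migrate(JSON.parse(File.read(path)))
  end
end
```

Opening the file with mode 'w' would also truncate it; the explicit truncate shown here is one way to keep an exclusive lock on the same file while rewriting it.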

sampointer commented 7 years ago

Many thanks for this. I will do my best to review and test this over the coming week.

sampointer commented 7 years ago

I have built a version of the gem from your master and have it in testing with one of our low-traffic logging shards. The happy path works, and I intend to test the failure modes over the coming days. After that I'll roll it out to some more active shards and report back here.

sampointer commented 7 years ago

Over the last 24 hours I tested point 2. I let some heartbeat messages accumulate in Cloudwatch with a stopped daemon for 24 hours and then re-enabled the daemon. All logs were correctly ingested.

I hope to use my load harness to test point 4 tomorrow.

chaeyk commented 7 years ago

I have some more fixes.

  1. Don't save next-token when a stream has just been created. AWS sometimes returns an invalid token in that case, and with that token you can never get any events. This is done in my repo.

  2. When a stream is gone, its next-token should be removed from the statefile. Otherwise, when the stream is recreated, the next-token in the statefile will not be valid. I'm working on it.
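The two stream-lifecycle fixes above can be sketched as a pair of helpers over one log group's stream entries. This is a hypothetical illustration: the entry shape, the method names, and especially the "grace window" used to decide that a stream was "just created" are assumptions, since the comment does not say how that condition is detected. CloudWatch Logs does report a creation time (in milliseconds) for each log stream, which this sketch assumes the caller passes in.

```ruby
# Hypothetical helpers for the stream-lifecycle fixes; names, entry shape and
# the grace window are assumptions for illustration only.
module StreamState
  # Fix 2: drop saved entries for streams that no longer exist, so a stream
  # recreated under the same name does not inherit a stale next-token.
  def self.prune(streams_state, live_stream_names)
    streams_state.select { |name, _entry| live_stream_names.include?(name) }
  end

  # Fix 1: always record the timestamp, but skip the token while the stream
  # is still inside an assumed grace window after its creation time.
  # All times are milliseconds since the epoch.
  def self.record(streams_state, name, token:, timestamp:, creation_time:, now_ms:, grace_ms: 60_000)
    just_created = now_ms - creation_time < grace_ms
    entry = { 'timestamp' => timestamp }
    entry['token'] = token unless just_created
    streams_state.merge(name => entry)
  end
end
```

Keeping the timestamp when the token is skipped means the next poll of a new stream can still resume by time rather than by a possibly invalid token.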

Once my pull request is merged, I'll open another pull request. I think your release can follow that one.

sampointer commented 7 years ago

Excellent! In that case, I think the best option is for me to merge the PR now to make your life easier. Once all of your fixes are in master, we can repeat the 12-hour test, perform the load test, and test stream recreation all at once, then cut a new major version release.

sampointer commented 7 years ago

If you could fix up some of the lint along the way it would be a great help: https://circleci.com/gh/sampointer/fluent-plugin-cloudwatch-ingest/180

However, functionality is more important. I can fix the lint before the release if required.

chaeyk commented 7 years ago

I'm new to that tool, but it looks very useful. I'll see what I can do.