hartfordfive / cloudflarebeat

ELK beat to fetch Cloudflare logs via the Enterprise Log Share API

Log collection from ELS API for a given time block can become too large #9

Closed hartfordfive closed 7 years ago

hartfordfive commented 7 years ago

Although the ELS API currently allows a count of items to be specified along with a timestamp start and end range, it does not return any header indicating how many log items in total fall within the given time range. As a result, a very large number of logs may be downloaded, which can become quite heavy for in-memory processing.

As a solution, the logs should initially be saved to a gzip file and then read back from this file in smaller chunks.
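A minimal sketch of this spool-then-stream approach (the function names `spoolToGzip` and `readInChunks` are hypothetical, not the beat's actual code):

```go
package main

import (
	"bufio"
	"compress/gzip"
	"io"
	"os"
)

// spoolToGzip streams the raw ELS API response body to a gzip file on disk,
// so the full log set never has to be held in memory.
func spoolToGzip(body io.Reader, path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	zw := gzip.NewWriter(f)
	defer zw.Close()
	_, err = io.Copy(zw, body)
	return err
}

// readInChunks re-opens the spooled file and hands log lines to handle()
// one at a time, keeping memory usage bounded regardless of file size.
func readInChunks(path string, handle func(line []byte) error) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	zr, err := gzip.NewReader(f)
	if err != nil {
		return err
	}
	defer zr.Close()
	scanner := bufio.NewScanner(zr)
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long JSON lines
	for scanner.Scan() {
		if err := handle(scanner.Bytes()); err != nil {
			return err
		}
	}
	return scanner.Err()
}
```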

hartfordfive commented 7 years ago

The previously proposed solution was applied, but it still wasn't effective enough in terms of total download and processing time.

Description of the Issue:

Log file download times can currently exceed 5 minutes, and processing time can reach 10 minutes. Each ticker iteration currently defaults to 30 minutes, but the ticker for the next iteration doesn't start until the current processing has completed, so in this case a full cycle takes (30 + 5 + 10) = 45 minutes.
Keep adding this overhead over the course of a day and within 24 hours you can easily end up with a 4 to 5 hour delay in processing, instead of the expected 30 minutes. These delays will only grow as traffic increases and the log files grow in size.
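The lag comes from the loop being synchronous: the wait for the next period only begins after the current download and processing finish. A scaled-down sketch of that failure mode (milliseconds standing in for minutes; the inline sleeps are hypothetical placeholders for the real download/processing work):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	period := 30 * time.Millisecond // stands in for the 30-minute ticker period
	work := 15 * time.Millisecond   // stands in for 5m download + 10m processing

	start := time.Now()
	for i := 1; i <= 4; i++ {
		time.Sleep(period) // next period only starts after the previous work
		time.Sleep(work)   // download + process, blocking the loop
		// Each cycle costs period+work (45 "minutes"), so the collector
		// drifts further behind real time on every iteration.
		fmt.Printf("cycle %d done at %v (ideal: %v)\n",
			i, time.Since(start).Round(time.Millisecond),
			time.Duration(i)*period)
	}
}
```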

Proposed Solution:

Once the period has passed for the beat's ticker, two functions would be called asynchronously to perform the following (a combined sketch follows the list):

  1. Download ELS Log File: This function creates X goroutine(s) (count yet to be determined, maybe a pool) that each download one of the log file parts (sequentially or in parallel) and place it on the log_files_ready channel once completed.
    • If 2-minute segments, then 15 files total for 30 minutes
    • If 5-minute segments, then 6 files total for 30 minutes
  2. Process/Publish Individual Log Entries: Have another function (also via a goroutine) process these files asynchronously from the log_files_ready channel as they become ready.
    • X goroutine(s) (count yet to be determined, maybe a pool) are created so that each can open a log file and then send off its processed events via PublishEvent
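A minimal sketch of this pipeline, assuming 5-minute segments and a pool of 3 processors (`downloadSegment` and `processFile` are hypothetical stand-ins for the real ELS download and the per-entry PublishEvent loop):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// segment is one slice of the 30-minute period (e.g. a 5-minute window).
type segment struct {
	start, end time.Time
}

// Placeholders for the real download and publish logic.
func downloadSegment(s segment) (path string) {
	return fmt.Sprintf("els_%d.gz", s.start.Unix())
}
func processFile(path string) { fmt.Println("publishing events from", path) }

func main() {
	periodStart := time.Now().Add(-30 * time.Minute)
	segLen := 5 * time.Minute // 6 files per 30-minute period

	logFilesReady := make(chan string) // the log_files_ready channel

	// 1. Downloaders: one goroutine per segment (a bounded pool would also
	//    work); each pushes its finished file onto the channel.
	var dl sync.WaitGroup
	for t := periodStart; t.Before(periodStart.Add(30 * time.Minute)); t = t.Add(segLen) {
		dl.Add(1)
		go func(s segment) {
			defer dl.Done()
			logFilesReady <- downloadSegment(s)
		}(segment{start: t, end: t.Add(segLen)})
	}
	// Close the channel once every download has finished.
	go func() { dl.Wait(); close(logFilesReady) }()

	// 2. Processors: a small worker pool drains files as they appear, so
	//    processing overlaps with the remaining downloads.
	var proc sync.WaitGroup
	for w := 0; w < 3; w++ {
		proc.Add(1)
		go func() {
			defer proc.Done()
			for path := range logFilesReady {
				processFile(path)
			}
		}()
	}
	proc.Wait()
}
```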

This may not be the absolute best solution, but it should be more effective than the current one. If more optimizations are needed later, I'll deal with them then.