skormos / varlog-parser

MIT License

Feedback - Perf degradation on large files #11

Open fbonecco-cribl opened 2 years ago

fbonecco-cribl commented 2 years ago

Hey @skormos - This is Franco from Cribl. First of all, thanks again for taking the time to complete the assignment.

I've been looking at your code and running different test scenarios. I noticed some performance degradation as the requested file size grows. I'm using numEntries=10, on files of different sizes, to get the last 10 log entries. Here are the results I've got for comparison:

File 1 - 100-lines.log - 12K

➜  varlog-parser git:(main) ls -alth /var/log/100-lines.log
-rw-rw-r-- 1 goat goat 12K May 18 17:27 /var/log/100-lines.log
➜  varlog-parser git:(main) curl -o /dev/null -s -w 'Total: %{time_total}s\n' 'http://localhost:8080/api/varlog/100-lines.log?numEntries=10'
Total: 0.001245s

File 2 - 2M.log - 173M

➜  varlog-parser git:(main) ls -alth /var/log/2M.log                                                                                                                          
-rw-rw-r-- 1 goat goat 173M May 11 16:39 /var/log/2M.log
➜  varlog-parser git:(main) curl -o /dev/null -s -w 'Total: %{time_total}s\n' 'http://localhost:8080/api/varlog/2M.log?numEntries=10'
Total: 0.002919s

File 3 - logfile.log - 22G

➜  varlog-parser git:(main) ls -alth /var/log/logfile.log                                                                                                                          
-rw-r--r-- 1 goat goat 22G Apr 27 19:44 /var/log/logfile.log
➜  varlog-parser git:(main) curl -o /dev/null -s -w 'Total: %{time_total}s\n' 'http://localhost:8080/api/varlog/logfile.log?numEntries=10'           
Total: 34.575618s

While File 1 and File 2 didn't show any noticeable difference, File 3 took far longer to complete. Since I was only requesting the last 10 entries, I'd have expected all 3 calls to take roughly the same amount of time (a few ms), no matter the size of the requested file. Agree?

Can you please try to find what's causing this perf degradation and address the problem?

Let me know if you have any questions!

skormos commented 2 years ago

I understood the issue right away, but finding a different way to approach the change required me to carve out some time to get familiar with Golang's API for file seek and read.

The read time is now constant. Reading 100 lines from either a 1, 2 or 14 GB file is ~40µs. Reading 100k records from the 14GB file is less than 10ms.

I suspect this change should address the concerns and fulfill the requirements.