remotehack / remotehack.github.io

https://remotehack.space
MIT License

Gitpod-based material to practice log parsing #160

Closed lpmi-13 closed 2 years ago

lpmi-13 commented 2 years ago

We haven't added one of these in a while, so I thought I'd add one while I'm thinking of it...

There are lots of useful commands for parsing logs (e.g., grep and awk), and it would be nice to have a disposable environment to practice them in. So I'm going to make a micromaterial intended to be run in Gitpod (it can also run locally, but that requires cloning several hundred megabytes of data) where you can practice finding various things with these command-line tools.

There are lots of free data sets on Kaggle: https://www.kaggle.com/datasets?search=server+logs

The plan is to grab that data and see what we can find! Half exploratory data analysis, half treasure hunt!

lpmi-13 commented 2 years ago

One potential task would be to parse N log files and get the average request count per file for each IP address.

...or the top 10 noisiest IPs, or the ones that only show up once, or whatever
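For anyone who wants to check their grep/awk answers against something, here's a rough Python sketch of those tasks. It assumes access-log-style lines where the client IP is the first whitespace-separated field, which may not hold for every data set. All the function names are made up for illustration.

```python
import sys
from collections import Counter


def count_requests_by_ip(lines):
    """Count requests per client IP (assumption: IP is the first field)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def average_per_file(per_file_counts):
    """Average each IP's request count across files; a missing IP counts as 0."""
    totals = Counter()
    for counts in per_file_counts:
        totals.update(counts)
    n = len(per_file_counts)
    return {ip: total / n for ip, total in totals.items()} if n else {}


def noisiest(counts, k=10):
    """Return the k IPs with the most requests as (ip, count) pairs."""
    return counts.most_common(k)


def seen_once(counts):
    """Return the IPs that appear exactly once, sorted."""
    return sorted(ip for ip, c in counts.items() if c == 1)


if __name__ == "__main__":
    # Usage: python script.py access1.log access2.log ...
    per_file = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            per_file.append(count_requests_by_ip(handle))
    totals = Counter()
    for counts in per_file:
        totals.update(counts)
    print("averages:", average_per_file(per_file))
    print("top 10:", noisiest(totals))
    print("seen once:", seen_once(totals))
```

The averaging here treats an IP that is absent from a file as zero requests in that file; averaging only over the files where the IP appears would be an equally reasonable reading of the task.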

lpmi-13 commented 2 years ago

Another interesting use case would be to log all network requests made while watching YouTube videos, write them to log files, and then process/filter them so we're left with only a list of domains that serve ads, a la a Pi-hole block list.
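A minimal sketch of that filtering step, assuming each log line contains full request URLs somewhere in it. How you decide which domains "serve ads" is the hard part and isn't settled here; the substring hints below are purely a placeholder assumption, and the function names are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Assumption: requests are logged with full http(s) URLs in each line.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_domains(lines):
    """Collect the unique hostnames of every URL found in the log lines."""
    domains = set()
    for line in lines:
        for url in URL_PATTERN.findall(line):
            try:
                host = urlsplit(url).hostname
            except ValueError:
                continue  # malformed URL, skip it
            if host:
                domains.add(host)
    return domains


def filter_ad_domains(domains, ad_hints):
    """Keep domains containing any hint substring (a placeholder heuristic)."""
    return sorted(d for d in domains if any(hint in d for hint in ad_hints))


def to_block_list(domains):
    """Format the domains as plain text, one domain per line."""
    return "\n".join(domains)
```

For example, with lines mentioning `https://video.example.com/watch` and `https://ads.example.com/pixel.gif`, calling `filter_ad_domains(extract_domains(lines), ["ads."])` would keep only `ads.example.com`.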

lpmi-13 commented 2 years ago

Actually...absent an insanely large log file...this might as well just be done locally instead of in Gitpod, since there's no real benefit to doing this in a remote environment (unless you normally work in Windows and want some Linux command-line practice).

Gonna close this as done, since it took me about 20 minutes to create https://github.com/lpmi-13/sadpods-logsearch