tomnomnom / meg

Fetch many paths for many hosts - without killing the hosts
MIT License
1.59k stars 266 forks

Arg for filtering out undesirable content --regexignore #57

Open nseratt opened 4 years ago

nseratt commented 4 years ago

Hello @tomnomnom

Hope this can be implemented. I've run into cases where a site returns a 200 for every request, but the body is an HTML page saying that the page was not found or that an error occurred.

Feel free to modify as needed. Wasn't sure if -x or --regexignore were the best arg names for it.

Thank you


Filter out responses whose body matches a provided regex pattern.

example usage: --regexignore "(Page not found|error)"