mlsecproject / combine

Tool to gather Threat Intelligence indicators from publicly available sources
https://www.mlsecproject.org/
GNU General Public License v3.0
652 stars, 179 forks

Allow 'reaper' to continue processing if it gets invalid entries #78

Closed alexcpsec closed 9 years ago

alexcpsec commented 9 years ago

If there is an "invalid" http entry (one returning a status other than 200 or 302) or an "invalid" file entry (the file does not exist in the filesystem), reaper should log that it was unable to fetch that entry and carry on with the rest.

This is an issue specifically for file:// entries since #48 was implemented.
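A minimal sketch of the requested behaviour, assuming a hypothetical `http_get(url)` callable that returns `(status_code, body)` and raises on connection errors. `fetch_entries` and `VALID_STATUSES` are illustrative names, not Combine's actual reaper code. Each bad entry (missing file, non-200/302 status, failed connection) is logged and skipped instead of aborting the whole run.

```python
import logging
import os
from urllib.parse import urlparse
from urllib.request import url2pathname

logger = logging.getLogger("combine.reaper")

# Status codes the issue treats as valid for http(s) entries.
VALID_STATUSES = {200, 302}


def fetch_entries(urls, http_get):
    """Fetch each URL, logging and skipping invalid entries.

    `http_get` is a hypothetical stand-in for the reaper's HTTP client:
    it takes a URL and returns (status_code, body), or raises on failure.
    Returns a dict mapping each successfully fetched URL to its body.
    """
    results = {}
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme == "file":
            # Turn the URL path into a local filesystem path.
            path = url2pathname(parsed.path)
            if not os.path.isfile(path):
                logger.error("File %s does not exist, skipping", path)
                continue
            with open(path) as handle:
                results[url] = handle.read()
        else:
            try:
                status, body = http_get(url)
            except Exception as exc:  # connection errors, timeouts, etc.
                logger.error("Request for %s failed: %r", url, exc)
                continue
            if status not in VALID_STATUSES:
                logger.error("Request for %s returned HTTP %d, skipping", url, status)
                continue
            results[url] = body
    return results
```

Passing the HTTP client in as a parameter keeps the skip-and-log logic independent of whether the real fetch goes through requests, grequests, or something else.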

krmaxwell commented 9 years ago

This has a dependency on #34

alexcpsec commented 9 years ago

Agreed. Good point.

On Wed, Sep 17, 2014 at 11:18 PM, Kyle Maxwell notifications@github.com wrote:

This has a dependency on #34

Reply to this email directly or view it on GitHub:

https://github.com/mlsecproject/combine/issues/78#issuecomment-55999232


This e-mail message and any files transmitted with it contain legally privileged, proprietary information, and/or confidential information, therefore, the recipient is hereby notified that any unauthorized dissemination, distribution or copying is strictly prohibited. If you have received this e-mail message inappropriately or accidentally, please notify the sender and delete it from your computer immediately.

gbrindisi commented 9 years ago

I've investigated the issue a little, and it seems the stable version of grequests on PyPI doesn't support the exception_handler parameter in map() for catching HTTP exceptions.

There is a fix in https://github.com/kennethreitz/grequests/pull/58 but it's not yet merged.

With the fix it's possible to do:

import logging
import grequests

logger = logging.getLogger('combine.reaper')

def exception_handler(request, exception):
    logger.error("Request %r failed: %r", request, exception)  # log and carry on

inbound_responses = grequests.map(reqs, exception_handler=exception_handler)
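grequests itself is gevent-based, and its internals aren't shown in this thread. The following is only a standard-library sketch, using `concurrent.futures` as a swapped-in technique, of the semantics the patched `map()` is described as providing: every request gets a result slot, and a failure is handed to the handler instead of aborting the batch. `map_with_handler` is a hypothetical name; in this sketch the handler's return value (None for a logging-only handler) fills the failed slot.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger("combine.reaper")


def map_with_handler(calls, exception_handler=None, workers=4):
    """Run zero-argument callables concurrently, one result slot per call.

    If a call raises and exception_handler is given, the handler's return
    value fills that slot; without a handler, the first failure (in input
    order) is re-raised after all calls have finished.
    """
    calls = list(calls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(call) for call in calls]
    # Leaving the with-block waits for every call to finish.
    results = []
    for call, future in zip(calls, futures):
        exc = future.exception()
        if exc is None:
            results.append(future.result())
        elif exception_handler is not None:
            results.append(exception_handler(call, exc))
        else:
            raise exc
    return results


def exception_handler(request, exception):
    # Same logging-only handler as above, so failed slots become None.
    logger.error("Request %r failed: %r", request, exception)
```

Callers then filter out the None entries before parsing, which is what lets the reaper keep going past a dead feed.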

I quickly tested it by uninstalling the stable grequests and installing the patched version, and it works as expected:

pip install https://github.com/rtdean/grequests/archive/0.3.0.zip

An example output:

2014-10-09 08:12:13,605 - combine.reaper - INFO - Fetching outbound URLs
2014-10-09 08:12:31,071 - combine.reaper - ERROR - Request <grequests.AsyncRequest object at 0x89f7fcc> failed: ConnectionError(MaxRetryError("HTTPSConnectionPool(host='zeustracker.abuseAAA.ch', port=443): Max retries exceeded with url: /blocklist.php?download=ipblocklistAAA (Caused by <class 'socket.error'>: [Errno 2] No such file or directory)",),)
2014-10-09 08:12:31,071 - combine.reaper - ERROR - Request <grequests.AsyncRequest object at 0x8a560ac> failed: ConnectionError(MaxRetryError("HTTPSConnectionPool(host='spyeyetracker.abuseAAA.ch', port=443): Max retries exceeded with url: /blocklist.php?download=ipblocklist (Caused by <class 'socket.error'>: [Errno 2] No such file or directory)",),)

Would this be an acceptable solution for you, or would you rather have only stable dependencies?

alexcpsec commented 9 years ago

I guess the answer really depends on:

My vote leans towards using the patched version and requiring the setup process to install it for now. But I have much less experience (i.e., none) with version management in Python than @technoskald, so I would defer to him.
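If the setup does require the patched grequests, one option is a startup check that fails fast when the stable PyPI release is installed instead. A minimal sketch using `inspect.signature`; `supports_keyword` and `require_exception_handler` are hypothetical helpers, not part of Combine or grequests. It treats a `**kwargs` catch-all as "supported", which could give a false positive if a version silently swallows unknown keywords.

```python
import inspect


def supports_keyword(func, name):
    """Return True if `func` can be called with `name` as a keyword argument."""
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):  # no inspectable signature (some builtins)
        return False
    param = params.get(name)
    if param is not None:
        return param.kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs catch-all accepts any keyword, so treat it as supported.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def require_exception_handler(map_func):
    """Raise at startup if `map_func` cannot take exception_handler=..."""
    if not supports_keyword(map_func, "exception_handler"):
        raise RuntimeError(
            "installed grequests.map() has no exception_handler parameter; "
            "install the patched grequests (see issue #78)"
        )
```

In the reaper this would be called once, e.g. `require_exception_handler(grequests.map)`, before any feeds are fetched.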

gbrindisi commented 9 years ago

This should do the trick:

pip install -e git+https://github.com/rtdean/grequests@0.3.0#egg=grequests

Pip freeze output:

$ pip freeze
CsvSchema==1.1.1
argparse==1.2.1
beautifulsoup4==4.3.2
feedparser==5.1.3
gevent==1.0.1
greenlet==0.4.2
-e git+https://github.com/rtdean/grequests@19239a34b00b8ac226b21f01b0fb55e869097fb7#egg=grequests-origin/0.3.0
netaddr==0.7.12
pygeoip==0.3.1
requests==2.3.0
wsgiref==0.1.2

alexcpsec commented 9 years ago

Sounds good to me. I'll subscribe to the issue on grequests so we know when it is merged. :)

Any closing thoughts, @technoskald?

On Thu, Oct 9, 2014 at 7:40 AM, Gianluca Brindisi notifications@github.com wrote:


alexcpsec commented 9 years ago

This would fix #32 also :)