mlsecproject / combine

Tool to gather Threat Intelligence indicators from publicly available sources
https://www.mlsecproject.org/
GNU General Public License v3.0

Duplicate indicators in CRITs #124

Open degimi opened 9 years ago

degimi commented 9 years ago

As discussed with Alex: currently, if an indicator already exists in CRITs and you run the script again, the details/Sources section of the IP shows the following:

alienvault (4): 2015-03-11

Method: trawl Reference: http://reputation.alienvault.com/reputation.data Analyst: API Created: 2015-03-11 13:20:58.188000

Method: trawl Reference: http://reputation.alienvault.com/reputation.data Analyst: API Created: 2015-03-11 13:40:17.452000

Method: trawl Reference: http://reputation.alienvault.com/reputation.data Analyst: API Created: 2015-03-11 14:17:23.745000

etc.

So if you run the script every day and the indicator is always in that feed, you end up with a very long list in the "wrong place".

An idea to fix the issue could be:

1) Check whether the indicator already exists from that source.
1.1) If it does not exist, we add it normally.
1.2) If it exists and the source is different, we add the new source in CRITs and keep the information in the CRITs "sources" box. Example:

Method: trawl Reference: http://url.feed1/indicator Analyst: API Created: 2015-03-12 13:40:17.452000

1.3) If it exists and the source is the same, we add the history to the CRITs "comments" box in the following format:

http://reputation.alienvault.com/reputation.data dated 2015-03-09
http://reputation.alienvault.com/reputation.data dated 2015-03-10
http://url.feed1/indicator dated 2015-03-11
etc.

Does that make sense?
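To make the three cases above concrete, here is a minimal sketch of the decision logic. It works against a plain in-memory dict rather than CRITs itself; `IndicatorRecord`, `record_sighting`, and the returned status strings are hypothetical names invented for illustration, not part of combine or the CRITs API.

```python
from dataclasses import dataclass, field


@dataclass
class IndicatorRecord:
    """Hypothetical in-memory stand-in for one CRITs indicator."""
    value: str
    sources: dict = field(default_factory=dict)   # source name -> first-seen date
    comments: list = field(default_factory=list)  # sighting history lines


def record_sighting(store, value, source, reference, date):
    """Apply cases 1.1-1.3 to a dict keyed by indicator value."""
    record = store.get(value)
    if record is None:
        # 1.1: indicator not known yet, so add it normally
        store[value] = IndicatorRecord(value, {source: date})
        return "added"
    if source not in record.sources:
        # 1.2: known indicator, new source: one new entry under "sources"
        record.sources[source] = date
        return "source_added"
    # 1.3: known indicator, same source: log the sighting under "comments"
    record.comments.append(f"{reference} dated {date}")
    return "comment_added"
```

With this shape, repeated daily runs against the same feed grow only the comments list, while the sources mapping keeps one entry per feed.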

alexcpsec commented 9 years ago

@paulpc what do you think of this? Is this doable?

paulpc commented 9 years ago

I liked the added sources - it keeps track of where the IP is coming from and when - a cheap way to analyze when you saw the IP and how often it appeared where it did.

Now that I'm done complaining, it's doable - it'll put more pressure on mongo and the API. I think we'd have to search the API for each IP - it might be more expeditious to get a copy of everything in CRITs for the sources in combine and check for duplicates in memory before uploading. At the same time, I should have also looked to make sure the campaigns / sources exist in CRITs and if not, either prompt, or create them.
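A rough sketch of the in-memory duplicate check, assuming the existing (indicator, source) pairs have already been pulled out of CRITs by some other means. How that copy is fetched is deliberately left out, and `split_new_and_known` is a hypothetical helper name, not an existing combine function.

```python
def split_new_and_known(candidates, existing):
    """Partition (indicator, source) pairs against a pre-fetched set.

    candidates: iterable of (indicator, source) tuples from this run.
    existing:   iterable of (indicator, source) tuples assumed to have
                been fetched from CRITs beforehand.
    """
    seen = set(existing)
    to_upload, already_known = [], []
    for pair in candidates:
        if pair in seen:
            already_known.append(pair)
        else:
            to_upload.append(pair)
            seen.add(pair)  # also collapses duplicates within this run
    return to_upload, already_known
```

Set membership keeps each check constant-time on average, so the cost moves from one API query per IP to one bulk fetch plus a single pass over the new data.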

paulpc commented 9 years ago

@alexcpsec, here's a temporary fix for this: https://github.com/mlsecproject/combine/pull/129 I got to look at @sooshie 's branch and it has a lot of promise - once you're ready to head that direction, i would love to add my edits / ideas in there.

alexcpsec commented 9 years ago

@sooshie 's stuff is almost done and is going to be 0.2.0. There are still some non-terminating thread oddities to fix there.
