moloch-- / leakdb

Web-Scale NoSQL Idempotent Cloud-Native Big-Data Serverless Plaintext Credential Search
GNU General Public License v3.0
179 stars, 27 forks

[Feature] Monitor list of keywords #4

Open blueteamzone opened 4 years ago

blueteamzone commented 4 years ago

Feature request: Is it feasible to provide a list of keywords so that, when a database is being indexed and any of those keywords match, the tool sends a notification to Slack/Telegram/Email? This would allow active monitoring instead of searching for those keywords after the database has been created. For example, if the keyword apple.com is on the monitoring list and a certain leak (or a database containing multiple leaks) includes entries for that domain, it becomes easy to remediate and request password resets for those leaked credentials. What do you think?
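A minimal sketch of the requested behaviour, assuming records arrive as plain text lines during indexing. The function `monitor_stream` and the `notify` callback are hypothetical (leakdb has no such hook); the sketch only shows where a keyword check could sit so that every record still reaches the indexer.

```python
from typing import Callable, Iterable, Iterator


def monitor_stream(
    lines: Iterable[str],
    keywords: Iterable[str],
    notify: Callable[[str, str], None],
) -> Iterator[str]:
    """Yield every line unchanged, calling notify(keyword, line) on each match.

    Hypothetical hook for illustration only; not part of leakdb.
    """
    # Lower-case the watch list once so matching is case-insensitive.
    watch = [k.lower() for k in keywords]
    for line in lines:
        lowered = line.lower()
        for kw in watch:
            if kw in lowered:
                notify(kw, line)
        yield line  # the indexer still receives every record


# Usage: collect matches while the records flow through to the indexer.
hits = []
records = ["alice@apple.com:hunter2", "bob@example.org:letmein"]
indexed = list(monitor_stream(records, ["Apple.com"], lambda k, l: hits.append((k, l))))
```

Plain substring matching is the simplest interpretation of "keyword", but it is loose: "apple.com" would also match "bob@pineapple.com".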

moloch-- commented 4 years ago

First, some background: we sort of already support this, but realistically it's only implemented in the "serverless" version with a BigQuery backend. In that deployment you can simply run leakdb domain apple.com to get all of the results in the dataset that match apple.com, and you can even run leakdb domain apple.com --email-only since you may not care about the passwords. Of course, as you update your datasets you'd have to re-issue this query and then diff the results to find only the new accounts, which isn't ideal since we don't currently have any way of calculating this diff (more on this later).
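The manual re-query-and-diff workflow amounts to a set difference between two result snapshots. A sketch, assuming each `leakdb domain apple.com --email-only` result has already been reduced to a list of email strings (the command's actual output format is not shown in this thread); `new_accounts` is a hypothetical helper.

```python
from typing import Iterable, List


def new_accounts(previous: Iterable[str], current: Iterable[str]) -> List[str]:
    """Return emails present in the current result but not in the previous one.

    Assumes both results are lists of email strings; adapt the parsing to
    whatever the real query output looks like.
    """
    # Normalize case/whitespace so "Bob@Apple.com" and "bob@apple.com" match.
    seen = {e.strip().lower() for e in previous}
    fresh = {e.strip().lower() for e in current} - seen
    return sorted(fresh)
```

The drawback the comment points out still applies: the previous snapshot has to be kept, and the full result has to be fetched again on every update just to find a few new rows.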

For the indexed version, domains are very tricky: because there is such a high number of collisions (repeated values), and since we're using quicksort, it can take a very long time to sort domain indexes.
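Why collisions hurt: with a two-way (e.g. Lomuto-style) partition, keys equal to the pivot all land on one side, so an input where most records share a domain degrades toward quadratic time. A common remedy, shown here as a swapped-in technique and not as leakdb's actual sort, is Dijkstra's three-way partition, which groups keys equal to the pivot in the middle and never recurses into them.

```python
import random
from typing import List, Optional


def quicksort3(keys: List[str], lo: int = 0, hi: Optional[int] = None) -> None:
    """In-place quicksort using a three-way (Dijkstra) partition."""
    if hi is None:
        hi = len(keys) - 1
    while lo < hi:
        pivot = keys[random.randint(lo, hi)]
        lt, i, gt = lo, lo, hi
        # Invariant: keys[lo:lt] < pivot, keys[lt:i] == pivot,
        # keys[gt+1:hi+1] > pivot, keys[i:gt+1] not yet examined.
        while i <= gt:
            if keys[i] < pivot:
                keys[lt], keys[i] = keys[i], keys[lt]
                lt += 1
                i += 1
            elif keys[i] > pivot:
                keys[i], keys[gt] = keys[gt], keys[i]
                gt -= 1
            else:
                i += 1
        # keys[lt:gt+1] are equal to the pivot and already in place.
        # Recurse on the smaller side and loop on the larger to bound stack depth.
        if lt - lo < hi - gt:
            quicksort3(keys, lo, lt - 1)
            lo = gt + 1
        else:
            quicksort3(keys, gt + 1, hi)
            hi = lt - 1
```

An index where every key is "gmail.com" is finished after a single linear partition pass.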

So if you only care about new accounts as they get added to an existing dataset, which seems to be what you're suggesting, we could build this feature, but there are some complexities to keep in mind. If we implement it during the normalization/indexing steps, we'll likely report entries that already exist in the dataset, since the bloom filter hasn't been applied yet at that point, and we can't query the existing dataset to detect duplicates because the cost would be insanely high (both in money and in performance).
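A Bloom filter answers "definitely new" or "probably seen before" from memory, without querying the stored dataset, which is why duplicates are best caught at that step. A generic sketch, not leakdb's implementation; the class name and the double-hashing scheme over BLAKE2b are assumptions chosen for brevity.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter for illustration (not leakdb's code)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit indexes from two 64-bit halves of one digest.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # odd stride
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add_if_new(self, item: str) -> bool:
        """Insert item; return True if it was definitely absent beforehand."""
        new = False
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                new = True
                self.bits[byte] |= 1 << bit
        return new


bf = BloomFilter(num_bits=1 << 20, num_hashes=7)
stream = ["alice@apple.com:pw1", "bob@apple.com:pw2", "alice@apple.com:pw1"]
fresh = [r for r in stream if bf.add_if_new(r)]
```

A Bloom filter never reports a previously inserted record as new, but it can occasionally report a genuinely new record as already seen (a false positive), so a small fraction of new accounts could be silently skipped.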

There are two approaches I think we can make work, though:

moloch-- commented 4 years ago

I'm thinking the bloom filter callback option may actually be better for your use case: you don't really care about querying the data, you only want to know whether a certain type of account shows up in a stream of multiple database leaks.
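One way a callback at the dedup step could look, assuming records are "email:password" lines. To keep the example self-contained, a Python `set` stands in for the Bloom filter (same "seen?" and "add" interface, but exact and memory-hungry); `dedupe_with_callback` and `on_match` are hypothetical names.

```python
from typing import Callable, Iterable, Set


def dedupe_with_callback(
    records: Iterable[str],
    seen: Set[str],
    watched_domains: Set[str],
    on_match: Callable[[str], None],
) -> int:
    """Fire on_match only for previously unseen records on a watched domain.

    `seen` is a stand-in for the Bloom filter; `watched_domains` must be
    lower-case. Returns the number of new records. Hypothetical sketch.
    """
    new_count = 0
    for record in records:
        if record in seen:
            continue  # already in the dataset: no alert, nothing to index
        seen.add(record)
        new_count += 1
        email = record.split(":", 1)[0]  # assumes "email:password" layout
        _, at, domain = email.rpartition("@")
        if at and domain.lower() in watched_domains:
            on_match(record)
    return new_count
```

Because the check runs only on records that pass the dedup step, a leak that re-publishes old credentials does not trigger fresh alerts.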

moloch-- commented 4 years ago

This is pretty simple to add. I'm thinking we just add some kind of --web-hook flag; then you can wire it to anything you want (e.g. Slack).
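The --web-hook flag is only proposed here, so its behaviour is not defined. A sketch of what a generic webhook notification might send, using only the standard library. The `{"text": ...}` payload is the shape Slack incoming webhooks accept; other receivers may expect a different schema. The function names and the URL are hypothetical, and only the email (not the password) goes into the message.

```python
import json
import urllib.request


def build_webhook_request(url: str, email: str, domain: str) -> urllib.request.Request:
    """Build, but do not send, a JSON POST describing one matched account."""
    payload = {"text": f"New leaked account for {domain}: {email}"}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


req = build_webhook_request("https://hooks.example.com/T000/B000", "bob@apple.com", "apple.com")
# send(req)  # performs the network call
```

Keeping request construction separate from sending makes the payload easy to check without a live endpoint.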

blueteamzone commented 4 years ago

I think there is a problem with leakdb domain apple.com --email-only: I don't think that approach is scalable. Imagine having to monitor hundreds of domains; adding them one after another would be tedious. Also, would adding a callback option to the bloom filter scale to monitoring more than one domain?
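A watch list with hundreds of domains does not have to mean hundreds of checks per record. A sketch, assuming the watch list is held as a lower-case set: each record's domain and its parent domains are looked up in the set, so the cost per record depends on the number of labels in the domain, not on the size of the watch list. `matched_domain` is a hypothetical helper.

```python
from typing import Optional, Set


def matched_domain(email: str, watched: Set[str]) -> Optional[str]:
    """Return the watched domain covering `email`, or None.

    Tries the full domain, then each parent ("mail.apple.com", then
    "apple.com", then "com"), so subdomains match their watched parent.
    `watched` must contain lower-case domains.
    """
    _, at, domain = email.strip().lower().rpartition("@")
    if not at or not domain:
        return None
    labels = domain.split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in watched:
            return candidate
    return None
```

Matching whole labels also avoids the false hit that plain substring matching produces for look-alike domains such as pineapple.com.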

moloch-- commented 4 years ago

Yeah, in the existing setup, if you had hundreds of domains you'd probably just want to call the JSON API directly instead of using the CLI, but I can see the use case you're pointing to.

I think the other advantage of the bloom filter callback is that you wouldn't even need to deploy the server/BigQuery; all you need is to save/load the bloom filter, so it's far more cost-effective too.
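To put "save/load the bloom filter" in perspective, the standard sizing formulas give the number of bits $m = -n \ln p / (\ln 2)^2$ and hash functions $k = (m/n)\ln 2$ for $n$ items at false-positive rate $p$. The persisted state therefore depends only on the record count and the chosen error rate, not on record length. The dataset sizes below are illustrative, not leakdb figures.

```python
import math
from typing import Tuple


def bloom_size(n_items: int, fp_rate: float) -> Tuple[int, int]:
    """Return (bits, hash_count) for an optimally sized Bloom filter."""
    bits = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n_items * math.log(2)))
    return bits, hashes


bits, k = bloom_size(10**9, 0.01)
print(f"{bits / 8 / 1e9:.2f} GB, {k} hash functions")
```

At a 1% false-positive rate this works out to about 9.6 bits per record, roughly 1.2 GB with 7 hash functions for a billion credentials. The trade-off is that, once the filter is full, about 1 in 100 genuinely new records would be treated as already seen and would not trigger an alert.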

blueteamzone commented 3 years ago

Hey, any new updates related to this enhancement?

moloch-- commented 3 years ago

No movement yet sorry, I've been distracted with other projects. However, I may be circling back to LeakDB in the near future.