moloch-- / leakdb

Web-Scale NoSQL Idempotent Cloud-Native Big-Data Serverless Plaintext Credential Search
GNU General Public License v3.0

"source" field in normalized JSON? #2

Open darrenmartyn opened 4 years ago

darrenmartyn commented 4 years ago

Would it be feasible to add a "source" field to the JSON/indexed data, so you could "tag" entries as being from certain leaks?

This could be very useful when trying to go back later and attribute where a piece of data came from, but I'm unsure whether it would have a performance impact.

moloch-- commented 4 years ago

I don't think it would have much of an impact on performance. Most of the code operates on lines, not the actual content of each line, so there's little code that would need to change. A few other folks have been asking for something like this, so I'll probably look at adding it. It would affect the bloom filter's ability to de-duplicate identical user/password combos, since entries from different sources would no longer be identical lines. So there could be a modest impact on index/sort times, but I don't think there'd be a large impact on search times.
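
To illustrate the de-duplication point, here is a minimal Go sketch. It is not leakdb's code: it uses an exact in-memory set as a stand-in for the bloom filter, and the `countUnique` helper and the JSON field names are hypothetical. It only shows that two otherwise identical credentials stop being identical lines once each carries a different source tag.

```go
package main

import "fmt"

// countUnique returns the number of distinct lines.
// An exact set stands in for the bloom filter here (an assumption made
// for illustration; a real bloom filter is probabilistic).
func countUnique(lines []string) int {
	seen := make(map[string]struct{}, len(lines))
	for _, line := range lines {
		seen[line] = struct{}{}
	}
	return len(seen)
}

func main() {
	// The same credential seen twice, with no source tag.
	untagged := []string{
		`{"email":"alice@example.com","password":"hunter2"}`,
		`{"email":"alice@example.com","password":"hunter2"}`,
	}
	// The same credential, tagged with two hypothetical leak names.
	tagged := []string{
		`{"email":"alice@example.com","password":"hunter2","source":"leak-a"}`,
		`{"email":"alice@example.com","password":"hunter2","source":"leak-b"}`,
	}

	fmt.Println(countUnique(untagged)) // the duplicates collapse into one line
	fmt.Println(countUnique(tagged))   // both lines survive de-duplication
}
```

With line-level de-duplication, the tagged data keeps one line per source, so the data set to sort and index grows with the number of leaks a credential appears in.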

aaronkaplan commented 3 years ago

Any news on this feature request?

moloch-- commented 3 years ago

Not had time to work on it yet, sorry!

aaronkaplan commented 3 years ago

No worries, just wanted to figure out what the status is. What would be needed? I.e., is it simple enough for a non-Go coder to add?

Best, a.

moloch-- commented 3 years ago

Maybe. Most of the code only cares about "lines" in a file, so you'd have to extend the normalizer to add a "source" field to the JSON format, and extend the few parts of the code that parse the JSON to optionally deal with the extra field.
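
As a rough sketch of what that change could look like in Go: the `Entry` struct, its field names, and the `NormalizeLine` and `ParseLine` helpers below are all hypothetical and not taken from the leakdb code base. The one standard-library behaviour it relies on is the `omitempty` tag option in `encoding/json`, which leaves the field out when it is empty, so untagged entries keep their old shape and older lines without a "source" key still parse.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Entry is a hypothetical normalized record. The field names are
// assumptions for illustration, not leakdb's actual schema.
type Entry struct {
	Email    string `json:"email"`
	User     string `json:"user"`
	Domain   string `json:"domain"`
	Password string `json:"password"`
	Source   string `json:"source,omitempty"` // optional leak tag
}

// NormalizeLine builds one JSON line from an email, a password, and an
// optional source tag. Pass an empty source to omit the field.
func NormalizeLine(email, password, source string) (string, error) {
	at := strings.LastIndex(email, "@")
	if at < 1 || at == len(email)-1 {
		return "", fmt.Errorf("invalid email: %q", email)
	}
	entry := Entry{
		Email:    email,
		User:     email[:at],
		Domain:   email[at+1:],
		Password: password,
		Source:   source,
	}
	data, err := json.Marshal(entry)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// ParseLine decodes one JSON line. Lines written before the change have
// no "source" key, so Source is simply left as the empty string.
func ParseLine(line string) (Entry, error) {
	var entry Entry
	err := json.Unmarshal([]byte(line), &entry)
	return entry, err
}

func main() {
	tagged, err := NormalizeLine("alice@example.com", "hunter2", "example-leak")
	if err != nil {
		panic(err)
	}
	fmt.Println(tagged)

	// An older line without a source still parses.
	old, err := ParseLine(`{"email":"bob@example.com","user":"bob","domain":"example.com","password":"pw"}`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("source=%q\n", old.Source)
}
```

Keeping the field optional means existing normalized files would not need to be regenerated; only code that wants to filter or report by source has to look at the new field.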