moloch-- / leakdb

Web-Scale NoSQL Idempotent Cloud-Native Big-Data Serverless Plaintext Credential Search
GNU General Public License v3.0

Search getting 0 results #5

Open enzyro opened 4 years ago

enzyro commented 4 years ago

Describe the bug I'm trying to match 3M lines of emails against a dataset of 700M lines (roughly 50GB). Everything appears to run smoothly, but after a bunch of tests I can't get a single match returned, even for emails/users that I know for sure are in my dataset. All the processes are running on an AWS instance (so I followed the server deployment steps). I tried both building from source and using the released version, and I tried splitting my data into smaller files, but still got no results. I also tried launching the server version and querying it over HTTP.

The really weird thing is that when I run a search on the test folder you provide, it works properly with your provided indexes. But when I regenerate the indexes for small.txt following the docs on the wiki, I get no results, and when I diff my generated index against the one you provide, they differ. So I'm guessing it has something to do with how the index is generated or sorted.
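To make the index comparison above easier to repeat, here is a minimal shell sketch. The function name `compare_idx` and the file paths in the usage note are hypothetical, and it assumes nothing about the index format: it only reports whether two files are byte-identical and, if not, their sizes and the first differing byte offset.

```shell
#!/usr/bin/env bash
# compare_idx (hypothetical helper): compare two index files byte for byte.
# Prints "identical" and returns 0 if they match; otherwise prints both
# sizes and the first difference reported by cmp, and returns 1.
compare_idx() {
  local mine="$1" reference="$2"
  if cmp -s "$mine" "$reference"; then
    echo "identical"
    return 0
  fi
  # tr strips the padding some wc implementations add around the count.
  local size_mine size_ref
  size_mine=$(wc -c < "$mine" | tr -d ' ')
  size_ref=$(wc -c < "$reference" | tr -d ' ')
  echo "sizes: ${size_mine} vs ${size_ref} bytes"
  # cmp prints the first differing byte, e.g. "a b differ: byte 3, line 1".
  cmp "$mine" "$reference" | head -n 1
  return 1
}
```

Usage would look like `compare_idx my-generated.idx provided.idx`, with both paths being placeholders for the regenerated and shipped index files. Equal sizes with an early differing offset would be consistent with an ordering difference, while different sizes would suggest the two indexes do not contain the same set of entries; either reading is only a hint, not a diagnosis.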

To Reproduce Steps to reproduce the behavior:

  1. ./leakdb-curator --format colon-newline --recursive --target ./large-folder-containing-all --output normalized.json
  2. ./leakdb-curator --json normalized.json
  3. ./leakdb-curator search -i leakdb/email.idx -j leakdb/bloomed.json -v "xxx@gmail.com" Response: Found 0 results ..
  4. grep -F "xxx@gmail.com" bloomed.json Response: {"email": "xxx", "user": "xxx", "domain": "gmail.com", "password": "xxx"}

I really wish I could get this to work because it looks amazing. I'm at your disposal for any questions or tests you want me to run.

Enzyro

flyingdan commented 3 years ago

Hey, just wondering if there are any updates on this issue. I'm making my way through the code to see if anything jumps out at me too.

moloch-- commented 3 years ago

Sorry, I haven't had much time to dig into it; I've been very busy. Lmk if you find something!

GlitchWitch commented 1 year ago

@enzyro @flyingdan Did either of you ever find a solution to this? I am running into the same issue using the latest Linux release.