josehelps / git-wild-hunt

A tool to hunt for credentials in github wild AKA git*hunt
Apache License 2.0
291 stars, 45 forks

Duplicate output in logs and JSON file #4

Closed: phx closed this issue 4 years ago

phx commented 4 years ago

Every warning for a match is displayed twice in the output, and results.json contains duplicate entries, even when the file is deleted before the script runs.

2020-08-22 11:16:43,640 - INFO - git-wild-hunt - processing potential leak #1 on redacted
2020-08-22 11:16:43,906 - INFO - git-wild-hunt - processing potential leak #2 on redacted
2020-08-22 11:16:44,135 - INFO - git-wild-hunt - processing potential leak #3 on redacted url A
2020-08-22 11:16:44,321 - WARNING - git-wild-hunt - url: redacted
check: Amazon AWS Access Key ID matches: ['redacted key A']
2020-08-22 11:16:44,321 - WARNING - git-wild-hunt - url: redacted
check: AWS API Key matches: ['redacted key A']
2020-08-22 11:16:44,908 - INFO - git-wild-hunt - processing potential leak #4 on redacted
2020-08-22 11:16:45,282 - INFO - git-wild-hunt - processing potential leak #5 on redacted
2020-08-22 11:16:45,616 - INFO - git-wild-hunt - processing potential leak #6 on redacted
(venv) ~/git/git-wild-hunt$ cat results.json | jq '.[] | .url' | wc -l
203
(venv) ~/git/git-wild-hunt$ cat results.json | jq '.[] | .url' | sort -u | wc -l
100
(venv) ~/git/git-wild-hunt$ cat results.json | jq '.[] | .matches' | grep '"' | wc -l
219
(venv) ~/git/git-wild-hunt$ cat results.json | jq '.[] | .matches' | grep '"' | sort -u | wc -l
104
phx commented 4 years ago

I found the cause, as you can see above: there are duplicate regexes, so the same key matches both "AWS Access Keys" and "AWS API Keys". You may be able to close this issue, but I still think there are duplicates in the output. Would it be possible to use each URL as the top-level key of its JSON object? That way there would be exactly one entry per URL.
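
A minimal sketch of keying the results by URL. It is not the project's actual code. It assumes each entry in results.json is an object with `url`, `check`, and `matches` fields (the `jq` queries above suggest `url` and `matches`; `check` is inferred from the log lines). The helper name `group_by_url` and the sample data are hypothetical.

```python
import json
from collections import defaultdict


def group_by_url(results):
    """Collapse a flat list of findings into one entry per URL.

    Assumed input shape (not confirmed by the project):
    [{"url": ..., "check": ..., "matches": [...]}, ...]
    """
    grouped = defaultdict(lambda: defaultdict(set))
    for entry in results:
        # A set per (url, check) pair drops repeated matches automatically.
        grouped[entry["url"]][entry["check"]].update(entry["matches"])
    # Convert sets to sorted lists so the result is JSON-serializable.
    return {
        url: {check: sorted(matches) for check, matches in checks.items()}
        for url, checks in grouped.items()
    }


# Hypothetical sample data mirroring the duplicated AWS checks in the log.
sample = [
    {"url": "https://example.com/a", "check": "Amazon AWS Access Key ID", "matches": ["KEY_A"]},
    {"url": "https://example.com/a", "check": "AWS API Key", "matches": ["KEY_A"]},
    {"url": "https://example.com/b", "check": "AWS API Key", "matches": ["KEY_B"]},
]

by_url = group_by_url(sample)
print(json.dumps(by_url, indent=2))
```

With this layout a URL can appear only once as a key, although the same key string can still show up under two check names if two regexes overlap.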

phx commented 4 years ago

Removing the duplicate match in regexes.json fixes the duplicates issue. I might submit a PR for this.
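
One way to spot such duplicates before removing them, as a rough sketch. It assumes regexes.json is a flat mapping of check name to pattern string, which this thread does not confirm. The function name `find_duplicate_patterns` and the sample patterns are illustrative only.

```python
from collections import defaultdict


def find_duplicate_patterns(regexes):
    """Return {pattern: [check names]} for patterns listed under more than one name.

    Assumes `regexes` is a dict of check name -> pattern string.
    """
    names_by_pattern = defaultdict(list)
    for name, pattern in regexes.items():
        names_by_pattern[pattern].append(name)
    return {
        pattern: names
        for pattern, names in names_by_pattern.items()
        if len(names) > 1
    }


# Illustrative patterns, not copied from the project's regexes.json.
sample_regexes = {
    "Amazon AWS Access Key ID": "AKIA[0-9A-Z]{16}",
    "AWS API Key": "AKIA[0-9A-Z]{16}",
    "Example Token": "token_[a-z0-9]{8}",
}

duplicates = find_duplicate_patterns(sample_regexes)
for pattern, names in duplicates.items():
    print(f"{pattern} is listed under: {', '.join(names)}")
```

This only catches patterns that are textually identical; two different regexes that happen to match the same strings would need a separate check.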

josehelps commented 4 years ago

Thank you @phx, I merged the PR. I really appreciate you catching this!