trufflesecurity / trufflehog

Find, verify, and analyze leaked credentials
https://trufflesecurity.com
GNU Affero General Public License v3.0

Scan the contents of SQLite DB file #2095

Open bugbaba opened 10 months ago

bugbaba commented 10 months ago

Hello Team,

Description

Scan the contents of SQLite DB files for secrets.

Preferred Solution

SQLite DB files can be detected using the determineMimeType function at https://github.com/trufflesecurity/trufflehog/blob/main/pkg/handlers/archive.go#L416, and the contents of the DB can then be dumped into a text file, which can be scanned for keys.

So that cases like the one shown in the attached screenshot could be detected.

Additional Context

Contents can be dumped using the command below:

sqlite3 db_filename .dump > tmp_dumpfile

Discussion on how to do the same in Go: https://github.com/mattn/go-sqlite3/issues/535
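As a rough illustration of the proposed approach (trufflehog itself is written in Go, so a real implementation would live in a Go handler; this Python sketch just shows the idea using the standard-library sqlite3 module):

```python
import sqlite3

def dump_sqlite(path: str) -> str:
    """Return a SQL text dump of a SQLite database file,
    roughly equivalent to `sqlite3 path .dump`."""
    conn = sqlite3.connect(path)
    try:
        # iterdump() yields the same CREATE TABLE / INSERT statements
        # that the CLI's .dump command emits.
        return "\n".join(conn.iterdump())
    finally:
        conn.close()
```

The resulting text could then be handed to the existing scanning pipeline as an ordinary plain-text chunk.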

-- Regards, @bugbaba

zricethezav commented 10 months ago

@bugbaba thanks for opening this issue!

Scanning sqlite dbs is a really great idea. Ideally we would have a decoder be able to detect when a filesystem scan encountered a sqlite db. Adding new decoders is pretty easy. We're happy to accept PRs for adding a sqlite decoder.

rgmz commented 10 months ago

@bugbaba what is the output if you run strings db_filename? Is it substantially different from sqlite3 db_filename .dump > tmp_dumpfile?

I've seen results from several SQLite databases, so I'm curious what the difference would be.

bugbaba commented 10 months ago

With strings, random characters get mixed in with the contents of the DB.

For example, the dump file for this test db is

PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE IF NOT EXISTS "user_info" (
    "id"    INTEGER,
    "username"  TEXT,
    "api_key"   TEXT,
    PRIMARY KEY("id")
);
INSERT INTO user_info VALUES(1,'abc','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');
INSERT INTO user_info VALUES(2,'xyz','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');
COMMIT;

Whereas when we run strings on the file, we get:

SQLite format 3
Otableuser_infouser_info
CREATE TABLE "user_info" (
    "id"    INTEGER,
    "username"  TEXT,
    "api_key"   TEXT,
    PRIMARY KEY("id")
Gxyz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e
Gabc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

Now, to the naked eye, the API key is clearly visible in this output too. But as we can see, a stray G is added at the start of each line, and an e is appended to the end of the first row entry.

So, for example, the GitHub key detector regex trufflehog uses won't be able to detect it, due to the word-boundary \b assertions in the regex.

We can see the same in the attached screenshot from regex101.com.
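The failure is easy to reproduce. The pattern below is a simplified stand-in, not trufflehog's actual detector regex, but both anchor the token with \b:

```python
import re

# Simplified stand-in for a GitHub PAT detector; the real trufflehog
# regex differs, but both use \b word-boundary anchors.
pattern = re.compile(r"\bgithub_pat_[0-9A-Za-z_]{82}\b")

token = "github_pat_" + "A" * 82  # dummy token matching the pattern's length
dump_line = "INSERT INTO user_info VALUES(1,'abc','" + token + "');"
strings_line = "Gabc" + token  # strings-style output: junk fused onto the token

print(pattern.search(dump_line) is not None)     # True: the quote before the token is a boundary
print(pattern.search(strings_line) is not None)  # False: "c" followed by "g" is not a boundary
```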

dustin-decker commented 10 months ago

Small correction: this would be implemented as a handler. Handlers can work on whole files, whereas decoders might only be processing a small chunk of a file.

https://github.com/trufflesecurity/trufflehog/blob/main/pkg/handlers/handlers.go

Unpacking sqlite db files with a handler would be a welcome addition.

rgmz commented 9 months ago

Now to the naked eye, the API key is surely visible in this output too. But as we can see random character G is added at the start of the line and e is added at the end first row entry.

I just discovered a GitHub access token inside a binary file and it reminded me of this issue. Random junk is surely a problem that causes false negatives with other binary files.

I wonder if it would be feasible to create "fuzzy" patterns for binary files to try and pick up what would otherwise be high-confidence patterns. For example, \bghp_[a-fA-F0-9]{32}\b for text chunks but just ghp_[a-fA-F0-9]{32} for binary chunks, which could be ruled in or out with validation. (Obviously this wouldn't be a silver bullet and might cause other problems.)
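A minimal sketch of that two-tier idea, using the example pattern from the comment above (the function name and chunk-classification flag are hypothetical; any relaxed match would still need live verification before being reported):

```python
import re

# Strict pattern for text chunks; relaxed variant (no \b) for binary chunks.
STRICT = re.compile(r"\bghp_[a-fA-F0-9]{32}\b")
RELAXED = re.compile(r"ghp_[a-fA-F0-9]{32}")

def find_candidates(chunk: bytes, is_binary: bool) -> list[str]:
    """Return candidate tokens from a chunk. Relaxed matches are only
    candidates and would be ruled in or out by verification."""
    text = chunk.decode("utf-8", errors="replace")
    pattern = RELAXED if is_binary else STRICT
    return [m.group(0) for m in pattern.finditer(text)]

token = "ghp_" + "a" * 32
binary_chunk = b"G\x01abc" + token.encode() + b"\x02junk"

print(find_candidates(binary_chunk, is_binary=False))  # []: \b fails after "abc"
print(find_candidates(binary_chunk, is_binary=True))   # one candidate found
```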

bugbaba commented 9 months ago

This can be done for a non-generic regex that has a unique prefix like ghp_ and a specific length. But for others, it's surely going to increase the number of false positives.

dxa4481 commented 9 months ago

I think when the handler unpacks the SQLite data, it should include the column name nearby to help with detection, since some detectors rely on keywords close by when the secret doesn't have a fixed prefix.

rgmz commented 9 months ago

I think when the handler unpacks the SQLite data, it should include the column name nearby to help with detection.

Based on some cursory testing, we may be able to create a dump using .mode that fulfills this. I have only tested this with individual selects, so I can't say whether there's an easy way to convert that into a dump.

sqlite> .mode json customers
sqlite> select * from customers;
[{"CustomerId":1,"FirstName":"John","LastName":"Doe","Company":"Business Enterprises","Address":"123 Fake St","City":"Springfield","State":"CO","Country":"USA","PostalCode":"12227-000","Phone":"1234567890","Fax":null,"Email":"john@example.com","SupportRepId":3},
...

The line and insert modes might also work, as they seem to include the column names.

Edit: unfortunately, .dump doesn't seem to respect .mode and will always default to INSERT INTO .... Leveraging .mode might require parsing all views/tables and doing select statements "by hand".
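Doing the SELECTs "by hand" per table is straightforward to sketch. The Python below (a stand-in for what a Go handler would do) walks sqlite_master and emits one JSON object per row, keyed by column name:

```python
import json
import sqlite3

def dump_rows_as_json(path: str):
    """Yield one JSON object per row, keyed by column name -- roughly
    what `.mode json` plus a SELECT per table produces."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    try:
        tables = [r["name"] for r in conn.execute(
            "SELECT name FROM sqlite_master WHERE type IN ('table', 'view')")]
        for table in tables:
            for row in conn.execute(f'SELECT * FROM "{table}"'):
                yield json.dumps(dict(row))
    finally:
        conn.close()
```

Keeping the column name (e.g. "api_key") adjacent to the value in the output gives keyword-based detectors the nearby context they need.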

ankushgoel27 commented 3 weeks ago

It would be great if we could resolve this issue.