Open bugbaba opened 10 months ago
@bugbaba thanks for opening this issue!
Scanning sqlite dbs is a really great idea. Ideally we would have a decoder
be able to detect when a filesystem scan encountered a sqlite db. Adding new decoders is pretty easy. We're happy to accept PRs for adding a sqlite decoder.
@bugbaba what is the output if you run strings db_filename
? Is it substantially different from sqlite3 db_filename .dump > tmp_dumpfile
?
I've seen results from several SQLite databases, so I'm curious what the difference would be.
With strings
random characters get added along with the contents of the DB
For example, the dump file for this test db is
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE IF NOT EXISTS "user_info" (
"id" INTEGER,
"username" TEXT,
"api_key" TEXT,
PRIMARY KEY("id")
);
INSERT INTO user_info VALUES(1,'abc','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');
INSERT INTO user_info VALUES(2,'xyz','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');
COMMIT;
Whereas when we run strings on the file we get
SQLite format 3
Otableuser_infouser_info
CREATE TABLE "user_info" (
"id" INTEGER,
"username" TEXT,
"api_key" TEXT,
PRIMARY KEY("id")
Gxyz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e
Gabc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
Now to the naked eye, the API key is surely visible in this output too. But as we can see random character G
is added at the start of the line and e
is added at the end first row entry.
So for example the github key detector regex trufflehog is using won't be able to detect it due to use of boundaries \b
in the regex.
We can see the same in the screenshot from regex101.com
Small correction, this would be implemented as a handler. Handlers can work on whole files, decoders might only be processing at a small chunk of a file.
https://github.com/trufflesecurity/trufflehog/blob/main/pkg/handlers/handlers.go
Unpacking sqlite db files with a handler would be a welcome addition.
Now to the naked eye, the API key is surely visible in this output too. But as we can see random character
G
is added at the start of the line ande
is added at the end first row entry.
I just discovered a GitHub access token inside a binary file and it reminded me of this issue. Random junk is surely a problem that causes false negatives with other binary files.
I wonder if it would be feasible to create "fuzzy" patterns for binary files to try and pick up what would otherwise be high-confidence patterns. For example, \bghp_[a-fA-F0-9]{32}\b
for text chunks but just ghp_[a-fA-F0-9]{32}
for binary chunks, which could be ruled in or out with validation. (Obviously this wouldn't be a silver bullet and might cause other problems.)
This can be done for a nongeneric regex that has a unique prefix like ghp_
and a specific length. But for others, it's surely going to increase the number of false positives.
I think when the handler unpacks the SQLi data, it should include the column name nearby to help with detection. This is because for some detectors that require keywords close by to help with detection if they don't have fixed prefix.
I think when the handler unpacks the SQLi data, it should include the column name nearby to help with detection.
Based on some cursory testing, we may be able to create a dump using .mode
that fulfills this. I have only tested this with individual selects, so I can't say whether there's an easy way to convert that into a dump.
sqlite> .mode json customers
sqlite> select * from customers;
[{"CustomerId":1,"FirstName":"John","LastName":"Doe","Company":"Business Enterprises","Address":"123 Fake St","City":"Springfield","State":"CO","Country":"USA","PostalCode":"12227-000","Phone":"1234567890","Fax":null,"Email":"john@example.com","SupportRepId":3},
...
The line
and insert
modes might also work as it seems to specify the column.
Edit: unfortunately, .dump
doesn't seem to respect .mode
and will always default to INSERT INTO ...
. Leveraging .mode
might require parsing all views/tables and doing select statements "by hand".
it will be great if we can resolve this issue.
Hello Team,
Description
Scan contents of the SQLite DB file for secrets inside it.
Preferred Solution
The SQLite DB files can be detected using the
determineMimeType
function at https://github.com/trufflesecurity/trufflehog/blob/main/pkg/handlers/archive.go#L416 and then dump the contents of the DB into a text file which can be scanned for keys inside it.So that case like below could be detected
Additional Context
Contents can be dump using the below command
sqlite3 db_filename .dump > tmp_dumpfile
Discussion on how to do the same in golang https://github.com/mattn/go-sqlite3/issues/535
-- Regards, @bugbaba