sepinf-inc / IPED

IPED Digital Forensic Tool. It is an open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.

Optimize importing NSRL "full" versions of RDS hash sets #2160

Closed wladimirleite closed 2 months ago

wladimirleite commented 2 months ago

As discussed in https://github.com/sepinf-inc/IPED/discussions/2155.

I am finishing a test importing a "full" (not "minimal") version of the latest NSRL Android SQLite file. It is taking a long time (~10 hours so far, 81% processed). I will post the final numbers here. Although it should be faster than what @paulobreim observed in his environment (there is probably something else going on there), it is far too slow: the "minimal" version was imported in 23 minutes on my PC, and it is only ~14% smaller than the "full" version, comparing the SQLite file sizes.

The good news is that I found a simple way to speed it up, specific to NSRL SQLite file processing. The gain for the "minimal" version should be small, but it should make a substantial difference for the "full" version.

As I mentioned in the discussion, regardless of a possible optimization, I recommend using "minimal" versions (at least for IPED's usage).

wladimirleite commented 2 months ago

It finally finished (it took more than 16 hours). I will now test with the changes that I made.

Connected to database d:\iped-hashes.db
Database tables and indexes created.
Last HASH_ID = 0
Last PROPERTY_ID = 0
Properties loaded = 0

Reading NSRL_DB file g:\RDS_2024.03.1_android.db...
97148886 records read in 59027 seconds.
29406404 hashes inserted.
36897886 hashes updated.
20988875 hashes were already in the database.
5614548 zero length hashes were ignored.
4241173 records combined.

Commiting changes...
Commit completed in 6 seconds.
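
The counters in the log above can be cross-checked with a few lines of Python. This is only a sketch using the numbers printed in the log; the variable names are illustrative, and the idea that the five categories together account for every record read is inferred from the totals, not taken from IPED's code.

```python
# Numbers copied from the import log above.
records_read = 97_148_886
elapsed_seconds = 59_027

# Per-category counters as printed by the importer (key names are illustrative).
counters = {
    "inserted": 29_406_404,
    "updated": 36_897_886,
    "already_in_db": 20_988_875,
    "zero_length_ignored": 5_614_548,
    "combined": 4_241_173,
}

# Assumption: each record read ends up in exactly one of the categories.
total_accounted = sum(counters.values())

records_per_second = records_read / elapsed_seconds
elapsed_hours = elapsed_seconds / 3600

print(f"accounted for: {total_accounted} of {records_read}")
print(f"throughput: {records_per_second:.0f} records/s")
print(f"elapsed: {elapsed_hours:.1f} hours")
```

The five counters add up exactly to the number of records read, and 59027 seconds is a little over 16 hours, which matches the "more than 16 hours" reported above.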

wladimirleite commented 2 months ago

Full version importing is much faster now.

Full:    16 hours   -> 24 minutes
Minimal: 23 minutes -> 21 minutes
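
To put the two results in proportion, the speedup factors can be computed from the times above. This is a rough sketch: the "16 hours" figure is rounded (the log reports 59027 seconds), and the variable names are illustrative.

```python
# Import times reported above, converted to minutes.
full_before = 16 * 60   # "16 hours", rounded from the reported duration
full_after = 24
minimal_before = 23
minimal_after = 21

# Speedup factor = time before / time after.
full_speedup = full_before / full_after
minimal_speedup = minimal_before / minimal_after

print(f"full: {full_speedup:.0f}x faster")
print(f"minimal: {minimal_speedup:.2f}x faster")
```

With these rounded inputs, the full import is roughly 40 times faster, while the minimal import gains about 10%, consistent with the expectation that the optimization mostly benefits the "full" version.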

paulobreim commented 2 months ago

Wow, excellent! I want to test it. What was the magic?

paulobreim commented 2 months ago

The impact this made is incredible.

paulobreim commented 2 months ago

I tested with the Android "full" version and it finished in 24 minutes.

paulobreim commented 2 months ago

I tested again, this time from SSD to SSD, and compared with the tests using RAM there wasn't much of a difference. Now the hash database is complete, and the results were:

RDS_2024.03.1_android - 25"
RDS_2024.03.1_ios - 52"
RDS_2024.03.1_legacy - 2:19"
RDS_2024.03.1_modern - 5:29"

Thank you for your help

wladimirleite commented 2 months ago

Thanks @paulobreim for reporting and testing this issue. One last comment... For the "modern" hash set, I believe that the "minimal" version would be the one with the most noticeable difference (in terms of importing time) compared to the "full" version.

paulobreim commented 2 months ago

I tested the difference in processing time when using the hash database. The test was performed on an image of a Samsung SM-G780G, obtained with Cellebrite, which generated the files below.

17/08/2023 12:26 4.068.589.593 EvidenceCollection_2023-08-17_Report.ufdr
17/08/2023 12:06 4.928.307.200 EvidenceCollection_2023-08-17_Report.z01
17/08/2023 12:07 4.928.307.200 EvidenceCollection_2023-08-17_Report.z02
17/08/2023 12:07 4.928.307.200 EvidenceCollection_2023-08-17_Report.z03
17/08/2023 12:08 4.928.307.200 EvidenceCollection_2023-08-17_Report.z04
17/08/2023 12:08 4.928.307.200 EvidenceCollection_2023-08-17_Report.z05
17/08/2023 12:09 4.928.307.200 EvidenceCollection_2023-08-17_Report.z06
17/08/2023 12:09 4.928.307.200 EvidenceCollection_2023-08-17_Report.z07
17/08/2023 12:10 4.928.307.200 EvidenceCollection_2023-08-17_Report.z08
17/08/2023 12:11 4.928.307.200 EvidenceCollection_2023-08-17_Report.z09
17/08/2023 12:11 4.928.307.200 EvidenceCollection_2023-08-17_Report.z10
17/08/2023 12:12 4.928.307.200 EvidenceCollection_2023-08-17_Report.z11
17/08/2023 12:12 4.928.307.200 EvidenceCollection_2023-08-17_Report.z12
17/08/2023 12:13 4.928.307.200 EvidenceCollection_2023-08-17_Report.z13
17/08/2023 12:14 4.928.307.200 EvidenceCollection_2023-08-17_Report.z14
17/08/2023 12:24 4.928.307.200 EvidenceCollection_2023-08-17_Report.z15

Processing time without using iped-hashes.db: 40 minutes.
Processing time using iped-hashes.db: 21 minutes.

Both runs generated 435,610 items in IPED, but what caught my attention is that, under the Evidences item in IPED, the item EvidenceCollection_2023-08-17_Report.z01 does not appear. I don't know whether this is correct or not.

paulo