trufflesecurity / trufflehog

Find, verify, and analyze leaked credentials
https://trufflesecurity.com
GNU Affero General Public License v3.0

Memory Usage Increased 500+% #2354

Status: Open. Opened by caveman8fb 9 months ago.

caveman8fb commented 9 months ago


TruffleHog Version

trufflehog 3.64.0 and up

Expected Behavior

Memory usage should increase only slightly between versions, in proportion to any new functionality.

Actual Behavior

Testing the four latest versions with GNU time shows that, starting with 3.64.0, peak memory usage increased sharply while scan_duration decreased:

Version   scan_duration   Max Resident Memory (KB)
3.63.11   5.629426921s    144116
3.64.0    2.481353563s    785008
3.65.0    2.498648314s    878700
3.66.1    1.761691232s    944860
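GNU time's %M field reports the maximum resident set size in kilobytes, so the peak grows from roughly 141 MiB on 3.63.11 to roughly 923 MiB on 3.66.1. The sketch below only re-processes the four figures reported above, converting them to MiB and computing the change relative to 3.63.11; the function name mem_growth is a hypothetical choice for illustration.

```shell
#!/usr/bin/env bash
# Recompute the growth in peak memory from the figures reported in this issue.
# GNU time's %M is the maximum resident set size in kilobytes.
set -euo pipefail

# Hypothetical helper: prints version, peak RSS in MiB, and % change vs 3.63.11.
mem_growth() {
  printf '%s\n' \
    "3.63.11 144116" \
    "3.64.0 785008" \
    "3.65.0 878700" \
    "3.66.1 944860" |
  awk '
    NR == 1 { base = $2 }  # first row (3.63.11) is the baseline
    { printf "%-8s %6.1f MiB %+7.1f%%\n", $1, $2 / 1024, ($2 - base) / base * 100 }'
}

mem_growth
```

The percentages land between roughly +445% and +556%, which is where the "500+%" in the issue title comes from.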

Steps to Reproduce

Run each trufflehog version under GNU time (/usr/bin/time) against the same checkout and compare the reported Max Resident Memory:

cd /app/
git clone https://github.com/databricks/terraform-databricks-examples.git
cd /app/terraform-databricks-examples
/app/bin/trufflehog3.63.11 --version; /usr/bin/time --format="Max Resident Memory: %M" /app/bin/trufflehog3.63.11 filesystem . --only-verified --fail --no-update
echo "------"
/app/bin/trufflehog3.64.0 --version; /usr/bin/time --format="Max Resident Memory: %M" /app/bin/trufflehog3.64.0 filesystem . --only-verified --fail --no-update
echo "------"
/app/bin/trufflehog3.65.0 --version; /usr/bin/time --format="Max Resident Memory: %M" /app/bin/trufflehog3.65.0 filesystem . --only-verified --fail --no-update
echo "------"
/app/bin/trufflehog3.66.1 --version; /usr/bin/time --format="Max Resident Memory: %M" /app/bin/trufflehog3.66.1 filesystem . --only-verified --fail --no-update
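The four near-identical commands above can also be written as a loop that prints one "version peak_kb" line per build. This is a sketch under assumptions: the binaries are installed as /app/bin/trufflehog<version> as in the setup steps, GNU time is at /usr/bin/time, and the script is run from inside the terraform-databricks-examples checkout. BIN_DIR, TIME_BIN and measure_peak_kb are hypothetical names introduced here. GNU time's -o option writes the report to its own file, so the memory figure is not interleaved with trufflehog's output.

```shell
#!/usr/bin/env bash
# Sketch: measure peak RSS (KB) for each trufflehog build on the current directory.
# Assumes binaries named $BIN_DIR/trufflehog<version> and GNU time at $TIME_BIN.
set -euo pipefail

BIN_DIR="${BIN_DIR:-/app/bin}"          # hypothetical override variable
TIME_BIN="${TIME_BIN:-/usr/bin/time}"   # hypothetical override variable

# Hypothetical helper. $1: path to a trufflehog binary. Prints its peak RSS in KB.
measure_peak_kb() {
  local bin="$1" report
  report="$(mktemp)"
  # --fail can make trufflehog exit non-zero, so don't let set -e abort here.
  "$TIME_BIN" -f '%M' -o "$report" "$bin" filesystem . --only-verified --fail --no-update \
    >/dev/null 2>&1 || true
  # GNU time may prepend "Command exited with non-zero status N"; %M is the last line.
  tail -n 1 "$report"
  rm -f "$report"
}

for version in 3.63.11 3.64.0 3.65.0 3.66.1; do
  printf '%s %s\n' "$version" "$(measure_peak_kb "$BIN_DIR/trufflehog$version")"
done
```

Reading only the last line of the report matters because GNU time adds a "Command exited with non-zero status N" line whenever the measured command fails, which can happen here since --fail makes trufflehog exit non-zero when it finds verified results.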

Environment

Tested in a Docker container (ubuntu:20.04)

Additional Context

My Setup: docker run -it ubuntu:20.04 /bin/bash

apt-get update
apt-get install ca-certificates curl apt-transport-https gnupg wget time git --yes 
mkdir -p /app/bin
cd /app/bin
wget -c https://github.com/trufflesecurity/trufflehog/releases/download/v3.63.11/trufflehog_3.63.11_linux_amd64.tar.gz
wget -c https://github.com/trufflesecurity/trufflehog/releases/download/v3.64.0/trufflehog_3.64.0_linux_amd64.tar.gz
wget -c https://github.com/trufflesecurity/trufflehog/releases/download/v3.65.0/trufflehog_3.65.0_linux_amd64.tar.gz
wget -c https://github.com/trufflesecurity/trufflehog/releases/download/v3.66.1/trufflehog_3.66.1_linux_amd64.tar.gz
tar -zvxf trufflehog_3.63.11_linux_amd64.tar.gz; chmod 755 trufflehog; mv trufflehog trufflehog3.63.11
tar -zvxf trufflehog_3.64.0_linux_amd64.tar.gz; chmod 755 trufflehog; mv trufflehog trufflehog3.64.0
tar -zvxf trufflehog_3.65.0_linux_amd64.tar.gz; chmod 755 trufflehog; mv trufflehog trufflehog3.65.0
tar -zvxf trufflehog_3.66.1_linux_amd64.tar.gz; chmod 755 trufflehog; mv trufflehog trufflehog3.66.1

rgmz commented 9 months ago

The main change from 3.63.11 to 3.64.0 appears to be the switch to a different regex engine, which could explain why scanning is faster but uses more memory.

https://github.com/trufflesecurity/trufflehog/releases/tag/v3.64.0

ahrav commented 9 months ago

Hey @caveman8fb, you're absolutely right about the increased memory usage, and thanks so much for bringing this to our attention. @rgmz's assessment is spot-on. We aimed for better performance with the new regex library but clearly underestimated the impact on memory. We're truly sorry for any inconvenience this causes. There are some improvements in v3.67.5, and we're actively working on a more comprehensive solution that will let you optimize for either performance or memory use.