tern-tools / tern

Tern is a software composition analysis tool and Python library that generates a Software Bill of Materials for container images and Dockerfiles. The SBOM that Tern generates will give you a layer-by-layer view of what's inside your container in a variety of formats including human-readable, JSON, HTML, SPDX and more.
BSD 2-Clause "Simplified" License

Poor regex performance #939

Closed. JamieMagee closed this issue 3 years ago.

JamieMagee commented 3 years ago

Describe the bug

When running tern against postgres:latest, approximately 85% of the runtime is spent in update_master_list, and approximately 60% of that function's time (about 50% of total runtime) is spent compiling and running regexes in prop_names.

I've attached profiler logs for more information.

To Reproduce

Steps to reproduce the behavior:

  1. tern --driver fuse --clear-cache report --docker-image postgres:latest
  2. Wait...

Profiler logs were generated using:

export PYTHONPATH=/workspaces/tern/
pyinstrument -r html -o index.html tern/__main__.py --driver fuse --clear-cache report --docker-image postgres:latest
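The profile above came from pyinstrument. A dependency-free alternative is Python's built-in cProfile and pstats modules, which can produce a comparable report sorted by cumulative time. This is a minimal sketch. The helper name profile_call is hypothetical and is not part of tern.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Hypothetical helper: run func under cProfile and return
    (func's result, text report of the `top` entries by cumulative time)."""
    profiler = cProfile.Profile()
    # runcall enables the profiler only for the duration of this call
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

From the command line, `python -m cProfile -s cumulative tern/__main__.py --driver fuse --clear-cache report --docker-image postgres:latest` should print a similar table without any extra packages.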

Expected behavior

Better performance.


nishakm commented 3 years ago

@JamieMagee are you working on a fix?

JamieMagee commented 3 years ago

@nishakm Unfortunately I don't have time right now to properly investigate and test a fix, but as a starting point I was considering changing this:

import re

def prop_names(obj):
    prop_decorators = r'^__|^_'
    for key in obj.__dict__.keys():
        # remove private and protected decorator characters if any
        priv_name = '_' + obj.__class__.__name__
        prop_name = re.sub(priv_name, '', key)
        prop_name = re.sub(prop_decorators, '', prop_name, count=1)
        yield key, prop_name

to

def prop_names(obj):
    for key in obj.__dict__.keys():
        # keep only the text after the last '__', e.g. '_Package__name' -> 'name'
        prop_name = key.split('__')[-1]
        yield key, prop_name

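The regex version and the split-based proposal are not quite equivalent. The sketch below compares them on a toy class, alongside a hypothetical third variant that keeps the regex behaviour while using only str methods. Python's re module caches compiled patterns, so any saving in that variant presumably comes from skipping re.sub's per-call overhead rather than from avoiding recompilation; this has not been measured here. The class, its attributes, and the function names prop_names_regex, prop_names_split and prop_names_plain are illustrative only, not tern's API.

```python
import re


def prop_names_regex(obj):
    """Regex version from the thread (count passed as a keyword)."""
    prop_decorators = r'^__|^_'
    for key in obj.__dict__.keys():
        priv_name = '_' + obj.__class__.__name__
        prop_name = re.sub(priv_name, '', key)
        prop_name = re.sub(prop_decorators, '', prop_name, count=1)
        yield key, prop_name


def prop_names_split(obj):
    """Split-based proposal from the thread."""
    for key in obj.__dict__.keys():
        yield key, key.split('__')[-1]


def prop_names_plain(obj):
    """Hypothetical regex-free rewrite intended to match prop_names_regex."""
    # The class-name prefix is the same for every key, so build it once.
    priv_name = '_' + obj.__class__.__name__
    for key in obj.__dict__:
        # str.replace removes every occurrence, as re.sub does with a
        # literal pattern (identifiers contain no regex metacharacters).
        prop_name = key.replace(priv_name, '')
        # Strip one leading '__', else one leading '_', mirroring
        # re.sub(r'^__|^_', '', prop_name, count=1).
        if prop_name.startswith('__'):
            prop_name = prop_name[2:]
        elif prop_name.startswith('_'):
            prop_name = prop_name[1:]
        yield key, prop_name


class Package:
    """Toy class; the attribute names are hypothetical, not tern's."""

    def __init__(self):
        self.__name = 'libpq5'    # name-mangled to '_Package__name'
        self._origin = 'debian'   # single leading underscore
        self.layer__index = 1     # public name containing '__'
```

On this toy class, list(prop_names_split(Package())) leaves '_origin' unstripped and shortens 'layer__index' to 'index', while the other two variants return 'origin' and 'layer__index'. Whether that difference matters depends on which attribute names tern's classes actually use.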
nishakm commented 3 years ago

@JamieMagee no worries. I can take a look.

nishakm commented 3 years ago

It looks to me like to_dict() is called far too many times by update_master_list. Let me look at that first.
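tern's actual update_master_list is not shown in this thread, so the following is only a hypothetical illustration of the pattern described: a naive merge that re-serialises the whole master list for every incoming item, and a variant that serialises each object once and tracks already-seen items in a set. Item, merge_naive and merge_once are made-up names for this sketch.

```python
class Item:
    """Hypothetical stand-in for an object with a to_dict() method."""
    calls = 0  # counts to_dict() invocations across all instances

    def __init__(self, name, version):
        self.name = name
        self.version = version

    def to_dict(self):
        Item.calls += 1
        return {'name': self.name, 'version': self.version}


def _key(item_dict):
    # dicts are unhashable; a sorted tuple of their items can go in a set
    return tuple(sorted(item_dict.items()))


def merge_naive(master, new_items):
    """Re-serialises every master entry for each new item."""
    for item in new_items:
        if item.to_dict() not in [m.to_dict() for m in master]:
            master.append(item)
    return master


def merge_once(master, new_items):
    """Serialises each object once and remembers seen keys in a set."""
    seen = {_key(m.to_dict()) for m in master}
    for item in new_items:
        key = _key(item.to_dict())
        if key not in seen:
            seen.add(key)
            master.append(item)
    return master
```

With two items in the master list and two new items (one a duplicate), both functions produce the same merged list, but merge_naive calls to_dict() six times and merge_once four times. The naive cost grows roughly with the product of the two list lengths, the other only with their sum.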