hfaran / slack-export-viewer

A Slack Export archive viewer that allows you to easily view and share your Slack team's export
https://pypi.python.org/pypi/slack-export-viewer
MIT License
950 stars 193 forks source link

Improve performance of Reader._build_thread(). #142

Closed thomasvandoren closed 3 years ago

thomasvandoren commented 3 years ago

I ran into issues with getting a relatively large archive (~100MB compressed zip) to load. I did some performance analysis with line profiler, and discovered most of the time was in _build_thread() doing trivial string compares... lots and lots of them.

Refactor _build_thread() to iterate through channel_data[channel_name] twice. This is an improvement over the N squared (at least) iteration that was previously used, which resulted in more iterations that slowed down performance. Eliminating the nested loop over channel_data[channel_name] is done by pre-computing a dictionary (O(1) lookup) that can be used to look up messages by user and timestamp.

Two other small improvements are:

I also tested the impact of json parsing by implementing an optional use of the orjson library when it's available. It did not have any measurable impact on performance in tests with the archives I have, so I left it out.

These are the perf test results run on an intel macbook pro with the ~100MB compressed zip slack export:

Perf test Tests Run Average time (seconds) Stdev time (seconds)
Optimized _build_thread 10 24.5 1.3
orjson + Optimized _build_thread 10 24.8 0.8
Original 1 900.0 0 (only ran once and it didn't complete after 15 minutes)

The command I used:

time slack-export-viewer --no-browser -p 8088 --test -z Test\ Slack\ export\ Test.zip
thomasvandoren commented 3 years ago

To verify this change, I checked the html output to ensure the messages/threads/etc were still getting built. It looked ok, however I would not say I did a rigorous job of ensuring correctness...

What's the best way to ensure the behavior hasn't changed? Thanks!

hfaran commented 3 years ago

Thanks for @thomasvandoren. As for testing, if you did a visual inspection and everything looked good, that should be good enough given the changes you're making. (Test coverage is pretty bare).