synchronizing / mitm

👨🏼‍💻 A customizable man-in-the-middle TCP intercepting proxy.
https://synchronizing.github.io/mitm/
MIT License

Performance bogs down with normal web use. #18

Closed: mitch0s closed this issue 2 years ago

mitch0s commented 2 years ago

G'day,

I tried using the proxy as a regular HTTPS proxy for everyday web browsing. It seems to struggle with a backlog of requests and handles them sequentially.

I'm not sure if it's built for this kind of purpose, but it's what I intend to use it for, so any help getting it to run a bit smoother would be much appreciated!

Cheers,

Mitch

synchronizing commented 2 years ago

Hey Mitch,

Out of curiosity, which version of mitm are you using?

A performance upgrade is what I'm hoping to tackle next, after adding a test suite to the project. The main bottleneck is the new_ssl_context function, which is called on every request. This function creates the ssl.SSLContext that is used to perform the TLS/SSL handshake with the client using bogus credentials.

https://github.com/synchronizing/mitm/blob/e564815eb8394230b891a2a6a98282162993de35/mitm/crypto.py#L211-L222

Due to limitations in the Python standard library's ssl module, we must save the cert and key to disk and load them back on every request. For a page load that triggers 100+ requests, that disk round trip adds literal seconds.

https://github.com/synchronizing/mitm/blob/e564815eb8394230b891a2a6a98282162993de35/mitm/crypto.py#L228-L240
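To illustrate the limitation, here's a rough sketch (not the code linked above; the function name and PEM-bytes arguments are just for illustration) of what the stdlib forces on every request:

import ssl
import tempfile

def new_ssl_context_sketch(cert_pem: bytes, key_pem: bytes) -> ssl.SSLContext:
    """Build a server-side SSL context from in-memory PEM data."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)

    # load_cert_chain() only accepts file paths, so the in-memory PEM bytes
    # must be written to temporary files and read back from disk every call.
    with tempfile.NamedTemporaryFile(suffix=".crt") as cert_file, \
         tempfile.NamedTemporaryFile(suffix=".key") as key_file:
        cert_file.write(cert_pem)
        cert_file.flush()
        key_file.write(key_pem)
        key_file.flush()
        context.load_cert_chain(certfile=cert_file.name, keyfile=key_file.name)

    return context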

This is a disaster that hasn't been fixed since a patch was first proposed for CPython back in 2013. See this thread if you want to feel depressed. I attempted to mitigate the issue by using lru_cache, but haven't gotten around to benchmarking it just yet. The solution isn't super clear from here, and I'm still trying to figure out what to do. Ultimately, I would like to move away from ssl.SSLContext and instead use OpenSSL.SSL.Context (which supports loading the cert/key from memory), but unfortunately asyncio.get_event_loop().start_tls does not work with OpenSSL, only with the ssl module.
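For reference, the lru_cache idea is roughly the following (a sketch only; caching by hostname and the helper names are assumptions, not mitm's actual API):

import functools
import ssl

@functools.lru_cache(maxsize=1024)
def cached_ssl_context(host: str) -> ssl.SSLContext:
    # The expensive part (generating the bogus certificate and loading it from
    # disk) runs once per host; later requests for the same host reuse the
    # cached context. generate_bogus_cert() is a hypothetical helper here.
    cert_pem, key_pem = generate_bogus_cert(host)
    return new_ssl_context_sketch(cert_pem, key_pem)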

I'll keep this issue open and will update here if I find any solution.

mitch0s commented 2 years ago

Regarding the version of mitm, I am currently using v1.3.0.

Yeah, I haven't looked too far into the mitm stack, but if I have some free time over the upcoming holidays, I'll take a look at how mitmproxy handles connections/requests and see if that can be of any help :)

Cheers!

synchronizing commented 2 years ago

Sounds good! I'll drop any performance updates here as well.

synchronizing commented 2 years ago

Added #19 to mitigate the bottleneck. v1.3 was/is using lru_cache improperly for the caching, but v1.4 will fix it. I also recommend turning off logging, as this will improve performance.
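For example, with the standard logging module (the logger name below is an assumption; adjust it to whatever logger your setup uses):

import logging

# Suppress per-request log output; only warnings and above are emitted.
logging.getLogger("mitm").setLevel(logging.WARNING)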

Browsed and used WhatsApp for about an hour with #19 enabled and had zero issues (YouTube, Reddit, YCombinator, GitHub, StackOverflow, among other things). Things seem to work as expected. Give v1.4 a try and let me know! If you want to increase the cache size for even more performance (at the cost of memory usage), you can use the following:

from mitm import MITM, CertificateAuthority, middleware, protocol, crypto
from pathlib import Path

# Updates the maximum size of the LRU cache.
crypto.LRU_MAX_SIZE = 2048 # Defaults to 1024. 

# Loads the CA certificate.
path = Path("/Users/felipefaria/Desktop")
certificate_authority = CertificateAuthority.init(path=path)

# Starts the MITM server.
mitm = MITM(
    host="127.0.0.1",
    port=8888,
    protocols=[protocol.HTTP],
    middlewares=[],
    certificate_authority=certificate_authority,
)
mitm.run()

You don't need to change LRU_MAX_SIZE, but the option is there if you'd like. See the docs here for more info.

Closing this for the time being.