nedbat / coveragepy

The code coverage tool for Python
https://coverage.readthedocs.io
Apache License 2.0

Memory leak when generating reports #1791

Closed 0x78f1935 closed 1 month ago

0x78f1935 commented 1 month ago

Describe the bug: Please see this issue

To Reproduce: Generate any report, JSON or HTML; it doesn't matter.

Expected behavior: Reports are generated without eating away all my memory.

Additional context: This has been going on for a couple of months now; the linked issue contains much more information and the things that have already been tried.

So far we have worked out that the issue occurs specifically with coverage report.

Any further tips?

nedbat commented 1 month ago

Any further tips?

As I asked on that issue, I need clear complete instructions for how to reproduce the issue. No one has mentioned what version of coverage.py they are using. I don't know how to make the problem appear so that I can debug it. If you know of a way for me to do that, please provide the instructions.

0x78f1935 commented 1 month ago

@nedbat, before the issue was closed I posted a response to your request.

For the instructions, I've enclosed everything I know here, here and here. All the information can be found in the previous issue.

nedbat commented 1 month ago

I'm sorry, that still doesn't give me a way to reproduce the issue. I'm available under contract if you need me to sign an NDA to see your private repo.

devdanzin commented 1 month ago

Sounds like #1785 to me: affects Python 3.12, eats up a lot of memory.

@0x78f1935, does your code happen to have very long lines in it?

0x78f1935 commented 1 month ago

I'm sorry, that still doesn't give me a way to reproduce the issue. I'm available under contract if you need me to sign an NDA to see your private repo.

I'm sorry, I cannot do that.

However, @sebastian-muthwill mentioned he was able to reproduce the issue I encountered.

Sounds like #1785 to me: affects Python 3.12, eats up a lot of memory.

@0x78f1935, does your code happen to have very long lines in it?

No, my unit tests do not use a large mock data set. That said, I also run Flake8 across my codebase.

sebastian-muthwill commented 1 month ago

@nedbat I'll take the discussion up here. Yesterday I tried to reproduce the issue with a clean Django project using the example from the Django docs, and everything went well.

My project is much bigger and is failing.

I'll keep trying to reproduce the issue and come back.

0x78f1935 commented 1 month ago

My project is much bigger and is failing.

I can confirm that this happens to me with big project(s) too.

First, I thought maybe my project was the issue, but now that @sebastian-muthwill was able to reproduce the issue, I'm more confident we are onto something. I would love to be of help; unfortunately, I'm unable to share my codebase. Just like Sebastian's codebase, mine is big.

I took some time to take my codebase apart and basically started removing all components I didn't want to share so I could make a repository that reproduces the issue. After removing the frontend and Docker environments, I was left with my tests and my backend.

I trimmed my tests to a total of one test and was still able to reproduce the memory leak. The moment I started touching endpoints in my backend (which I don't want to share), things started to work, which is why I'm a bit uncertain. The strange part is that it doesn't really matter which component I turn on or off in any order. When they are turned on, there is a memory leak; when they are turned off, everything is okay (though there is no data to report), which might explain why that approach appears successful. This makes it difficult to pinpoint a specific component.

I thought to myself, that isn't very helpful. Surely I can find someone's existing project and turn it into a reproducible environment, but so far no luck with reproduction. I tried various things; I even tried my own project.toml file. No luck.

I hate to say it, but the best workaround I have right now, if you really need your coverage, is to downgrade to Python 3.11 or skip generating reports altogether.

devdanzin commented 1 month ago

There is a memory issue that only affects Python 3.12, due to a bug in that Python version: https://github.com/python/cpython/issues/119118. I believe this is the issue affecting you.

There's a test you can run to check whether this is the issue we think it is. In coveragepy/coverage/phystokens.py, change the line:

 return list(tokenize.generate_tokens(readline))

To:

 return tokenize.generate_tokens(readline)

It should lessen the memory impact, if it's the same issue.

sebastian-muthwill commented 1 month ago

@devdanzin Thanks for pointing that out.

@nedbat I just tested it and can confirm that the issue is gone when setting the return as @devdanzin described.

    # return list(tokenize.generate_tokens(readline))  # <-- memory leak happening
    return tokenize.generate_tokens(readline)          # <-- no issue

Since it is not reproducible with a fresh installation, I could imagine it has something to do with the number of tests run?

devdanzin commented 1 month ago

So far, we had only seen this issue manifest on files with very long (many thousands of characters) lines, where memory ballooned very fast and resulted in a MemoryError. For those, removing the call to list() avoided the memory issue, but still resulted in very slow (many minutes versus a couple of seconds) execution. Are you seeing a significant slowdown between 3.11 and 3.12?

Since it comes from a Python 3.12 bug, it seemed OK to keep the list() call, as those pathological files were still unusable without it and a fix is being worked on upstream.

However, given that your case indicates the list() call is detrimental even for non-pathological files, I believe removing it is the right thing to do. @nedbat?
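
For anyone who wants to quantify the slowdown and memory growth mentioned above, here is a rough, hypothetical sketch (the long generated line is purely illustrative, not from anyone's codebase) that tokenizes a source string containing one very long line:

import io
import time
import tokenize
import tracemalloc

# One artificially long line; increase the range to magnify the effect
# (on affected Python 3.12 versions this can exhaust memory quickly).
source = "DATA = [" + ", ".join(str(i) for i in range(5_000)) + "]\n"

tracemalloc.start()
start = time.perf_counter()
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
elapsed = time.perf_counter() - start
_, peak = tracemalloc.get_traced_memory()
print(f"{len(tokens)} tokens in {elapsed:.2f}s, peak ~{peak / 1e6:.0f} MB")

Running it on 3.11 and 3.12 side by side should show whether you are hitting the same pathological case.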

sebastian-muthwill commented 1 month ago

Without having measured it, I would say it was not slower than on 3.11. However, I have only 53 tests, so the impact may not be that big.

devdanzin commented 1 month ago

Thinking about it, the use of @functools.lru_cache(maxsize=100) is probably what causes the leak: the lists use a lot more memory on 3.12 compared to 3.11 (because each Token keeps a new copy of the text of the line it's in), but should be garbage collected after being used. The cache would keep them around, inflating memory usage proportionally to the number/length of files in the report.

Which raises the question: is removing the list() call correct given the presence of the cache? Wouldn't we be caching an exhausted iterator in that case?
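
As a rough illustration of that mechanism (the fake sources below are made up for the sketch), caching materialized token lists keeps every list, and every per-token line copy inside it, alive for as long as the cache holds the entry:

import functools
import io
import tokenize
import tracemalloc

@functools.lru_cache(maxsize=100)
def cached_tokens(text: str):
    # Mirrors the cached-list pattern discussed above.
    return list(tokenize.generate_tokens(io.StringIO(text).readline))

tracemalloc.start()
# Simulate tokenizing many distinct "files"; none of the results can be
# garbage collected while they sit in the cache.
for i in range(100):
    cached_tokens(f"value_{i} = " + " + ".join(str(n) for n in range(500)) + "\n")
current, _ = tracemalloc.get_traced_memory()
print(f"memory still held by cached token lists: ~{current / 1e6:.1f} MB")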

devdanzin commented 1 month ago

Confirmed that just removing the list() call causes many test failures. Removing both that call and the cache makes tests pass again.

Given that tokenizing seems much faster on 3.12 compared to 3.9 (and uses a lot more memory), maybe we could define coverage.phystokens.generate_tokens to be cached and call list() on versions below 3.12, with no cache and no list() call for 3.12 and above?
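
A sketch of what that version split could look like (a suggestion only, not the code coverage.py actually ships):

import functools
import io
import sys
import tokenize

if sys.version_info < (3, 12):
    @functools.lru_cache(maxsize=100)
    def generate_tokens(text: str):
        """Cached, fully materialized tokens (cheap enough before 3.12)."""
        readline = io.StringIO(text).readline
        return list(tokenize.generate_tokens(readline))
else:
    def generate_tokens(text: str):
        """Uncached generator, avoiding 3.12's per-token line copies piling up."""
        readline = io.StringIO(text).readline
        return tokenize.generate_tokens(readline)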

0x78f1935 commented 1 month ago

There's a test you can run to check whether this is the issue we think it is. In coveragepy/coverage/phystokens.py, change the line:

 return list(tokenize.generate_tokens(readline))

To:

 return tokenize.generate_tokens(readline)

It should lessen the memory impact, if it's the same issue.

I've tested out your theory.

@functools.lru_cache(maxsize=100)
def generate_tokens(text: str) -> TokenInfos:
    """A cached version of `tokenize.generate_tokens`.

    When reporting, coverage.py tokenizes files twice, once to find the
    structure of the file, and once to syntax-color it.  Tokenizing is
    expensive, and easily cached.

    Unfortunately, the HTML report code tokenizes all the files the first time
    before then tokenizing them a second time, so we cache many.  Ideally we'd
    rearrange the code to tokenize each file twice before moving onto the next.
    """
    readline = io.StringIO(text).readline
    # return list(tokenize.generate_tokens(readline))
    return tokenize.generate_tokens(readline)

The behaviour at the end of the run changed drastically.

I think your issue might be related, yes!

My current run command looks like pytest --exitfirst -vs --junitxml htmlcov/pytest.xml --cov --cov-report html, and the suite consists of 42 tests.

Since it is not reproducible with a fresh installation, I could imagine it has something to do with the number of tests run?

I don't think that is the case; while I was stripping things down to a reproducible version, I was able to run into this issue with just one test.

are you seeing a significant slow down between 3.11 and 3.12?

I would say 3.11 is significantly faster in comparison. Which is weird, because wasn't 3.12 supposed to be 40% faster? With 3.11 you can get yourself a coffee; with 3.12 you can get a coffee, pick up a French baguette in France, and return safely home. It takes very long.

Thinking about it, the use of @functools.lru_cache(maxsize=100) is probably what causes the leak:

Let's test it:

# @functools.lru_cache(maxsize=100)
def generate_tokens(text: str) -> TokenInfos:
    """A cached version of `tokenize.generate_tokens`.

    When reporting, coverage.py tokenizes files twice, once to find the
    structure of the file, and once to syntax-color it.  Tokenizing is
    expensive, and easily cached.

    Unfortunately, the HTML report code tokenizes all the files the first time
    before then tokenizing them a second time, so we cache many.  Ideally we'd
    rearrange the code to tokenize each file twice before moving onto the next.
    """
    readline = io.StringIO(text).readline
    return list(tokenize.generate_tokens(readline))
    # return tokenize.generate_tokens(readline)

This causes the same issue to arise: a memory leak.

I'll try without list() for good measure.

# @functools.lru_cache(maxsize=100)
def generate_tokens(text: str) -> TokenInfos:
    """A cached version of `tokenize.generate_tokens`.

    When reporting, coverage.py tokenizes files twice, once to find the
    structure of the file, and once to syntax-color it.  Tokenizing is
    expensive, and easily cached.

    Unfortunately, the HTML report code tokenizes all the files the first time
    before then tokenizing them a second time, so we cache many.  Ideally we'd
    rearrange the code to tokenize each file twice before moving onto the next.
    """
    readline = io.StringIO(text).readline
    # return list(tokenize.generate_tokens(readline))
    return tokenize.generate_tokens(readline)

This seems to work, but it doesn't improve the speed. Just like the first attempt, it gets stuck at 100%, memory is idle, and it takes a couple of minutes before HTML files start appearing. It takes ages to complete.

@devdanzin You beat me to it :P Thank you for your research!

0x78f1935 commented 1 month ago

I think I've found my (big) class which might be causing the memory leak. It's a serializer: [screenshot] The test seems to be stuck at 100%, but at that moment it was wrapping up. I saw the cities file being generated as an HTML file; the size of that file exceeds 8 MB. Guess how many statements that class houses? 24914.

Is this something I should solve on my side (which I obviously should), or is this something coverage.py is going to account for?

Before the cities class: [screenshot]

After reducing the class:

[screenshots]

Five whole minutes to generate one (big) file. That is a long time.

Meanwhile, generating TypeScript classes based on those serializers takes only seconds.

nedbat commented 1 month ago

Very odd that tokenize makes new copies of the line for every token. That seems unnecessary. I wrote a CPython issue about it: https://github.com/python/cpython/issues/119654
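
A quick, hedged way to observe that behavior on your own interpreter (not part of the issue itself) is to count how many distinct string objects back the tokens' line attribute; on an affected 3.12 build the count roughly tracks the number of tokens, while on 3.11 tokens on the same line share one string:

import io
import tokenize

source = "a = 1\nb = a + 2\nprint(a, b)\n"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
# On versions with the copying behavior this approaches len(tokens);
# otherwise it stays close to the number of source lines.
distinct_lines = len({id(tok.line) for tok in tokens})
print(f"{len(tokens)} tokens, {distinct_lines} distinct line string objects")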

nedbat commented 1 month ago

I did some quick experiments with both many-line files and very-long-line files, and found that on both 3.11 and 3.12 it was faster to not cache and to not use list(), so I've removed them in commit b666f3af275aa499e3e9811bfb09a081dded7513.

nedbat commented 1 month ago

This is now released as part of coverage 7.5.3.