Open asottile opened 3 years ago
Good find! From looking at the code it appears we shouldn't write to the cache multiple times per run, so there must be some other superlinear behavior going on. Does it reproduce with your patch for #1950?
oh yeah I should have checked, yeah it does so my guess at the cache being at fault is probably incorrect
I had some spare time and after some profiling it looks like ensure_future
is being called 10x more for 3x the amount of input. The vast majority of the calls are coming from the implementation of asyncio.wait
because we're spamming it in a tight loop from https://github.com/psf/black/blob/main/src/black/__init__.py#L642
In fact, adding an await asyncio.sleep(0.1)
at the end of that loop fixes this issue and makes performance scale linearly with number of input files
I wonder if we should just use asyncio.as_completed
instead of the hand-rolled batching/waiting logic in https://github.com/psf/black/blob/main/src/black/__init__.py#L626-L659. Or maybe that would result in too much await
ing?
If so, we could bring out the big guns and use https://aioitertools.omnilib.dev/en/latest/api.html#aioitertools.asyncio.as_completed and https://aioitertools.omnilib.dev/en/latest/api.html#aioitertools.more_itertools.chunked
Describe the bug A clear and concise description of what the bug is.
To Reproduce Steps to reproduce the behavior:
I suspect this is related to #1950 -- or at least exacerbated by it
setup
1000 files
10000 files
I expect the second command to be approximately 10x slower than the first, however it's almost 30-40x worse
A guess is black is writing the cache for every single file and should instead batch it per run
Expected behavior A clear and concise description of what you expected to happen.
Environment (please complete the following information):
Does this bug also happen on master? To answer this, you have two options:
yes
Additional context Add any other context about the problem here.