Open tillahoffmann opened 2 years ago
Thanks.
If my memory is correct I made it a set() instead of list for performance reasons.
If a task has hundreds of file_dep
it would be inefficient to check in
using a list
.
Not sure this still applies... have you checked it?
Please also add an entry on CHANGES, and doc changes (not sure if relevant or where).
I've had a look running the tests on master
and the feature branch. There doesn't seem to be much in it, but I appreciate that the number of deps is likely low in the tests. See below for results.
$ (master) repeat 5 time pytest > /dev/null
pytest -q > /dev/null 2.88s user 0.84s system 52% cpu 7.054 total
pytest -q > /dev/null 2.80s user 0.82s system 52% cpu 6.949 total
pytest -q > /dev/null 2.79s user 0.82s system 52% cpu 6.905 total
pytest -q > /dev/null 2.79s user 0.80s system 52% cpu 6.872 total
pytest -q > /dev/null 2.85s user 0.83s system 51% cpu 7.089 total
$ (file_dep-list) repeat 5 time pytest > /dev/null
pytest -q > /dev/null 2.80s user 0.82s system 52% cpu 6.925 total
pytest -q > /dev/null 2.77s user 0.80s system 52% cpu 6.855 total
pytest -q > /dev/null 2.76s user 0.80s system 51% cpu 6.853 total
pytest -q > /dev/null 2.79s user 0.81s system 52% cpu 6.898 total
pytest -q > /dev/null 2.77s user 0.80s system 52% cpu 6.857 total
test_dependency.py
$ (master) repeat 5 time pytest tests/test_dependency.py > /dev/null
pytest tests/test_dependency.py > /dev/null 0.35s user 0.17s system 14% cpu 3.553 total
pytest tests/test_dependency.py > /dev/null 0.34s user 0.17s system 14% cpu 3.541 total
pytest tests/test_dependency.py > /dev/null 0.34s user 0.17s system 14% cpu 3.546 total
pytest tests/test_dependency.py > /dev/null 0.35s user 0.17s system 14% cpu 3.556 total
pytest tests/test_dependency.py > /dev/null 0.35s user 0.17s system 14% cpu 3.557 total
$ (file_dep-list) repeat 5 time pytest tests/test_dependency.py > /dev/null
pytest tests/test_dependency.py > /dev/null 0.35s user 0.17s system 14% cpu 3.569 total
pytest tests/test_dependency.py > /dev/null 0.34s user 0.16s system 14% cpu 3.536 total
pytest tests/test_dependency.py > /dev/null 0.32s user 0.15s system 13% cpu 3.531 total
pytest tests/test_dependency.py > /dev/null 0.34s user 0.17s system 14% cpu 3.540 total
pytest tests/test_dependency.py > /dev/null 0.34s user 0.16s system 14% cpu 3.533 total
I've had another look at performance, and I think this change would not negatively affect performance. I use setup.py
to generate 5,000 files and then consider the time it takes to execute doit
for a task that depends on all 5,000 files.
# setup.py
import os
os.makedirs("inputs", exist_ok=True)
for i in range(5000):
with open(f"inputs/{i}.txt", "w") as fp:
fp.write(str(i) * i)
# dodo.py
import os
os.makedirs("inputs", exist_ok=True)
for i in range(5000):
with open(f"inputs/{i}.txt", "w") as fp:
fp.write(str(i) * i)
Here's the evaluation.
# git:file_dep-list
$ time (repeat 100 doit)
...
8.15s user 3.18s system 94% cpu 11.962 total
# git:master
$ time (repeat 100 doit)
...
8.19s user 3.19s system 94% cpu 12.044 total
Hi @schettino72, just wanted to follow up whether this is something you're interested in merging.
@tillahoffmann sorry for delay. I am planning to dedicate some time for open-source in the last week of this month.
This PR fixes #254 by changing
file_dep
to a list rather than using a set, ensuring the order of dependencies is preserved.