tillahoffmann commented 2 years ago

This PR fixes #254 by changing file_dep to a list rather than using a set, ensuring the order of dependencies is preserved.

schettino72 commented 2 years ago

Thanks.

If my memory is correct I made it a set() instead of list for performance reasons. If a task has hundreds of file_dep it would be inefficient to check in using a list. Not sure this still applies... have you checked it?

Please also add an entry on CHANGES, and doc changes (not sure if relevant or where).

tillahoffmann commented 2 years ago

I've had a look running the tests on master and the feature branch. There doesn't seem to be much in it, but I appreciate that the number of deps is likely low in the tests. See below for results.

Test duration results

Executing all tests

$ (master) repeat 5 time pytest > /dev/null
pytest -q > /dev/null  2.88s user 0.84s system 52% cpu 7.054 total
pytest -q > /dev/null  2.80s user 0.82s system 52% cpu 6.949 total
pytest -q > /dev/null  2.79s user 0.82s system 52% cpu 6.905 total
pytest -q > /dev/null  2.79s user 0.80s system 52% cpu 6.872 total
pytest -q > /dev/null  2.85s user 0.83s system 51% cpu 7.089 total

$ (file_dep-list) repeat 5 time pytest > /dev/null
pytest -q > /dev/null  2.80s user 0.82s system 52% cpu 6.925 total
pytest -q > /dev/null  2.77s user 0.80s system 52% cpu 6.855 total
pytest -q > /dev/null  2.76s user 0.80s system 51% cpu 6.853 total
pytest -q > /dev/null  2.79s user 0.81s system 52% cpu 6.898 total
pytest -q > /dev/null  2.77s user 0.80s system 52% cpu 6.857 total

Executing only tests in `test_dependency.py`

$ (master) repeat 5 time pytest tests/test_dependency.py > /dev/null 
pytest tests/test_dependency.py > /dev/null  0.35s user 0.17s system 14% cpu 3.553 total
pytest tests/test_dependency.py > /dev/null  0.34s user 0.17s system 14% cpu 3.541 total
pytest tests/test_dependency.py > /dev/null  0.34s user 0.17s system 14% cpu 3.546 total
pytest tests/test_dependency.py > /dev/null  0.35s user 0.17s system 14% cpu 3.556 total
pytest tests/test_dependency.py > /dev/null  0.35s user 0.17s system 14% cpu 3.557 total

$ (file_dep-list) repeat 5 time pytest tests/test_dependency.py > /dev/null 
pytest tests/test_dependency.py > /dev/null  0.35s user 0.17s system 14% cpu 3.569 total
pytest tests/test_dependency.py > /dev/null  0.34s user 0.16s system 14% cpu 3.536 total
pytest tests/test_dependency.py > /dev/null  0.32s user 0.15s system 13% cpu 3.531 total
pytest tests/test_dependency.py > /dev/null  0.34s user 0.17s system 14% cpu 3.540 total
pytest tests/test_dependency.py > /dev/null  0.34s user 0.16s system 14% cpu 3.533 total

tillahoffmann commented 2 years ago

I've had another look at performance, and I think this change would not negatively affect performance. I use setup.py to generate 5,000 files and then consider the time it takes to execute doit for a task that depends on all 5,000 files.

# setup.py
import os

os.makedirs("inputs", exist_ok=True)
for i in range(5000):
    with open(f"inputs/{i}.txt", "w") as fp:
        fp.write(str(i) * i)

# dodo.py
import os

os.makedirs("inputs", exist_ok=True)
for i in range(5000):
    with open(f"inputs/{i}.txt", "w") as fp:
        fp.write(str(i) * i)

Here's the evaluation.

# git:file_dep-list
$ time (repeat 100 doit)
...
8.15s user 3.18s system 94% cpu 11.962 total

# git:master
$ time (repeat 100 doit)
...
8.19s user 3.19s system 94% cpu 12.044 total

tillahoffmann commented 2 years ago

Hi @schettino72, just wanted to follow up whether this is something you're interested in merging.

schettino72 commented 2 years ago

@tillahoffmann sorry for delay. I am planning to dedicate some time for open-source in the last week of this month.

pydoit / doit

Use `list` for `file_dep` (fixes #254). #430

Test duration results

Executing all tests

Executing only tests in `test_dependency.py`

pydoit / doit

Use `list` for `file_dep` (fixes #254). #430

Test duration results

Executing all tests

Executing only tests in test_dependency.py

Executing only tests in `test_dependency.py`