pytask-dev / pytask

pytask is a workflow management system that facilitates reproducible data analyses.
https://pytask-dev.readthedocs.io/en/stable

BUG: Inconsistent number of collected tasks when (sub)paths occur multiple times #624

Closed timmens closed 4 months ago

timmens commented 4 months ago

When setting multiple pytask paths with common subdirectories, the tasks in those subdirectories are counted twice in the collection. I have checked, and they are not being executed twice. (This was originally discovered by @ChristianZimpelmann.)

Code Sample, a copy-pastable example

pytask version: 0.5.0

$ cat pyproject.toml
[tool.pytask.ini_options]
paths = [
  ".",
  ".",
]
$ cat task_a.py
import random
from pathlib import Path

def task_a():
    Path(f"{random.randint(0, 10)}.txt").write_text("test")

Problem description

The first line of the output says "Collected 2 tasks", but the summary says "1 Collected tasks".


Expected Output

I believe both places should report the same number of collected tasks. Additionally, if the verbosity level is high, one could consider printing a warning about the duplicated paths.

tobiasraabe commented 4 months ago

Hi @timmens and @ChristianZimpelmann, nice catch! Thanks a lot!

I'm not sure where it fails right now, but we should generate all the paths we are collecting and then take a set of them.
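A minimal sketch of that idea, assuming tasks live in files matching `task_*.py`. This is not pytask's actual collection code: the helper names `deduplicate_paths` and `collect_task_files` and the `pattern` parameter are hypothetical and only illustrate resolving the configured paths and taking a set of the discovered files.

```python
from pathlib import Path


def deduplicate_paths(paths):
    """Return resolved paths with duplicates removed, keeping first-seen order.

    Hypothetical helper, not part of pytask's API.
    """
    seen = set()
    unique = []
    for path in paths:
        # Resolving makes ".", "./" and "sub/.." compare equal.
        resolved = Path(path).resolve()
        if resolved not in seen:
            seen.add(resolved)
            unique.append(resolved)
    return unique


def collect_task_files(paths, pattern="task_*.py"):
    """Collect task files below the given paths, counting each file once.

    Hypothetical helper; the file pattern is an assumption.
    """
    files = set()
    for root in deduplicate_paths(paths):
        if root.is_file():
            files.add(root)
        else:
            # A set also covers overlapping entries such as "." and "./sub".
            files.update(p.resolve() for p in root.rglob(pattern))
    return sorted(files)
```

With `paths = [".", "."]` as in the report, this sketch would yield `task_a.py` once, so the count at collection and the count in the summary would agree.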

If you want to start a PR, you are welcome. Otherwise, I will tackle it eventually. The error is not blocking anything, right?

timmens commented 4 months ago

No, it's not blocking anything; it's rather cosmetic.

I can try to tackle this, but it could take a few weeks before I get to it.