jelmer / dulwich

Pure-Python Git implementation
https://www.dulwich.io/

porcelain.status takes much longer than git status for many untracked directories #835

Open ravwojdyla opened 3 years ago

ravwojdyla commented 3 years ago

porcelain.status takes much longer than git status for many untracked directories.

Reproduction:

# prep the directories
for i in {1..40}; do
  for j in {1..1000}; do
    mkdir -p tmp/$i/$j
    touch tmp/$i/$j/t
  done
done
# validate the number of directories/files is relatively large
> find tmp | wc -l
   80041
> time git status --porcelain
?? tmp/
git st --porcelain  0.01s user 0.01s system 83% cpu 0.019 total

from dulwich import porcelain
from dulwich.repo import Repo
import time
t0 = time.time()
_ = porcelain.status(Repo("."))
print(f"Took {time.time() - t0} seconds")
Took 14.289646625518799 seconds

jelmer commented 3 years ago

porcelain.status() hasn't been optimized for performance yet - it currently makes a few passes over the working directory, and could probably be simplified to just a single pass.
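
To illustrate the single-pass idea, here is a minimal sketch. It is not dulwich's implementation, and `tracked_dirs` and `untracked_single_pass` are hypothetical names. It assumes the set of tracked paths is already known (for example, read from the index) as "/"-separated paths relative to the repository root. A directory with no tracked paths under it is reported once and never descended into, which is roughly why git status can print "?? tmp/" without listing all 80041 entries. The sketch ignores .gitignore rules and, unlike git, also reports empty untracked directories.

```python
import os
import posixpath


def tracked_dirs(tracked_paths):
    """Return every directory that contains at least one tracked path."""
    dirs = {""}  # "" stands for the repository root
    for path in tracked_paths:
        parent = posixpath.dirname(path)
        while parent and parent not in dirs:
            dirs.add(parent)
            parent = posixpath.dirname(parent)
    return dirs


def untracked_single_pass(root, tracked_paths):
    """Yield untracked entries under root in one traversal.

    Hypothetical helper: a directory with no tracked paths below it is
    yielded once as "dir/" and its contents are never scanned.
    """
    tracked = set(tracked_paths)
    keep_walking = tracked_dirs(tracked)
    stack = [""]
    while stack:
        rel_dir = stack.pop()
        with os.scandir(os.path.join(root, rel_dir)) as entries:
            for entry in entries:
                rel = f"{rel_dir}/{entry.name}" if rel_dir else entry.name
                if entry.is_dir(follow_symlinks=False):
                    if not rel_dir and entry.name == ".git":
                        continue  # skip the repository's own metadata
                    if rel in keep_walking:
                        stack.append(rel)  # tracked files below: descend
                    else:
                        yield rel + "/"  # fully untracked: report, prune
                elif rel not in tracked:
                    yield rel
```

With the reproduction above and nothing under tmp/ tracked, this sketch would scan only the top-level directory and yield "tmp/" without visiting its subdirectories.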