src-d / go-git

Project has been moved to: https://github.com/go-git/go-git
Apache License 2.0
4.91k stars, 542 forks

Add() a single file is very slow #1260

Open rustyx opened 4 years ago

rustyx commented 4 years ago

Worktree.Add(file) calls Status(), which walks the entire worktree, and Add() accepts only a single file. Adding multiple files in a large repo therefore becomes an O(N²) operation (excruciatingly slow).
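
To make the quadratic cost concrete, here is a minimal cost model in plain Go. It does not use go-git at all: the `worktree` type and the `status`, `add`, and `addAll` names are hypothetical stand-ins that only count how many files get examined when every add triggers a full scan.

```go
package main

import "fmt"

// worktree is a toy stand-in for a checkout; it is NOT go-git's Worktree type.
type worktree struct {
	files  []string
	visits int // total number of files examined so far
}

// status models a full-worktree scan: it examines every file exactly once.
func (w *worktree) status() map[string]bool {
	seen := make(map[string]bool, len(w.files))
	for _, f := range w.files {
		w.visits++
		seen[f] = true
	}
	return seen
}

// add models the behaviour described in the report: one full scan per added file.
func (w *worktree) add(path string) bool {
	return w.status()[path]
}

// addAll creates n files, adds them one at a time, and returns how many
// files were examined in total.
func addAll(n int) int {
	w := &worktree{}
	for i := 0; i < n; i++ {
		w.files = append(w.files, fmt.Sprintf("file%d.txt", i))
	}
	for _, f := range w.files {
		w.add(f)
	}
	return w.visits
}

func main() {
	for _, n := range []int{10, 100, 1000} {
		fmt.Printf("files=%d examined=%d\n", n, addAll(n))
	}
}
```

In this model, adding N files examines N × N files in total, so a tenfold increase in repository size means a hundredfold increase in work.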

maguro commented 4 years ago

This is killing me too. I was wondering if the status could be cached in the worktree. I intend to tinker with this, but if someone has time to work on this, please pick it up.

maguro commented 4 years ago

Caching status won't work that cleanly, it seems. In reality, all it's being used for is to check whether the file is unmodified. Rather than boiling the ocean for that simple check, I'll replace it with an on-demand check.

maguro commented 4 years ago

Interesting, I don't think there are tests for adding files that are already tracked.

gdamore commented 4 years ago

This is crushing me as well. I have a repo with thousands of files. I've done a lot of work to make my workflow entirely in memory, using libasciidoc to get really fast document processing, but then this Add() performance basically steals back all the gains I achieved.

saschagrunert commented 4 years ago

Hey, we encountered that issue as well in Kubernetes release engineering. We're trying to add a single file to kubernetes/kubernetes, which takes roughly 5 minutes to succeed. Do you have any idea how to overcome this obstacle?