iterative / dvc

🦉 ML Experiments and Data Management with Git
https://dvc.org
Apache License 2.0

import/get: optimize LFS prefetching #10414

Closed sisp closed 1 month ago

sisp commented 1 month ago

I've significantly optimized LFS prefetching performance. With this change, the include argument passed to scmrepo.git.lfs.fetch() is now a single-item list containing either a Unix filename pattern such as <subdir>/** or a file path. scmrepo.git.lfs.fetch._collect_objects() still enumerates all files in the repo with fs.find("/") (which is quite fast even with many files), but scmrepo.git.lfs.fetch._filter_paths() now only has to match those files against a single pattern.
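To illustrate why a single-item include list helps, here is a minimal sketch using only the standard library. It is not scmrepo's implementation: `filter_paths`, the sample file list, and the multi-pattern "before" case are hypothetical and only show that the matching cost grows with the number of files times the number of patterns.

```python
from fnmatch import fnmatchcase


def filter_paths(paths, include):
    """Hypothetical stand-in for a path filter: keep every path that
    matches at least one pattern in `include`."""
    return [p for p in paths if any(fnmatchcase(p, pat) for pat in include)]


# Assumed sample listing, standing in for the result of enumerating the repo.
all_files = ["data/a.bin", "data/sub/b.bin", "models/m.pt", "README.md"]

# Many patterns (assumed for comparison): each file is checked against each entry.
many_patterns = filter_paths(all_files, ["data/a.bin", "data/sub/b.bin"])

# Single pattern: each file is checked exactly once.
single_pattern = filter_paths(all_files, ["data/**"])

print(many_patterns == single_pattern)
```

Both calls select the same files here, but the second performs one match per file regardless of how many files live under the directory.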

Partially fixes https://github.com/iterative/scmrepo/issues/338.

/cc @shcheklein