iterative / dvc

🦉 Data Versioning and ML Experiments
https://dvc.org
Apache License 2.0

fetch/push/status: not handling config from other revisions #9754

Open efiop opened 1 year ago

efiop commented 1 year ago

When branches have wildly different remote setups, those per-branch configs are not taken into account when dvc fetch, dvc push, or dvc status -c is run with --all-tags, --all-branches, and similar multi-revision flags.

Example:

#!/bin/bash

set -e
set -x

rm -rf mytest
mkdir mytest
cd mytest

mkdir remote1
mkdir remote2
remote1="$(pwd)/remote1"
remote2="$(pwd)/remote2"

mkdir repo
cd repo
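# Name the initial branch explicitly; "git checkout main" below relies on it (needs git >= 2.28).
git init -b main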
dvc init
git commit -m "init"
git branch branch1
git branch branch2

git checkout branch1
echo foo > foo
dvc add foo
dvc remote add -d myremote1 "$remote1"
dvc push
git add .gitignore foo.dvc .dvc/config
git commit -m "foo"

git checkout branch2
echo bar > bar
dvc add bar
dvc remote add -d myremote2 "$remote2"
dvc push
git add .gitignore bar.dvc .dvc/config
git commit -m "bar"

git checkout main
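# Drop the local cache so that everything has to come from the remotes.
rm -rf .dvc/cache
dvc fetch --all-branches
tree .dvc/cache  # will show 0 files: neither foo nor bar was fetched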

Studio has been working around this for years by using a real git checkout to collect objects, but I couldn't find an existing issue for it in dvc.
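For anyone who needs something similar in the meantime, a checkout-based workaround could look roughly like the sketch below. This is only a guess at the general shape, not Studio's actual code, and for_each_branch is a made-up helper name. The idea is simply that after a real checkout, the workspace .dvc/config is the one that belongs to that branch.

```shell
# Hypothetical helper (not part of git or dvc): run a command once per local
# branch, with that branch really checked out, so the command sees the
# branch's own .dvc/config in the workspace.
for_each_branch() {
    local original branch
    # Remember the starting branch so we can switch back afterwards.
    original="$(git rev-parse --abbrev-ref HEAD)" || return 1
    # Git ref names cannot contain spaces or glob characters,
    # so plain word splitting is safe here.
    for branch in $(git for-each-ref --format='%(refname:short)' refs/heads); do
        git checkout --quiet "$branch" || return 1
        # Stop at the first failure, but restore the original branch first.
        "$@" || { git checkout --quiet "$original"; return 1; }
    done
    git checkout --quiet "$original"
}

# Usage, inside a repo like the one built by the script above:
#   for_each_branch dvc fetch
```

Note that the loop stops at the first branch where the command fails, and that it assumes a clean working tree so the checkouts can succeed.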

To fix this, we should make config part of Index (the same way stages, outs, etc. are; not to be confused with DataIndex) and use it to build Index.data. This is easiest to do in dvc fetch, because it already uses Index.data, but push and status -c might need temporary workarounds, such as manually triggering a config reload in brancher.
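To illustrate what "config per revision" means here, the default remote of any revision can be read straight from git without a checkout. This is a rough sketch and not how dvc resolves config internally: default_remote_for_rev is a hypothetical name, and it assumes the layout that dvc remote add -d writes, i.e. a "remote = <name>" line under a [core] section in .dvc/config.

```shell
# Hypothetical helper: print the default remote name recorded in the
# .dvc/config of a given git revision, without checking that revision out.
# Assumes a "remote = <name>" line inside the [core] section.
default_remote_for_rev() {
    local rev="$1"
    # "git show <rev>:<path>" prints the file as it exists in that revision;
    # if the file is missing there, nothing is printed.
    git show "$rev:.dvc/config" 2>/dev/null | awk '
        /^\[/ { in_core = ($0 == "[core]"); next }   # track the current section
        in_core && $1 == "remote" && $2 == "=" { print $3; exit }
    '
}

# Usage, inside a repo like the one built by the script above:
#   for rev in $(git for-each-ref --format='%(refname:short)' refs/heads); do
#       printf '%s -> %s\n' "$rev" "$(default_remote_for_rev "$rev")"
#   done
```

In the example repo this should report a different default remote for branch1 and branch2, which is exactly the information that is lost when only the workspace config is consulted.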

dberenbaum commented 1 year ago

> When branches have wildly different remote setups

Do we respect any of the per-revision remote config, or do we always use the remote config from the workspace?

efiop commented 1 year ago

@dberenbaum We always use the remote config from the workspace and don't respect per-revision remote configs at all 🙁 That is now fixed for fetch, though.