iterative / scmrepo

SCM wrapper and fsspec filesystem for Git for use in DVC.
https://dvc.org
Apache License 2.0
21 stars 14 forks source link

exp run: can't stash changes (nothing to stash) on Win #211

Closed shcheklein closed 1 year ago

shcheklein commented 1 year ago

Bug Report

Description

dvc exp run fails on Windows (conda) with an error from pygit2: unexpected error - 'cannot stash changes - there is nothing to stash.'

Reproduce

I was using: Win 11, VS Code or miniconda terminal, https://github.com/shcheklein/hackathon repo w/o any substantial changes.

dvc exp run fails with this error from the very start.

Output of dvc doctor:

DVC version: 2.41.1 (pip)
---------------------------------
Platform: Python 3.10.8 on Windows-10-10.0.22621-SP0
Subprojects:
        dvc_data = 0.29.0
        dvc_objects = 0.14.1
        dvc_render = 0.0.17
        dvc_task = 0.1.9
        dvclive = 1.3.2
        scmrepo = 0.1.5
Supports:
        http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: https
Workspace directory: NTFS on C:\
Repo: dvc, git
dberenbaum commented 1 year ago

I'm able to reproduce, and I'm adding detailed output below. Is it expected that stashing should use pygit2?

(base) C:\Users\Administrator\hackathon>dvc exp run -v
2023-01-16 15:11:35,806 ERROR: unexpected error - 'cannot stash changes - there is nothing to stash.'
------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\Administrator\miniconda3\lib\site-packages\dvc\cli\__init__.py", line 185, in main
    ret = cmd.do_run()
  File "C:\Users\Administrator\miniconda3\lib\site-packages\dvc\cli\command.py", line 22, in do_run
    return self.run()
  File "C:\Users\Administrator\miniconda3\lib\site-packages\dvc\commands\experiments\run.py", line 32, in run
    results = self.repo.experiments.run(
  File "C:\Users\Administrator\miniconda3\lib\site-packages\dvc\repo\experiments\__init__.py", line 469, in run
    return run(self.repo, *args, **kwargs)
  File "C:\Users\Administrator\miniconda3\lib\site-packages\dvc\repo\__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "C:\Users\Administrator\miniconda3\lib\site-packages\dvc\repo\experiments\run.py", line 59, in run
    return repo.experiments.reproduce_one(
  File "C:\Users\Administrator\miniconda3\lib\site-packages\dvc\repo\experiments\__init__.py", line 112, in reproduce_one
    self.queue_one(exp_queue, **kwargs)
  File "C:\Users\Administrator\miniconda3\lib\site-packages\dvc\repo\experiments\__init__.py", line 149, in queue_one
    return self.new(
  File "C:\Users\Administrator\miniconda3\lib\site-packages\dvc\repo\experiments\__init__.py", line 264, in new
    return queue.put(*args, **kwargs)
  File "C:\Users\Administrator\miniconda3\lib\site-packages\dvc\repo\experiments\queue\workspace.py", line 39, in put
    return self._stash_exp(*args, **kwargs)
  File "C:\Users\Administrator\miniconda3\lib\site-packages\dvc\repo\experiments\queue\base.py", line 328, in _stash_exp
    with self.scm.stash_workspace() as workspace:
  File "C:\Users\Administrator\miniconda3\lib\contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "C:\Users\Administrator\miniconda3\lib\site-packages\scmrepo\git\__init__.py", line 432, in stash_workspace
    rev = self.stash.push(**kwargs)
  File "C:\Users\Administrator\miniconda3\lib\site-packages\scmrepo\git\stash.py", line 40, in push
    rev, reset = self.scm._stash_push(  # pylint: disable=protected-access
  File "C:\Users\Administrator\miniconda3\lib\site-packages\scmrepo\git\__init__.py", line 289, in _backend_func
    result = func(*args, **kwargs)
  File "C:\Users\Administrator\miniconda3\lib\site-packages\scmrepo\git\backend\pygit2.py", line 475, in _stash_push
    oid = self.repo.stash(
  File "C:\Users\Administrator\miniconda3\lib\site-packages\pygit2\repository.py", line 1076, in stash
    check_error(err)
  File "C:\Users\Administrator\miniconda3\lib\site-packages\pygit2\errors.py", line 56, in check_error
    raise KeyError(message)
KeyError: 'cannot stash changes - there is nothing to stash.'
------------------------------------------------------------
2023-01-16 15:11:36,119 DEBUG: link type reflink is not available ([Errno 129] no more link types left to try out)
2023-01-16 15:11:36,135 DEBUG: Removing 'C:\Users\Administrator\.LG2JaTdBgwXro6rxgiAMuN.tmp'
2023-01-16 15:11:36,135 DEBUG: Removing 'C:\Users\Administrator\.LG2JaTdBgwXro6rxgiAMuN.tmp'
2023-01-16 15:11:36,150 DEBUG: Removing 'C:\Users\Administrator\.LG2JaTdBgwXro6rxgiAMuN.tmp'
2023-01-16 15:11:36,150 DEBUG: Removing 'C:\Users\Administrator\hackathon\.dvc\cache\.Pw4ScxcLH7a2UnFereeVju.tmp'
2023-01-16 15:11:36,150 DEBUG: Version info for developers:
DVC version: 2.41.1 (pip)
---------------------------------
Platform: Python 3.10.8 on Windows-10-10.0.20348-SP0
Subprojects:
        dvc_data = 0.29.0
        dvc_objects = 0.14.1
        dvc_render = 0.0.17
        dvc_task = 0.1.9
        dvclive = 1.3.2
        scmrepo = 0.1.5
Supports:
        http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3)
Cache types: hardlink, symlink
Cache directory: NTFS on C:\
Caches: local
Remotes: https
Workspace directory: NTFS on C:\
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2023-01-16 15:11:36,166 DEBUG: Analytics is enabled.
2023-01-16 15:11:36,416 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', 'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\2\\tmp0n7ihw4f']'
2023-01-16 15:11:36,447 DEBUG: Spawned '['daemon', '-q', 'analytics', 'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\2\\tmp0n7ihw4f']'
pmrowla commented 1 year ago

If there's no changes to stash it shouldn't be hit at all

https://github.com/iterative/scmrepo/blob/339fdba37383888cd4e5d20dc729efdb509cd919/src/scmrepo/git/stash.py#L35

Need to investigate why the is_dirty check is not returning False

pmrowla commented 1 year ago

@shcheklein @dberenbaum can you both please run git config --show-origin core.autocrlf in your git repo? Also, how did you install git in windows? (Did you do a standalone git installation or was it bundled through the vscode installer?)

The issue is probably that dulwich can't find your system gitconfig, which causes dulwich to report dirty status due to line-ending changes (files are likely checked out with windows CRLF line endings in your windows workspace, but they are git-committed with unix LF ending). The issue is that there is no official default system gitconfig location on windows (on linux it's always /etc/gitconfig). Dulwich searches the windows registry and your system PATH for any git installations, but depending on how you install git on windows the installer does not always add git to the system-wide PATH and it does not always set the registry entries.

dberenbaum commented 1 year ago

Also, how did you install git in windows?

I downloaded the top link from https://git-scm.com/download/win (tried to be as vanilla as possible). I created a Win ec2 and used Windows Remote Desktop for this, so I'll need to find time to do that again to get the git config.

shcheklein commented 1 year ago

git config --show-origin core.autocrlf

file:C:/Users/ivan/.gitconfig true

Same, I installed via the regular win installer.

Here are my PATH and Git binary locations:

(base) PS C:\Users\ivan> Get-Command git
CommandType     Name                                               Version    Source
-----------     ----                                               -------    ------
Application     git.exe                                            2.39.0.2   C:\Program Files\Git\cmd\git.exe
(base) PS C:\Users\ivan> $env:PATH
C:\Users\ivan\miniconda3;C:\Users\ivan\miniconda3\Library\mingw-w64\bin;C:\Users\ivan\miniconda3\Library\usr\bin;C:\Users\ivan\miniconda3\Library\bin;C:\Users\ivan\miniconda3\Scripts;C:\Users\ivan\miniconda3\bin;C:\Users\ivan\miniconda3\condabin;C:\Program Files\Parallels\Parallels Tools\Applications;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program Files\Git\cmd;C:\Users\ivan\AppData\Local\Microsoft\WindowsApps;.;C:\Users\ivan\AppData\Local\Programs\Microsoft VS Code\bin
shcheklein commented 1 year ago

@pmrowla I can run some simple python script like "import dulwich ..." if there is a way to check for the config.

dberenbaum commented 1 year ago

Is it expected that stashing should use pygit2?

@pmrowla Does this make sense? From what I can tell, _stash_push is called during dvc exp run even on my mac in a clean workspace, but there it uses dulwich, and in windows it looks to be using pygit2.

pmrowla commented 1 year ago

@pmrowla I can run some simple python script like "import dulwich ..." if there is a way to check for the config.

from dulwich.config import StackedConfig

cfg = StackedConfig(StackedConfig.default_backends())
print("paths", [f.path for f in cfg.backends])
try:
    print("core.autocrlf", cfg.get("core", "autocrlf"))
except KeyError:
    print("core.autocrlf unset")

Is it expected that stashing should use pygit2?

It can be called from any backend depending on what other git calls we have made. When we have to change a backends due to some call that is only supported in one particular backend, we continue preferring that backend until we reach the next call that requires a different specific backend.

shcheklein commented 1 year ago

Here it is:

>>> from dulwich.config import StackedConfig
>>>
>>> cfg = StackedConfig(StackedConfig.default_backends())
>>> print("paths", [f.path for f in cfg.backends])
paths ['C:\\Users\\ivan/.gitconfig', 'C:\\Program Files\\Git\\etc\\gitconfig']
>>> try:
...     print("core.autocrlf", cfg.get("core", "autocrlf"))
... except KeyError:
...     print("core.autocrlf unset")
...
core.autocrlf b'true'

It seems it can detect it. I've tried to change it to false to double check.

pmrowla commented 1 year ago

@shcheklein can you also try

from scmrepo.git import Git

Git(".").dulwich.status(untracked_files="no")
shcheklein commented 1 year ago

Sure...

>>> from scmrepo.git import Git
>>>
>>> Git(".").dulwich.status(untracked_files="no")
({}, ['evaluation/scalars/acc_0.tsv', 'evaluation/scalars/best_acc.tsv', 'evaluation/scalars/best_loss.tsv', 'evaluation/scalars/best_test_acc.tsv', 'evaluation/scalars/best_test_loss.tsv', 'evaluation/scalars/eval/accuracy.tsv', 'evaluation/scalars/eval/loss.tsv', 'evaluation/scalars/loss_0.tsv', 'evaluation/scalars/train/accuracy.tsv', 'evaluation/scalars/train/loss.tsv'], [])

Not sure where does it take eval files, btw. What is the format of the output for this? git status is empty in my case at the moment.

pmrowla commented 1 year ago

It looks like the bug is due to the .tsv files in the repo being git-committed with windows style CRLF line endings. The issue is that dulwich applies autocrlf to those files and wants them to be git-committed with linux LF-only, so dulwich reports them as dirty. It looks like CLI git does not apply the autocrlf filtering in this case (where the file was explicitly git-committed with CRLF line endings).

In this situation where you have a git repo that you are using on both *nix and windows and have files in your repo with mixed line endings, it would be better to have an explicit .gitattributes file in the repo root with

*.tsv eol=crlf

but dulwich hasn't implemented the gitattributes eol filter yet so it wouldn't actually help here (we can send a PR for this) https://github.com/jelmer/dulwich/blob/f40a60968957e3982eee437f04b3e515bb2f5493/dulwich/line_ending.py#L179-L181

pmrowla commented 1 year ago

Looks like we generate tsv files with CRLF because dvclive uses the default python csv writer which defaults to excel dialect (https://docs.python.org/3/library/csv.html#dialects-and-formatting-parameters).

@dberenbaum @daavoo is this actually required? Excel does require csv files to have lines separated by CRLF, but IMO it would be better for us to generate them with os.linesep so that we don't end up with mixed line ending cases like this one.

Alternatively, if we do need to explicitly generate them with CRLF due to common tools requiring it, dvclive should probably also be generating/modifying .gitattributes to set *.tsv text eol=crlf (or else document that users need to set the .gitattributes properly if they are working in a mixed *nix+windows environment)

shcheklein commented 1 year ago

Okay, I see that those files were produced by dvclive at some point on Linux or Mac, they do have CRLF for whatever reason (I didn't use Win), but I think dvclive is fixed now.

I'll try to do an commit and see if that fixes it ...

Anyways, should we fix it on our end for these cases (not p0 I guess)? We should be consistent with Git CLI in this case ...

pmrowla commented 1 year ago

It looks like at some point dvclive was changed so that it's supposed to only generate LF endings so that might be a dvclive bug https://github.com/iterative/dvclive/blob/7fe6cdfded013a411f10ac4153cc1f8a002c0543/src/dvclive/plots/metric.py#L40-L42

I think it should still probably be os.linesep and not \n but that's a minor issue

pmrowla commented 1 year ago

If dvclive does generate CRLF it will lead to inconsistent behavior depending on whether the user runs on nix or windows (regardless of the dulwich bug).

If we generate a file with CRLF on windows, it will get git committed with LF endings on windows in the default scenario (where core.autocrlf=true). So when a user on linux checks it out, they will get LF endings (and cannot open it in excel) and when a user on windows checks it out, they will get CRLF endings (due to core.autocrlf=true conversion on checkout)

If we generate a file with CRLF on linux, it will get git committed with CRLF line endings (since git does not do any line ending conversion on linux). This currently leads to the dulwich bug on windows, but either way this leads to inconsistent behavior depending on whether the user runs dvclive on windows or they run it on linux.


The proper way to handle this in dvclive would be to either

shcheklein commented 1 year ago

I think it's fixed and doesn't produce CRLFs anymore. I think we can stick with LF since we control these files.

It still means that dulwich should be fixed, right?

pmrowla commented 1 year ago

Generating only LF will still lead to inconsistencies (when anyone does git checkout on windows they will get CRLF line endings in the checked out tsv, even though they will get LF when they actually run dvclive on windows)

And yes there's still the dulwich bug but it's probably a nontrivial fix since it's not actually clear/documented what CLI git is doing in this case. I'm guessing CLI git generates both the normalized (converted line ending) and non-normalized blobs and then does the comparison against what's in the index, but if we have to do that for every file that might make dulwich really slow on windows.

It would probably be better for us to just implement the .gitattributes support for dulwich and document that using gitattributes is the correct way to handle mixed line ending usage in git repos (instead of relying on whatever workarounds have been continually added in CLI git over time)

shcheklein commented 1 year ago

Can it be that it doesn't even try to calculate anything since files are not even touched?

One thing to think about here is how we calculate md5s, specifically it would affect if dep/out is changed or not - now we do apply dos2unix, but later we want to get rid of it. Can't think of any issues at the moment though.

dberenbaum commented 1 year ago

Thanks for debugging @pmrowla and @shcheklein! Dropping the priority.

I opened https://github.com/iterative/dvclive/pull/427 to use os.linesep as @pmrowla suggested. Even if we add eol filter support in dulwich, it seems better to handle this automatically than ask users to add a filter to their .gitattributes file. Are there any issues with this approach?

vitalwarley commented 1 year ago

Hey, guys... this also happens on Linux. Check it out

 λ dvc exp run -s benchmark-csharp@validation
ERROR: unexpected error - 'cannot stash changes - there is nothing to stash.'

where

λ dvc doctor
DVC version: 2.43.2 (pip)
-------------------------
Platform: Python 3.9.15 on Linux-6.1.8-arch1-1-x86_64-with-glibc2.36
Subprojects:
    dvc_data = 0.37.3
    dvc_objects = 0.19.1
    dvc_render = 0.1.0
    dvc_task = 0.1.11
    dvclive = 1.3.1
    scmrepo = 0.1.7
Supports:
    azure (adlfs = 2022.11.2, knack = 0.9.0, azure-identity = 1.10.0),
    gdrive (pydrive2 = 1.15.0),
    gs (gcsfs = 2022.11.0),
    hdfs (fsspec = 2022.11.0, pyarrow = 8.0.0),
    http (aiohttp = 3.8.1, aiohttp-retry = 2.5.2),
    https (aiohttp = 3.8.1, aiohttp-retry = 2.5.2),
    oss (ossfs = 2021.8.0),
    s3 (s3fs = 2022.11.0, boto3 = 1.24.59),
    ssh (sshfs = 2023.1.0),
    webdav (webdav4 = 0.9.7),
    webdavs (webdav4 = 0.9.7),
    webhdfs (fsspec = 2022.11.0)
Cache types: reflink, hardlink, symlink
Cache directory: btrfs on /dev/nvme0n1p3
Caches: local
Remotes: gdrive, ssh, local
Workspace directory: btrfs on /dev/nvme0n1p3
Repo: dvc, git

and

λ git status
On branch container/experiments
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
  (commit or discard the untracked or modified content in submodules)
    modified:   src/csharp (new commits, modified content)
    modified:   src/python (new commits, modified content, untracked content)

Also, I found similar problems in iterative/dvc#7403 and iterative/dvc#5855. The workaround is to just have one modified file in the workspace. (e.g., .gitignore).

pmrowla commented 1 year ago

@vitalwarley are you using git submodules?

vitalwarley commented 1 year ago

@vitalwarley are you using git submodules?

Yes. I am sorry I forgot to mention that; those modified dirs (src/csharp, src/python) are both submodules.

pmrowla commented 1 year ago

exp run lacking full support for submodules is a known issue (separate from this one), the problem you encountered is the same as iterative/dvc#5855, so please keep any submodule related discussion there.

This issue is specifically for a separate windows only bug (relating to CRLF and .gitattributes handling)

kmosunoff commented 1 year ago

Hello guys, this problem remains on python package (dvc==2.43.4) on Windows. Hope you'll pay your attention on that ;)

dberenbaum commented 1 year ago

@kmosunoff Could you give more details on the problem you have? Are you using DVCLive?

legendof-selda commented 1 year ago

Hey I am also facing the same issue as well. I ensured that all files are ending with LF. DVC keeps saving the dvc.lock and other files with CRLF instead but I have manually saved them and I still face this issue. out.txt

pmrowla commented 1 year ago

discord user reports possible merge conflict error (instead of the nothing to stash error) in some cases after #214 https://discord.com/channels/485586884165107732/1085840450218377216/1090263098994204753

pmrowla commented 1 year ago

Issue was that the repo contained .gitattributes EOL settings with files that had not been normalized to match the gitattributes settings

The workaround here is to do

git add . --renormalize
git commit

to make sure the files in the repo match the gitattributes settings before using dvc exp run (this should only need to be done once)