iterative / example-repos-dev

Source code and generator scripts for example DVC projects
https://dvc.org/doc

experiments: some DVC artifacts are missing when running `dvc pull` #179

Closed alex000kim closed 1 year ago

alex000kim commented 1 year ago

I think it has something to do with this message when running `bash generate.sh`:

...
+ git commit -am 'Run experiments tuning architecture. Apply best one'
[tune-architecture c842192] Run experiments tuning architecture. Apply best one
 22 files changed, 45 insertions(+), 45 deletions(-)
+ git checkout main
Switched to branch 'main'
+ dvc push -A
WARNING: Output 'data/test_data'(stage: '../../../../../dvc.yaml:data_split') is missing version info. Cache for it will not be collected. Use `dvc repro` to get your pipeline up to date.
WARNING: Output 'data/train_data'(stage: '../../../../../dvc.yaml:data_split') is missing version info. Cache for it will not be collected. Use `dvc repro` to get your pipeline up to date.
WARNING: Output 'models/model.pkl'(stage: '../../../../../dvc.yaml:train') is missing version info. Cache for it will not be collected. Use `dvc repro` to get your pipeline up to date.
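
To see at a glance which outputs are affected, the warnings can be pulled out of a captured log. The following is only a rough sketch: the regular expression is inferred from the three warning lines above, and the function name `outputs_missing_version_info` is made up for illustration, not part of DVC.

```python
import re

# Pattern inferred from the warning lines in the log above (an assumption,
# not an official DVC message format).
WARNING_RE = re.compile(
    r"WARNING: Output '(?P<output>[^']+)'\(stage: '(?P<stage>[^']+)'\) "
    r"is missing version info"
)


def outputs_missing_version_info(log_text):
    """Return (output path, stage name) pairs for each matching warning."""
    pairs = []
    for match in WARNING_RE.finditer(log_text):
        # The stage address looks like '<path to dvc.yaml>:<stage name>'.
        stage_name = match.group("stage").rsplit(":", 1)[-1]
        pairs.append((match.group("output"), stage_name))
    return pairs


sample_log = (
    "+ dvc push -A\n"
    "WARNING: Output 'data/test_data'(stage: '../../../../../dvc.yaml:data_split') "
    "is missing version info. Cache for it will not be collected.\n"
    "WARNING: Output 'models/model.pkl'(stage: '../../../../../dvc.yaml:train') "
    "is missing version info. Cache for it will not be collected.\n"
)

for output, stage in outputs_missing_version_info(sample_log):
    print(f"{stage}: {output}")
```

Grouping by stage name makes it easier to tell whether a single stage (here `data_split` or `train`) accounts for all of the uncollected outputs.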
alex000kim commented 1 year ago

Even after running `dvc repro; dvc push -A`, I am still getting the same warnings:

(.venv) @alex000kim ➜ .../example-repos-dev/example-get-started-experiments/build/example-get-started-experiments (main) $ dvc repro -f; dvc push -A
Verifying data sources in stage: 'data/pool_data.dvc'                 

Running stage 'data_split':
> python src/data_split.py

Running stage 'train':
> python src/train.py
/workspaces/example-repos-dev/example-get-started-experiments/build/example-get-started-experiments/.venv/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/workspaces/example-repos-dev/example-get-started-experiments/build/example-get-started-experiments/.venv/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet18_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet18_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
epoch     train_loss  valid_loss  dice_multi  time    
0         0.320687    0.102011    0.495183    00:03                                                                                                
epoch     train_loss  valid_loss  dice_multi  time    
0         0.075994    0.081576    0.495183    00:02                                                                                                
1         0.056988    0.043638    0.880473    00:02                                                                                                
2         0.045097    0.024923    0.849681    00:02                                                                                                
3         0.038136    0.022296    0.868932    00:02                                                                                                
4         0.033316    0.022095    0.871614    00:02                                                                                                
5         0.029590    0.022463    0.874692    00:02                                                                                                
6         0.026495    0.021283    0.883413    00:02                                                                                                
7         0.024304    0.021219    0.885029    00:02                                                                                                
Updating lock file 'dvc.lock'                                                                                                                                                                                             

Running stage 'evaluate':
> python src/evaluate.py
/workspaces/example-repos-dev/example-get-started-experiments/build/example-get-started-experiments/.venv/lib/python3.10/site-packages/fastai/metrics.py:372: RuntimeWarning: Mean of empty slice
  return np.nanmean(binary_dice_scores)
Updating lock file 'dvc.lock'                                                                                                                                                                                             

To track the changes with git, run:

        git add dvc.lock data/pool_data.dvc

To enable auto staging, run:

        dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.
WARNING: Output 'data/test_data'(stage: '../../../../../dvc.yaml:data_split') is missing version info. Cache for it will not be collected. Use `dvc repro` to get your pipeline up to date.
WARNING: Output 'data/train_data'(stage: '../../../../../dvc.yaml:data_split') is missing version info. Cache for it will not be collected. Use `dvc repro` to get your pipeline up to date.
WARNING: Output 'models/model.pkl'(stage: '../../../../../dvc.yaml:train') is missing version info. Cache for it will not be collected. Use `dvc repro` to get your pipeline up to date.
daavoo commented 1 year ago

This is strange. I tried `dvc pull` locally and it fails with:

 WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache files:   
name: models/model.pkl, md5: b4c1f4ceb876c13dbf6c0feb6ff37848

However, the file does exist in the remote:

https://s3.console.aws.amazon.com/s3/buckets/dvc-public?region=us-east-2&prefix=remote/get-started-pools/b4/&showversions=false
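
For reference, here is a minimal sketch of how the md5 from the warning maps to the object key being looked up in the bucket. It assumes the layout suggested by the `b4/` prefix in the URL above (first two hex characters as a directory, the remainder as the object name); other DVC versions may lay out the remote differently, and directory hashes are not handled. The helper name `remote_object_key` is hypothetical.

```python
def remote_object_key(md5, prefix="remote/get-started-pools"):
    """Build the key where a file with this md5 would be expected.

    Assumes a '<prefix>/<first 2 hex chars>/<remaining 30 chars>' layout,
    as the 'b4/' prefix in the S3 console URL suggests.
    """
    md5 = md5.strip().lower()
    if len(md5) != 32 or any(c not in "0123456789abcdef" for c in md5):
        raise ValueError(f"not a 32-character hex md5: {md5!r}")
    return f"{prefix.rstrip('/')}/{md5[:2]}/{md5[2:]}"


# md5 taken from the 'Missing cache files' warning above.
key = remote_object_key("b4c1f4ceb876c13dbf6c0feb6ff37848")
print(key)  # remote/get-started-pools/b4/c1f4ceb876c13dbf6c0feb6ff37848
```

If an object with that key is listed in the bucket but `dvc pull` still reports it as missing, the problem is more likely access (permissions) than a missing upload.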

daavoo commented 1 year ago

I will debug this on Monday.

shcheklein commented 1 year ago

I ran `git reset origin/main --hard` to make sure the Codespaces instance has the latest set of commits, and I fixed the permissions on the `dvc-public` S3 bucket. It seems to be working now:

A       models/model.pkl                                                                                                                                        
1 file added and 1 file fetched
shcheklein commented 1 year ago

It also works locally:

(.venv) √ Projects/example-get-started-experiments % dvc pull
A       data/test_data/
A       data/train_data/
A       data/pool_data/
A       models/model.pkl
4 files added and 184 files fetched
(.venv) √ Projects/example-get-started-experiments %

Closing this for now.