iterative / dvc

🦉 ML Experiments and Data Management with Git
https://dvc.org
Apache License 2.0

`dvc exp show --json` produces error that disables VS Code Extension from tracking experiments #9565

Closed TeamEpimicro closed 1 year ago

TeamEpimicro commented 1 year ago

Bug Report

dvc exp show --json : ERROR: unexpected error - first argument must be callable or None

Description

I am currently using the VS Code extension with version v0.9.6 and dvc with version v2.58.2 (see dvc doctor output below). My .dvc/config file is the following:

[core]
    analytics = false
    remote = local
[cache]
    type = "reflink,hardlink,copy"
['remote "local"']
    url = some/local/url

My VS Code extension shows "No experiment to display" in the Experiments tab of the left sidebar. [Screenshot 2023-06-08 110351] In addition, the Experiment tab in the Setup view of the DVC extension says: "Your project contains no data"

[Screenshot 2023-06-05 175000]

As a result, I am unable to use the VS Code Extension.

However, I have already run a couple of experiments through dvc exp run and pushed the associated dvc.lock files to my git repo, which I can successfully track through the CLI with dvc exp show --rev master :

$ dvc exp show --rev master
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Experiment | Created | models\model_dvc_v1.5.0\rfor\eval\live\metrics.json:avg_precision.train | models\model_dvc_v1.5.0\rfor\eval\live\metrics.json:avg_precision.test | models\model_dvc_v1.5.0\rfor\eval\live\metrics.json:roc_auc.train | models\model_dvc_v1.5.0\rfor\eval\live\metrics.json:roc_auc.test | models\model_dvc_v1.5.0\rfor\eval\live\metrics.json:true_positive_rate.tr
|------------+---------+-------------------------------------------------------------------------+------------------------------------------------------------------------+-------------------------------------------------------------------+------------------------------------------------------------------+--------------------------------------------------------------------------
| workspace  | -       |                                                                       1 |                                                                      1 |                                                                 0 |                                                                0 |
| master     | -       |                                                                       - |                                                                      - |                                                                 - |                                                                - |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

I found an error in the VS Code Developer Console :

exp show --rev master -n 3 --json failed with ERROR: unexpected error - first argument must be callable or None

The same error is displayed when running dvc exp show --json in the CLI :

$ dvc exp show --json
ERROR: unexpected error - first argument must be callable or None

Here is the same command run with --verbose :

$ dvc exp show --json --verbose
2023-06-08 14:12:00,455 DEBUG: v2.58.2 (pip), CPython 3.9.10 on Windows-10-10.0.19044-SP0
2023-06-08 14:12:00,455 DEBUG: command: C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\priam-xwE_6ZV3-py3.9\Scripts\dvc exp show --json --verbose
2023-06-08 14:12:02,121 DEBUG: Removing 'C:\Users\user\Documents\Projets\priam\.dvc\tmp\exps\cache\23\f69486c525c1fc2e124f1204e8e2308a4ecb70'
2023-06-08 14:12:02,121 DEBUG: first argument must be callable or None
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\priam-xwE_6ZV3-py3.9\lib\site-packages\dvc\repo\experiments\collect.py", line 71, in collect_rev
    cache.put(data, force=True)
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\priam-xwE_6ZV3-py3.9\lib\site-packages\dvc\repo\experiments\cache.py", line 47, in put
    self.odb.add_bytes(rev, exp.as_bytes())
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\priam-xwE_6ZV3-py3.9\lib\site-packages\dvc\repo\experiments\serialize.py", line 111, in as_bytes
    return _ISOEncoder().encode(self.dumpd()).encode("utf-8")
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\priam-xwE_6ZV3-py3.9\lib\site-packages\dvc\repo\experiments\serialize.py", line 108, in dumpd
    return asdict(self)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 1075, in asdict
    return _asdict_inner(obj, dict_factory)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 1082, in _asdict_inner
    value = _asdict_inner(getattr(obj, f.name), dict_factory)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 1112, in _asdict_inner
    return type(obj)((_asdict_inner(k, dict_factory),
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 1113, in <genexpr>
    _asdict_inner(v, dict_factory))
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 1112, in _asdict_inner
    return type(obj)((_asdict_inner(k, dict_factory),
TypeError: first argument must be callable or None

2023-06-08 14:12:02,393 DEBUG: Removing 'C:\Users\user\Documents\Projets\priam\.dvc\tmp\exps\cache\41\09bc516df665602e670e50fdcf15cc4fcc698c'
2023-06-08 14:12:02,401 DEBUG: first argument must be callable or None
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\priam-xwE_6ZV3-py3.9\lib\site-packages\dvc\repo\experiments\collect.py", line 71, in collect_rev
    cache.put(data, force=True)
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\priam-xwE_6ZV3-py3.9\lib\site-packages\dvc\repo\experiments\cache.py", line 47, in put
    self.odb.add_bytes(rev, exp.as_bytes())
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\priam-xwE_6ZV3-py3.9\lib\site-packages\dvc\repo\experiments\serialize.py", line 111, in as_bytes
    return _ISOEncoder().encode(self.dumpd()).encode("utf-8")
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\priam-xwE_6ZV3-py3.9\lib\site-packages\dvc\repo\experiments\serialize.py", line 108, in dumpd
    return asdict(self)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 1075, in asdict
    return _asdict_inner(obj, dict_factory)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 1082, in _asdict_inner
    value = _asdict_inner(getattr(obj, f.name), dict_factory)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 1112, in _asdict_inner
    return type(obj)((_asdict_inner(k, dict_factory),
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 1113, in <genexpr>
    _asdict_inner(v, dict_factory))
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 1112, in _asdict_inner
    return type(obj)((_asdict_inner(k, dict_factory),
TypeError: first argument must be callable or None

2023-06-08 14:12:02,403 ERROR: unexpected error - first argument must be callable or None
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\priam-xwE_6ZV3-py3.9\lib\site-packages\dvc\cli\__init__.py", line 210, in main
    ret = cmd.do_run()
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\priam-xwE_6ZV3-py3.9\lib\site-packages\dvc\cli\command.py", line 26, in do_run
    return self.run()
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\priam-xwE_6ZV3-py3.9\lib\site-packages\dvc\commands\experiments\show.py", line 197, in run
    ui.write_json([exp.dumpd() for exp in exps], default=_format_json)
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\priam-xwE_6ZV3-py3.9\lib\site-packages\dvc\commands\experiments\show.py", line 197, in <listcomp>
    ui.write_json([exp.dumpd() for exp in exps], default=_format_json)
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\priam-xwE_6ZV3-py3.9\lib\site-packages\dvc\repo\experiments\serialize.py", line 184, in dumpd
    return asdict(self)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 1075, in asdict
    return _asdict_inner(obj, dict_factory)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 1082, in _asdict_inner
    value = _asdict_inner(getattr(obj, f.name), dict_factory)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 1082, in _asdict_inner
    value = _asdict_inner(getattr(obj, f.name), dict_factory)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 1112, in _asdict_inner
    return type(obj)((_asdict_inner(k, dict_factory),
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 1113, in <genexpr>
    _asdict_inner(v, dict_factory))
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 1112, in _asdict_inner
    return type(obj)((_asdict_inner(k, dict_factory),
TypeError: first argument must be callable or None

2023-06-08 14:12:02,463 DEBUG: link type reflink is not available ([Errno 129] no more link types left to try out)
2023-06-08 14:12:02,463 DEBUG: Removing 'C:\Users\user\Documents\Projets\.CKkgodSpGJHLcb7YVCcjuL.tmp'
2023-06-08 14:12:02,463 DEBUG: Removing 'C:\Users\user\Documents\Projets\.CKkgodSpGJHLcb7YVCcjuL.tmp'
2023-06-08 14:12:02,463 DEBUG: link type symlink is not available ([WinError 1314] A required privilege is not held by the client: 'C:/Users/user/Documents/Projets/priam/.dvc/cache/.KPmWaZQ8W9QmaDqNSqcJAa.tmp' -> 'C:/Users/user/Documents/Projets/.CKkgodSpGJHLcb7YVCcjuL.tmp')
2023-06-08 14:12:02,463 DEBUG: Removing 'C:\Users\user\Documents\Projets\.CKkgodSpGJHLcb7YVCcjuL.tmp'
2023-06-08 14:12:02,463 DEBUG: Removing 'C:\Users\user\Documents\Projets\priam\.dvc\cache\.KPmWaZQ8W9QmaDqNSqcJAa.tmp'
2023-06-08 14:12:02,463 DEBUG: Version info for developers:
DVC version: 2.58.2 (pip)
-------------------------
Platform: Python 3.9.10 on Windows-10-10.0.19044-SP0
Subprojects:
        dvc_data = 0.51.0
        dvc_objects = 0.22.0
        dvc_render = 0.5.3
        dvc_task = 0.2.1
        scmrepo = 1.0.3
Supports:
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3)
Config:
        Global: C:\Users\user\AppData\Local\iterative\dvc
        System: C:\ProgramData\iterative\dvc
Cache types: hardlink
Cache directory: NTFS on C:\
Caches: local
Remotes: local
Workspace directory: NTFS on C:\
Repo: dvc, git
Repo.site_cache_dir: C:\ProgramData\iterative\dvc\Cache\repo\dd38c746d16228f621190e9e46db919d

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2023-06-08 14:12:02,472 DEBUG: Analytics is disabled.

Reproduce

  1. git init
  2. dvc init
  3. dvc exp run
  4. dvc push
  5. git add dvc.lock
  6. git commit -m "some commit name"
  7. git push
  8. dvc exp show --json

Expected

No error is expected.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.58.2 (pip)
-------------------------
Platform: Python 3.9.10 on Windows-10-10.0.19044-SP0
Subprojects:
        dvc_data = 0.51.0
        dvc_objects = 0.22.0
        dvc_render = 0.5.3
        dvc_task = 0.2.1
        scmrepo = 1.0.3
Supports:
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3)
Config:
        Global: C:\Users\user\AppData\Local\iterative\dvc
        System: C:\ProgramData\iterative\dvc
Cache types: hardlink
Cache directory: NTFS on C:\
Caches: local
Remotes: local
Workspace directory: NTFS on C:\
Repo: dvc, git
Repo.site_cache_dir: C:\ProgramData\iterative\dvc\Cache\repo\dd38c746d16228f621190e9e46db919d

I am using poetry for package and environment management.

Output of pip check:

$ pip check
No broken requirements found.

Output of poetry check:

$ poetry check
All set!

Output of poetry env info:

$ poetry env info

Virtualenv
Python:         3.9.10
Implementation: CPython
Path:           C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\priam-xwE_6ZV3-py3.9
Executable:     C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\priam-xwE_6ZV3-py3.9\Scripts\python.exe
Valid:          True

System
Platform:   win32
OS:         nt
Python:     3.9.10
Path:       C:\Users\user\AppData\Local\Programs\Python\Python39
Executable: C:\Users\user\AppData\Local\Programs\Python\Python39\python.exe

Output of poetry show dvc:

$ poetry show dvc
 name         : dvc
 version      : 2.58.2
 description  : Git for data scientists - manage your code and data together

dependencies
 - colorama >=0.3.9
 - configobj >=5.0.6
 - distro >=1.3
 - dpath >=2.1.0,<3
 - dvc-data >=0.51.0,<0.52
 - dvc-http >=2.29.0
 - dvc-render >=0.3.1,<1
 - dvc-studio-client >=0.9.2,<1
 - dvc-task >=0.2.1,<1
 - flatten-dict >=0.4.1,<1
 - flufl.lock >=5
 - funcy >=1.14
 - grandalf >=0.7,<1
 - hydra-core >=1.1
 - iterative-telemetry >=0.0.7
 - networkx >=2.5
 - packaging >=19
 - pathspec >=0.10.3
 - platformdirs >=3.1.1,<4
 - psutil >=5.8
 - pydot >=1.2.4
 - pygtrie >=2.3.2
 - pyparsing >=2.4.7
 - requests >=2.22
 - rich >=12
 - ruamel.yaml >=0.17.11
 - scmrepo >=1.0.0,<2
 - shortuuid >=0.5
 - shtab >=1.3.4,<2
 - tabulate >=0.8.7
 - tomlkit >=0.11.1
 - tqdm >=4.63.1,<5
 - voluptuous >=0.11.7
 - zc.lockfile >=1.2.1

required by
 - dvclive >=2.58.0,<3

Additional Information

I am instantiating dvclive in my code with the following :

with Live(dvc_live_path, report=None, dvcyaml=False) as live:

I am tracking some metrics with both live.summary and live.log_metric():

for metric in all_metrics:
    if not live.summary.get(metric):
        live.summary[metric] = {}
        live.summary[metric][split] = {}
    curr_metric = all_metrics[metric]
    live.summary[metric][split] = curr_metric['value']
    live.log_metric(f'{metric}/{split}', curr_metric['value'])

pmrowla commented 1 year ago

This is actually a core python bug which has been fixed in some releases but has not been backported to a 3.9 release yet.

We have a workaround for it here: https://github.com/iterative/dvc/blob/ed664143b48f5372773075ad5093b93ffdfd755b/dvc/repo/experiments/serialize.py#L59-L71

but I'm guessing this issue means we need to make the defaultdict->dict conversion recursive
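
To illustrate what a recursive defaultdict->dict conversion could look like, here is a minimal sketch. It is not the code from DVC's serialize.py: the helper name `to_plain_dict` and the sample data are hypothetical, and the sketch assumes the nested values are only dicts, lists, and scalars.

```python
from collections import defaultdict


def to_plain_dict(obj):
    """Recursively copy obj, turning every dict subclass into a plain dict.

    Hypothetical helper for illustration only; not DVC's implementation.
    """
    if isinstance(obj, dict):
        # Covers defaultdict and any other dict subclass at any depth.
        return {key: to_plain_dict(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [to_plain_dict(item) for item in obj]
    return obj


# Made-up data shaped like nested metrics, with defaultdicts two levels deep.
nested = defaultdict(
    dict,
    {"metrics.json": defaultdict(dict, {"data": {"roc_auc": {"test": 0}}})},
)
plain = to_plain_dict(nested)
```

After the conversion no defaultdict is left at any level, so `dataclasses.asdict` would only ever see plain dicts.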

Danila89 commented 1 year ago

I'm facing the same issue; it happens even with Python 3.10. Are there any workarounds at the moment?

dberenbaum commented 1 year ago

@pmrowla Is it expected that it would still happen with 3.10?

shcheklein commented 1 year ago

@Danila89 hey, could you please also share a bit more information - dvc version output, logs, etc. Thanks.

Danila89 commented 1 year ago

@shcheklein it seems that the problem depends on what exactly the experiments look like. I'm exploring DVC's capabilities, so I delete and run a lot of experiments. At some point I encountered the issue: the VS Code extension did not work, and dvc exp show --json failed with both Python 3.10.9 and Python 3.9.6. I reproduced it with dvc 2.58.2 and dvc 3.0.0. At the moment (I guess after deleting some experiments) the problem is gone. If I encounter it again, I will share the details and try to find a way to reproduce it.

dberenbaum commented 1 year ago

@mattseddon reported that he can reproduce it with python 3.10 in https://github.com/iterative/dvc/issues/9588#issuecomment-1594504602. Looking into it.

Update: looks like a regression from https://github.com/iterative/dvc/commit/ec090c5680c490f34339edf3beaf9b40a90397a9.

skshetry commented 1 year ago

> @pmrowla Is it expected that it would still happen with 3.10?

This will happen in Python <3.12.

dberenbaum commented 1 year ago

@skshetry I think it's actually Python<3.11.0, right?

dberenbaum commented 1 year ago

AFAICT the default dicts are only nested one level deep, so I put a quick fix PR in https://github.com/iterative/dvc/pull/9619. @skshetry @pmrowla Any issues with this quick fix?

@TeamEpimicro If you want to try, you could do pip install git+https://github.com/iterative/dvc.git@exp-serialize-defaultdict and check if it fixes the problem.

skshetry commented 1 year ago

> @skshetry I think it's actually Python<3.11.0, right?

No, it was fixed only in 3.12. See https://github.com/python/cpython/pull/32056.

I can still reproduce it in 3.11 using:

from collections import defaultdict
from dataclasses import asdict, dataclass, field

@dataclass
class Klass:
    d: dict[str, list[int]] = field(default_factory=dict)

d = defaultdict(list, {"lst": [1, 2, 3]})
inst = Klass(d=d)
print(asdict(inst))
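
The error message itself can be reproduced without dataclasses. The tracebacks above show `_asdict_inner` rebuilding a dict with `type(obj)(...)` and passing the key/value pairs as the first argument. The sketch below imitates only that one step. The helper name `rebuild_like_asdict` is hypothetical and simplified, and it is not the standard library code.

```python
from collections import defaultdict

source = defaultdict(list, {"lst": [1, 2, 3]})


def rebuild_like_asdict(obj):
    # Simplified imitation of the failing line in the traceback: the pairs
    # are passed as the first positional argument to type(obj).
    return type(obj)((key, value) for key, value in obj.items())


try:
    rebuild_like_asdict(source)
    error_message = None
except TypeError as exc:
    # defaultdict expects default_factory first, not an iterable of pairs.
    error_message = str(exc)

# A plain dict accepts an iterable of pairs, so converting first avoids it.
rebuilt = rebuild_like_asdict(dict(source))
print(error_message)
```

This is why converting a defaultdict to a plain dict before calling `asdict` sidesteps the problem on affected Python versions.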

dberenbaum commented 1 year ago

Closing as fixed by #9619.

@TeamEpimicro @Danila89 Please follow up if you still have issues.