iterative / dvc

🦉 Data Versioning and ML Experiments
https://dvc.org
Apache License 2.0

exp show: cli table fails to parse numeric columns as expected #9396

Open dberenbaum opened 1 year ago

dberenbaum commented 1 year ago

Bug Report

Description

The default table format generated by dvc exp show fails to render version numbers like 2.50.0 as columns.

Reproduce

Clone https://github.com/iterative/analytics/tree/clean (note the clean branch) and run dvc exp show. None of the version number columns show (they do get rendered in other output formats like --md/--json).

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.55.1.dev20+g2a04263f0
------------------------------------
Platform: Python 3.10.10 on macOS-13.2.1-arm64-arm-64bit
Subprojects:
        dvc_data = 0.47.2
        dvc_objects = 0.21.1
        dvc_render = 0.3.1
        dvc_task = 0.2.1
        scmrepo = 1.0.2
Supports:
        azure (adlfs = 2023.1.0, knack = 0.10.1, azure-identity = 1.12.0),
        gdrive (pydrive2 = 1.15.3),
        gs (gcsfs = 2022.11.0),
        hdfs (fsspec = 2022.11.0, pyarrow = 11.0.0),
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        oss (ossfs = 2021.8.0),
        s3 (s3fs = 2022.11.0, boto3 = 1.24.59),
        ssh (sshfs = 2023.4.1),
        webdav (webdav4 = 0.9.8),
        webdavs (webdav4 = 0.9.8),
        webhdfs (fsspec = 2022.11.0)
Config:
        Global: /Users/dave/Library/Application Support/dvc
        System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local, s3
Remotes: s3
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/7f5d6fecd9ab2f28d386765a5eacb03d

pmrowla commented 1 year ago

I think the issue is just that the table for that repo ends up being wider than the maximum allowed width for a console table in rich (even when using the pager). The numeric columns end up being the farthest to the right, so they get cut off from the table.

I am able to see the numeric columns if I do:

dvc exp show --drop Cmd

dberenbaum commented 1 year ago

Ah, sorry for the noise, guess I jumped to the wrong conclusion here. Thanks for taking a look @pmrowla!

Why does it still show the params and files columns at the end even when a bunch of the metrics columns get dropped?

pmrowla commented 1 year ago

@dberenbaum it collapses from right to left but will try to preserve at least one column of each type (metrics/params/data) whenever possible
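
The collapsing rule described above can be sketched roughly as follows. This is an illustration only, not DVC's or rich's actual implementation: the Column class, the kind labels, and the collapse function are hypothetical names, and the real code may choose which columns to drop differently.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    kind: str   # hypothetical labels: "fixed", "metric", "param", "dep"
    width: int  # rendered width in characters, padding included


def collapse(columns, max_width):
    """Drop columns from right to left until the table fits in max_width.

    A column is skipped if it is the last remaining one of its kind, so at
    least one column of each kind survives.
    """
    kept = list(columns)
    while sum(c.width for c in kept) > max_width:
        for i in range(len(kept) - 1, -1, -1):
            same_kind = sum(1 for c in kept if c.kind == kept[i].kind)
            if same_kind > 1:
                del kept[i]
                break
        else:
            # Only one column of each kind is left; nothing more to drop.
            break
    return kept


columns = [
    Column("Experiment", "fixed", 20),
    Column("metric_a", "metric", 15),
    Column("metric_b", "metric", 15),
    Column("metric_c", "metric", 15),
    Column("param_a", "param", 15),
    Column("param_b", "param", 15),
    Column("data.csv", "dep", 15),
]

# Total width is 110, so two columns must go to fit in 80.
print([c.name for c in collapse(columns, 80)])
# ['Experiment', 'metric_a', 'metric_b', 'param_a', 'data.csv']
```

Under these assumptions the last param and data columns stay at the right edge while metrics to their left are removed, which matches the behavior asked about above.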

dberenbaum commented 1 year ago

Looks like this has been somewhat handled on our end, but we capped at a max width of 1024 here:

https://github.com/iterative/dvc/blob/main/dvc/ui/table.py#L14

Confirmed that it works as expected by increasing that value. @skshetry What's the reason for MAX_WIDTH=1024?

dberenbaum commented 1 year ago

ping @skshetry

pmrowla commented 1 year ago

We can probably make this a bigger number, but iirc rich requires us to set some maximum value otherwise it gets confused trying to compute our table widths. I think we probably just picked 1024 because it seemed like a reasonable default at the time.

The main point here is that exp show is naive and isn't really meant to show tables with this many columns; it would be better for users to use VS Code for this instead.

dberenbaum commented 1 year ago

With --no-pager, I get a table like this:

 ────────────────
     …    …    …
 ────────────────
     …    …    -
     …    …    -
     …    …    -
     …    …    -
     …    …    -

Not sure why we aren't showing at least one column since it would fit.

With the pager, it looks like this:

 ───────────────────────────────────────────────────────────────────────────────────────>
  Experiment                 Created        results/usage.json:monthly_cli_invocations  >
 ───────────────────────────────────────────────────────────────────────────────────────>
  workspace                  -                                                19188779  >
  main                       May 02, 2023                                      1919379  >
  ip                         May 02, 2023                                     19377657  >
  clean                      May 01, 2023                                     19188779  >
  76d1341                    May 01, 2023                                      1919379  >
  ├── aff7171 [ruddy-palp]   May 02, 2023                                      3531123  >

There's a lot of empty whitespace to the right in the table when using the pager, so I don't think it's maximizing the space. We also probably could shorten/drop the filenames, which are taking up most of the space.
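
One way the filename shortening suggested above could work is sketched below, assuming headers follow the path:name form seen in the pager output. The shorten_headers helper and the sample headers other than results/usage.json:monthly_cli_invocations are hypothetical and not part of DVC.

```python
from collections import Counter


def shorten_headers(headers):
    """Drop the "path:" prefix from a header when the bare name is unique."""
    # Headers without a colon are left as they are by rsplit.
    bare = [h.rsplit(":", 1)[-1] for h in headers]
    counts = Counter(bare)
    # Keep the full header when two files define the same name.
    return [b if counts[b] == 1 else h for h, b in zip(headers, bare)]


headers = [
    "Experiment",
    "results/usage.json:monthly_cli_invocations",
    "params.yaml:train.lr",  # hypothetical header
    "other.yaml:train.lr",   # hypothetical header with a clashing name
]
print(shorten_headers(headers))
# ['Experiment', 'monthly_cli_invocations', 'params.yaml:train.lr', 'other.yaml:train.lr']
```

Keeping the prefix only for clashing names frees up width for more columns without making any header ambiguous.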