[CLI]: `api.runs()` Method Returns Duplicate and Missing Runs

kotekjedi commented 5 months ago

Describe the bug

I am encountering an issue with the Weights & Biases API where the `api.runs()` method returns duplicate runs and omits others. Importantly, the total number of runs returned is correct, but the returned runs do not match the project's actual runs.

import wandb

# Initialize the API
api = wandb.Api()

# Project is specified by <entity/project-name>
runs = api.runs("kotekjedi/llm-rs")

# Use a set to track processed run IDs
processed_run_ids = set()

summary_list, config_list, name_list = [], [], []

for run in runs:
    run_id = run.id
    if run_id not in processed_run_ids:
        # Add the run ID to the set
        processed_run_ids.add(run_id)

        # Process the run data
        summary_list.append(run.summary._json_dict)
        config_list.append(
            {k: v for k, v in run.config.items() if not k.startswith('_')}
        )
        name_list.append(run.name)
    else:
        print(f"Duplicate run found: {run_id}")

Duplicate run found: k14f2vc7
Duplicate run found: la325tya
Duplicate run found: 4cx54f3h
Duplicate run found: 7owoqll6
Duplicate run found: lxtzne8y
Duplicate run found: kr18m6w6
Duplicate run found: 2yokxira
Duplicate run found: sf8sb853
Duplicate run found: o72vgxfy
Duplicate run found: k0vyhar6

Environment

WandB version: '0.17.0'

OS: Windows 11

Python version: Python 3.11.5

JoanaMarieL commented 5 months ago

Hi @kotekjedi, we tried to reproduce this but did not get the same result. Are you also seeing duplicated runs in the UI?

kotekjedi commented 5 months ago

Hi @JoanaMarieL, thanks for reaching out! In the UI everything is fine: no runs are duplicated or missing. However, when I download the runs through the API, some are missing and some are duplicated.

PhilippBordne commented 5 months ago

Hey, I can confirm that I am having the same issue. In my case, the total number of runs is 1200 (the same as in the UI), the number of distinct run IDs is 1163, and the number of duplicates is 37, as verified with this code snippet:

import wandb

project_name = "<entity>/<project>"  # placeholder: your own project path
api = wandb.Api()
runs = api.runs(project_name)

distinct_run_ids = set()
duplicate_run_ids = set()

print(f"Number of runs (total): {len(runs)}")

for run in runs:
    if run.id in distinct_run_ids:
        duplicate_run_ids.add(run.id)
    else:
        distinct_run_ids.add(run.id)

print(f"Number of distinct run ids: {len(distinct_run_ids)}")
print(f"Number of duplicate run ids: {len(duplicate_run_ids)}")

With output:

Number of runs (total): 1200
Number of distinct run ids: 1163
Number of duplicate run ids: 37

I also verified, by comparing against the UI, that exactly as many IDs are missing from the runs list as there are duplicate IDs in it. To do so, I downloaded the .csv from the UI and compared its IDs with the IDs in the runs object; this also ruled out duplicate IDs already being displayed in the UI.

Python: 3.10.14 / wandb: 0.15.12 on macOS 14
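The UI-versus-API cross-check described in this comment can be scripted. The sketch below is a minimal illustration and not part of wandb: `compare_run_ids` and `read_ui_ids` are hypothetical helpers, and the `ID` column name is an assumption about the UI's CSV export, so check the header of your own file. In practice, `api_ids` would be `[run.id for run in runs]`.

```python
import csv
import io
from collections import Counter


def compare_run_ids(ui_ids, api_ids):
    """Return (missing, duplicated) run IDs of the API results versus the UI."""
    api_counts = Counter(api_ids)
    # IDs shown in the UI that the API never returned
    missing = sorted(set(ui_ids) - set(api_counts))
    # IDs the API returned more than once
    duplicated = sorted(run_id for run_id, n in api_counts.items() if n > 1)
    return missing, duplicated


def read_ui_ids(csv_file, id_column="ID"):
    """Read run IDs from a CSV exported from the UI.

    Assumption: the export has a column named "ID"; adjust id_column if not.
    """
    return [row[id_column] for row in csv.DictReader(csv_file)]


# Toy data standing in for a real UI export and a real api.runs(...) result.
ui_csv = io.StringIO("ID,Name\na1,run-a\nb2,run-b\nc3,run-c\n")
ui_ids = read_ui_ids(ui_csv)
api_ids = ["a1", "a1", "c3"]
missing, duplicated = compare_run_ids(ui_ids, api_ids)
print(missing, duplicated)  # ['b2'] ['a1']
```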

JoanaMarieL commented 5 months ago

Hello @kotekjedi and @PhilippBordne, thank you both for flagging this. A fix for this issue is planned for this coming June. As a workaround, please add this to your code:

runs = api.runs(
    path="<entity>/<project>",  # placeholder: your own project path
    order="+created_at",
)

Hope this helps. Thanks!
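One plausible reason an explicit `order` helps (an assumption; this thread does not confirm the root cause) is that paging through a sort key with ties can return tied runs in a different order on each page request. The self-contained simulation below models that with a hypothetical `fetch_pages` function and a hypothetical `bucket` field; it does not call wandb.

```python
import random


def fetch_pages(runs, key, per_page, seed=0):
    """Simulate offset pagination where each page is a separate request.

    Assumption for illustration: the server sorts by `key`, and runs that tie
    on `key` come back in an arbitrary order that can change per request.
    """
    rng = random.Random(seed)
    fetched = []
    for offset in range(0, len(runs), per_page):
        candidates = list(runs)
        rng.shuffle(candidates)  # arbitrary starting order for this request
        # sorted() is stable, so tied runs keep the shuffled (arbitrary) order
        ordered = sorted(candidates, key=key)
        fetched.extend(run["id"] for run in ordered[offset:offset + per_page])
    return fetched


# 120 hypothetical runs; "bucket" plays the role of a sort key with many ties.
runs = [{"id": f"run{i:03d}", "bucket": i // 30} for i in range(120)]

# Ambiguous order: tied runs can shift across page boundaries between requests.
ambiguous = fetch_pages(runs, key=lambda r: r["bucket"], per_page=25)
# Deterministic order: ties are broken by the unique run ID.
tie_broken = fetch_pages(runs, key=lambda r: (r["bucket"], r["id"]), per_page=25)

for label, ids in (("ambiguous", ambiguous), ("tie-broken", tie_broken)):
    print(f"{label}: total={len(ids)} distinct={len(set(ids))}")
```

In this model the total is always 120, matching the symptom reported in this issue, while `ambiguous` contains duplicates and exactly as many missing runs. Breaking ties on a unique field makes every page request agree on the order.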

JoanaMarieL commented 4 months ago

Hi @kotekjedi, our engineers have fixed the issue. Could you please try it and confirm whether it also works on your end? Thanks!

kotekjedi commented 4 months ago

@JoanaMarieL Thank you!

JoanaMarieL commented 4 months ago

You are most welcome, @kotekjedi! Marking this as resolved. Feel free to reach out to us again anytime.

majoma7 commented 2 months ago

Hey,

I am sorry to report that the issue has not been resolved yet. I am facing the same issue, using version 0.17.9:

import wandb

entity, project, sweep_id = "<entity>", "<project>", "<sweep_id>"  # placeholders
print(wandb.__version__)

api = wandb.Api()
sweep = api.sweep(f"{entity}/{project}/{sweep_id}")

run_ids = [run.id for run in sweep.runs]
run_ids_unique = list(set(run_ids))

print(len(run_ids))
print(len(run_ids_unique))

I get the prints:

0.17.9
4608
4600

Tal-Golan commented 1 month ago

This happens on 0.18.0 as well when accessing `sweep.runs` as @majoma7 did.

Changing the pagination by setting `sweep.runs.per_page = len(sweep.runs)` before iterating over `sweep.runs` seems to solve the problem, at least in my case. This might indicate a problem with how pages are handled inside the `Runs` object.

@JoanaMarieL, I suggest reopening this issue.
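Assuming the result order is not stable between page requests, a single page sidesteps the problem because there is only one request. The simulation below is a hypothetical model, not wandb's pagination code; `paginate` is an illustrative name, and the fully random order is a deliberately extreme assumption.

```python
import random


def paginate(run_ids, per_page, seed=0):
    """Simulate offset pagination over a result set with no stable order.

    Assumption for illustration: every page request returns the runs in a
    fresh arbitrary order (modelled as a shuffle) before slicing.
    """
    rng = random.Random(seed)
    fetched = []
    for offset in range(0, len(run_ids), per_page):
        order = list(run_ids)
        rng.shuffle(order)  # a different arbitrary order for every request
        fetched.extend(order[offset:offset + per_page])
    return fetched


run_ids = [f"run{i}" for i in range(200)]
small_pages = paginate(run_ids, per_page=50)  # four separate requests
one_page = paginate(run_ids, per_page=len(run_ids))  # a single request

print(len(small_pages), len(set(small_pages)))
print(len(one_page), len(set(one_page)))
```

The trade-off is that one page means one large request, which may be slow for projects or sweeps with thousands of runs.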
