wandb / wandb

The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
https://wandb.ai
MIT License

[App]: Images logged to files, but not visible in workspace #4830

Closed dyunis closed 1 year ago

dyunis commented 1 year ago

Current Behavior

When logging images, I am not able to view them in the workspace, even though the files are available.

Expected Behavior

An image panel should pop up automatically (the same code was working previously), and if not, I should be able to add a panel for images.

Steps To Reproduce

On the python side, I'm logging images with a line like

wandb.log({'images': wandb.Image(image_grid)}, step=iteration, commit=True)

for which I encounter the problems described.

See the difference between https://wandb.ai/dyunis/deq_diffusion/runs/zv2phncl?workspace=user-dyunis (working) and https://wandb.ai/dyunis/deq_diffusion/runs/c85nrhug?workspace=user-dyunis (not working)

Screenshots

Here are the files in the web app for the working run:

working_imgs

Here is the workspace with a "Media" section:

working_panel

And here is the ability to add a new panel with images:

working_panel_add

Now for a run that does not seem to be displaying correctly, here are the same screenshots, for the media:

broken_imgs

For the workspace, now missing a "Media" section by default:

broken_panels

And the inability to add a panel with images to the workspace:

broken_panel_add

Environment

OS: macOS Monterey 12.6.2

Browsers: Safari 15.6.1, Chrome 109.0.5414.87

Version: wandb 0.13.9

Additional Context

https://github.com/wandb/wandb/issues/3936 seems like the same issue, but for video, but there hasn't been any activity on that for many months.

Thanks for this wonderful software, it's really invaluable in my day-to-day!

lesliewandb commented 1 year ago

Hi @dyunis I'm so happy that you have liked our product so far! This is a known issue that we have previously ticketed. I have increased the priority on this and I'll let you know if there are any updates here.

dyunis commented 1 year ago

Hi @lesliewandb, thanks for keeping me in the loop. Is there anything you can say more about the cause of the issue so that I might adjust something on my end?

lesliewandb commented 1 year ago

I don't have any information about what is causing this, but I talked to the engineer in charge of this ticket and he says that he's currently looking into the cause for this

dyunis commented 1 year ago

Thanks a lot!

sepsamavi commented 1 year ago

I am experiencing the same issue, except with videos. My tables are also not visible. But all files are in the "Files" section.

Another anomaly is that some intermediate values that I am logging prior to the video/table are being listed in the "summary" section. I wonder if this is a part of the problem.

MRzNone commented 1 year ago

I am having the same issue with HTML. Thank you very much for wandb. Solving this issue would help a lot.

drscotthawley commented 1 year ago

Just a note that I've been interacting with Luis @ wandb support on this same problem. Will email him back and link to this issue.

To clarify: I'm not getting Tables, Images, Audio, or Plotly graphs anywhere except within the Files tab. Used to be they would appear on the run's main screen. Not sure what changed.

drscotthawley commented 1 year ago

Tried downgrading from v0.13.9 to wandb==0.12.21, which was the last installation I had where media was appearing as expected, but this did not change the behavior: still no Media.

More info: Python 3.8.10, Ubuntu.

$ pip list | grep torch 
pytorch-lightning        1.9.0
torch                    1.13.1
torchaudio               0.13.1
torchmetrics             0.11.0
torchvision              0.14.1
appdirs==1.4.4
docker-pycreds==0.4.0
GitPython==3.1.30
pathtools==0.1.2
protobuf==3.20.1
psutil==5.9.4
Pillow==9.4.0
requests==2.28.1
sentry-sdk==1.12.1
setproctitle==1.3.2
tornado==6.2
typing_extensions==4.4.0

Also, this does not seem to be related to data throttling. Besides the public server, I've also tried with a private corporate-sponsored wandb server which has no rate limits: Same problem: no Media section.

exalate-issue-sync[bot] commented 1 year ago

WandB Internal User commented: MRzNone commented: Is there any follow up to this? I am having the same issue with HTML

exalate-issue-sync[bot] commented 1 year ago

Leslie commented: Thank you so much for your patience here. Current update is that we have a closer idea of where the bug is

kptkin commented 1 year ago

Hi @dyunis @sepsamavi @MRzNone @drscotthawley

I think I know what is happening. If you look in your logs (<run-dir>/logs/debug-internal.log), you might find a warning that says something like:

WARNING HandlerThread:89979 [handler.py:handle_request_partial_history():550] Step 0 < 1001. Dropping entry: ...
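A quick way to check a log for this warning is a short script along these lines. It is only a sketch: the regular expression is inferred from the single sample line above (the wording may differ between wandb versions), and `find_dropped_steps` is a hypothetical helper name, not part of wandb.

```python
import re
from typing import Iterable, List, Tuple

# Pattern inferred from the sample warning above; treat the exact wording
# as an assumption rather than a documented log format.
DROP_PATTERN = re.compile(r"Step (\d+) < (\d+)\. Dropping entry")


def find_dropped_steps(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (logged_step, current_step) pairs for each 'Dropping entry' warning."""
    hits = []
    for line in lines:
        match = DROP_PATTERN.search(line)
        if match:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits


# Example input: the warning quoted above plus one unrelated line.
sample = [
    "WARNING HandlerThread:89979 [handler.py:handle_request_partial_history():550] "
    "Step 0 < 1001. Dropping entry: ...",
    "INFO an unrelated log line",
]
dropped = find_dropped_steps(sample)
```

To scan a real run, open the debug-internal.log file in that run's logs directory and pass the file object to `find_dropped_steps`; an empty result suggests the step check is not what is hiding the media.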

We only allow the steps to be monotonically increasing, so this can happen if you are using something like resume and you explicitly specify the step when logging, like this:

wandb.log({'images': wandb.Image(image_grid)}, step=iteration, commit=True)

If the step (iteration in the example above) is smaller than the last step you resumed from, we drop this data from our history log, so it doesn't appear on the run page. We have separate logic for saving the media files and uploading them to our remote app, so you might see the files regardless.

If this is indeed the issue, the solution is to either not specify the steps or make sure the steps are monotonically increasing.
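The rule can be illustrated with a small stand-alone model. This is a sketch of the behavior as described in this comment, not wandb's actual implementation: `filter_history` is a hypothetical function, and it assumes an entry is dropped only when its step is strictly lower than the highest step seen so far.

```python
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[int, Dict[str, str]]


def filter_history(
    entries: Iterable[Entry], resumed_step: int = 0
) -> Tuple[List[Entry], List[Entry]]:
    """Split entries into (kept, dropped) under a monotonic-step rule.

    Assumption: an entry is dropped when its step is lower than the highest
    step seen so far, starting from the step the run resumed at.
    """
    current = resumed_step
    kept: List[Entry] = []
    dropped: List[Entry] = []
    for step, data in entries:
        if step < current:
            dropped.append((step, data))
        else:
            current = step
            kept.append((step, data))
    return kept, dropped


# A run resumed at step 1001 whose training loop restarts its counter at 0.
entries = [
    (0, {"images": "grid_0"}),
    (500, {"images": "grid_500"}),
    (1500, {"images": "grid_1500"}),
]
kept, dropped = filter_history(entries, resumed_step=1001)
```

In this model only the step-1500 entry reaches the history, which matches the symptom in this thread: the earlier image files are uploaded but never show up in a panel.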

In case this is not the issue, a repro of how you run your script would help me keep debugging this.

Looking to hear back from you :)

drscotthawley commented 1 year ago

@kptkin Not applicable to my case. Not doing any resuming. All one run. My steps are monotonically increasing. No "Dropping" in the debug log file.

kptkin commented 1 year ago

@drscotthawley could you provide a repro? It is a very interesting case, and I can't currently understand how that would happen. A link to a run where you see this behavior might also give further clues to what's happening.

Thanks and sorry for the inconvenience

drscotthawley commented 1 year ago

@kptkin Please refer to the 14-message email thread with support@wandb Re: "[Weights & Biases] [Frontend] Media not appearing in Workspace", opened February 1, with responses from Luis and Frida. All my history, repos, logs, records, run links, etc. are documented there.

Earlier today Frida suggested a workaround involving the use of tables, which I have yet to try today. Will do so in a little while.

kptkin commented 1 year ago

@drscotthawley sorry, I didn't realize you were already talking to support. I will ask them for more information.

drscotthawley commented 1 year ago

Partial success: Current workaround suggested by Frida to encapsulate audio and images inside tables and use wandb_logger.log_table(), works for me for everything but my Plotly figures, which are not JSON-serializable.

kptkin commented 1 year ago

@drscotthawley That's good to hear! If you don't need to interact with the plots, maybe you can convert them into images in the meantime?

Also, looking into your issue it seems to be related to our PL integration. We are going to look into it more closely and I will update this issue with our findings and fixes!

dyunis commented 1 year ago

@kptkin Thanks for getting back, like @drscotthawley I don't think this case applies to me (I support resuming in my code, but I am not resuming from anything when I see the bug, and it seems to come and go at random), but I would not like to share the code publicly. Is there a convenient way for me to do so?

kptkin commented 1 year ago

Hi @dyunis

  1. In case you want to share this information in private you could email support at: support@wandb.com.
  2. In the link to the bad run you provided I see that the run is being resumed, if that wasn't your intent, we would be happy to help you figure out what went wrong.

jbloomAus commented 1 year ago

If the step (iteration in the example above) is smaller than the last step you resumed from, we drop this data from our history log, so it doesn't appear on the run page. We have separate logic for saving the media files and uploading them to our remote app, so you might see the files regardless.

If this is indeed the issue, the solution is to either not specify the steps or make sure the steps are monotonically increasing.

In case this is not the issue, a repro of how you run your script would help me keep debugging this.

Looking to hear back from you :)

@kptkin This was the case for me; changing the step to a monotonically increasing value solved the problem! Thank you!

kptkin commented 1 year ago

@jbloomAus Glad to hear it worked for you! Sorry for the inconvenience! We are working on making the indications more obvious and plan to add more fixes in this area in one of the upcoming releases!

drscotthawley commented 1 year ago

Got a fix for me today after talking to Frida for an hour. In my case, I'd been passing a step= kwarg to the PyTorch Lightning logger, which had been fine for several months, but apparently that was recently found to produce a conflict (without throwing any error messages or warnings). Removing that kwarg, i.e. going from

trainer.logger.experiment.log(log_dict, step=trainer.global_step)

to

trainer.logger.experiment.log(log_dict)

...fixed everything; media now appears reliably when using the second version of the code above. The first version still works sometimes, intermittently, but Frida recommended removing the step altogether. The logger still logs the step elsewhere, so media still have steps attached to them. 👍
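If you still want your own counter recorded once the step= kwarg is gone, one option is to log it as an ordinary metric. The following is a minimal sketch under stated assumptions: `FakeRun` is a stand-in so the example runs without wandb installed, and `log_with_step_metric` and the `global_step` key are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List


class FakeRun:
    """Stand-in for a wandb run object; it only records what was logged."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def log(self, data: Dict[str, Any]) -> None:
        self.records.append(data)


def log_with_step_metric(
    run: Any, log_dict: Dict[str, Any], global_step: int, step_key: str = "global_step"
) -> Dict[str, Any]:
    """Log the counter as a regular metric instead of passing step=."""
    record = dict(log_dict)          # copy so the caller's dict is untouched
    record[step_key] = global_step   # the counter travels with the data
    run.log(record)                  # note: no step= kwarg here
    return record


run = FakeRun()
log_with_step_metric(run, {"loss": 0.5}, global_step=10)
# A lower counter value is still recorded, because it is just data.
log_with_step_metric(run, {"loss": 0.4}, global_step=5)
```

With a real run, `run` would be something like `trainer.logger.experiment`. Because the counter is then only another logged value, an out-of-order value is not subject to the step check discussed earlier in this thread.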

dyunis commented 1 year ago

@kptkin Sorry to come back to this late, but thanks for your diligence! I checked and it seems I was resuming a run, and in addition, in some cases where I failed to see output, steps were not monotonically increasing. Happy to have a better understanding now 🙂

kptkin commented 1 year ago

@dyunis Thanks for the reply. Glad we could help.

kptkin commented 1 year ago

I will be closing this issue for now, but if anyone still has an issue or a question, please feel free to re-open it!

aryamohan23 commented 1 year ago

Hello, I am facing a similar issue, but the solution suggested by @drscotthawley does nothing for me.

I am running a sweep, and my sweep crashes even though all my images are logged in files. The run also shows as 'crashed'.

phtu-cs commented 1 year ago

I am facing the same issue when I resume an experiment. The images are successfully uploaded to the Wandb server, but they are not visible in the workspace.

kptkin commented 1 year ago

I am facing the same issue when I resume an experiment. The images are successfully uploaded to the Wandb server, but they are not visible in the workspace.

@phtu-cs could you provide additional information, such as debug log, reproduction script, description of how you are doing resume?

MBakirWB commented 1 year ago

Hi @phtu-cs , since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know.

junzhin commented 7 months ago

This problem still exists

kptkin commented 7 months ago

@junzhin would you mind providing more information to help us debug this issue? ideally reproduction script and/or debug logs would be very helpful. Also feel free to open a new issue and link to this one, for discoverability.