tensorflow / tensorboard

TensorFlow's Visualization Toolkit
Apache License 2.0

Adding node and cell names for tensorboard graph #5505

Open buttercutter opened 2 years ago

buttercutter commented 2 years ago

I am trying to trace the tensorboard graph for https://github.com/promach/gdas

However, from what I can observe so far, the TensorBoard graph does not show user-understandable node names and cell names, which makes it difficult to track down the connections within the graph.

Any suggestions?

[Image: screenshot of the TensorBoard graph]

bileschi commented 2 years ago

Sorry for the difficulty. I'm not sure I understand what a "Node number" is in this case. Can you tell me more? From your screenshot it looks like you are referring to the number above each small oval. Each oval is an op, and the number there is the "name" that was given by the underlying framework.

Is it possible that, in the conversion, PyTorch is not giving the ops user-readable names?

buttercutter commented 2 years ago

> Is it possible that in the conversion PyTorch is not giving the ops user-readable names?

@bileschi Do you have any idea what settings I should pass to torch.utils.tensorboard for that purpose?

bileschi commented 2 years ago

Unfortunately, no. That library is provided by the PyTorch maintainers, not the TensorBoard team. Is it possible to explore the graph programmatically before passing it to the summary writer? Are you sure the graph has op names before it is serialized to disk?

buttercutter commented 2 years ago

> Is it possible to explore the graph programmatically before passing it to the summary writer?

@bileschi See https://github.com/promach/gdas/blob/main/gdas.py#L321-L497

bileschi commented 2 years ago

I'm sorry, what I meant was to inspect the in-memory graph object after its creation and just before serialization, so as to determine the status of the node names. This will tell us whether those node names were lost by torch or by TensorBoard. You may be able to do this with, e.g., pdb. You may also be able to get help, if you need it, from the torch community. At this point it looks like TensorBoard is giving back what was put into it, i.e., a graph without node names.
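
One way to make that check concrete is sketched below. It assumes the node names have already been pulled out of the in-memory graph as plain strings; how to obtain them depends on the PyTorch version and is not shown here. The helper `summarize_node_names` and the sample names are hypothetical, for illustration only.

```python
from collections import Counter


def summarize_node_names(names):
    """Report how many node names carry a scope prefix and how many do not.

    `names` is any iterable of strings. A name counts as "scoped" when it
    contains a '/' separator, which TensorBoard uses to group nodes.
    """
    names = list(names)
    scoped = [n for n in names if "/" in n]
    bare = [n for n in names if "/" not in n]
    return {
        "total": len(names),
        "scoped": len(scoped),
        "bare": len(bare),
        # Names made only of digits are the least readable case.
        "numeric_only": sum(1 for n in bare if n.isdigit()),
        # Number of nodes found under each top-level scope.
        "top_scopes": dict(Counter(n.split("/", 1)[0] for n in scoped)),
    }


# Hypothetical names, not taken from any real graph.
sample = ["Net/cells/0/conv", "Net/cells/1/conv", "input", "123", "124"]
report = summarize_node_names(sample)
print(report)
```

If most names come back bare or numeric at this stage, the readable names were probably already missing before anything reached TensorBoard.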

buttercutter commented 2 years ago

@bileschi However, the torch community (https://github.com/torch/torch7#need-help) is close to non-existent.

bileschi commented 2 years ago

Try here? https://discuss.pytorch.org/

Looks like PyTorch devs from Facebook AI are active in the community.

buttercutter commented 2 years ago

@bileschi Both the PyTorch forum and the PyTorch GitHub issues suggest that debugging TensorBoard is not really a top priority for PyTorch; at least, that is my impression.

Maybe I will have to debug this code by eye instead of with the TensorBoard visualization tool, then.

buttercutter commented 2 years ago

@bileschi Debugging by eye is not really working for a graph with this many branches.

I really need TensorBoard to show node names for PyTorch graphs.

Any advice or suggestions?

bileschi commented 2 years ago

I'm sorry, but I don't have any great suggestions. It sounds like this is a problem with the PyTorch implementation of the summary writer not including the names of the nodes. TensorBoard only shows the names that are given to it. It's PyTorch's responsibility to name the nodes.

I see from your discussion here that you are using a custom function: https://discuss.pytorch.org/t/tensorboard-issue-with-self-defined-forward-function/140628?u=promach. Is there something you need to do there to ensure the names are associated with the nodes of the computational graph?

Maybe you can get a PyTorch expert to help on StackOverflow?

buttercutter commented 2 years ago

> is there something you need to do here to ensure the names are associated with the nodes of the computational graph?

@bileschi I am not sure about this at all, given that all the nodes are objects under hierarchical classes.

Could the use of these nodes inside for loops have confused the PyTorch implementation of the summary writer?
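
For reference, here is a minimal sketch of how PyTorch itself names submodules that are used inside a for loop. The classes `Cell` and `Net` are hypothetical stand-ins, not code from the gdas repository, and the sketch makes no claim about what the summary writer does with these names. It only shows that modules registered through `nn.ModuleList` keep hierarchical names even when `forward` iterates over them.

```python
import torch
import torch.nn as nn


class Cell(nn.Module):  # hypothetical cell, for illustration only
    def __init__(self):
        super().__init__()
        self.op = nn.Linear(4, 4)

    def forward(self, x):
        return torch.relu(self.op(x))


class Net(nn.Module):  # hypothetical network, for illustration only
    def __init__(self, num_cells=2):
        super().__init__()
        # nn.ModuleList registers each cell under an index-based name.
        self.cells = nn.ModuleList([Cell() for _ in range(num_cells)])

    def forward(self, x):
        for cell in self.cells:  # loop over registered submodules
            x = cell(x)
        return x


model = Net()
# named_modules() yields (name, module) pairs; the root module's name is "".
module_names = [name for name, _ in model.named_modules() if name]
print(module_names)
# → ['cells', 'cells.0', 'cells.0.op', 'cells.1', 'cells.1.op']
```

Modules kept in a plain Python list instead of `nn.ModuleList` are not registered and would not appear in this listing, which may be worth checking first.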

buttercutter commented 2 years ago

@bileschi I tried both `from torch.utils.tensorboard import SummaryWriter` and `from tensorboardX import SummaryWriter`, but both still give non-user-readable node and cell names in the output graph.

I suspect the issue might be due to the multiple nested for loops inside the forward function.

Note: I removed `for epoch in range(NUM_EPOCHS):` from the forward function while testing and debugging this issue.

whisper0055 commented 1 year ago

I have the same problem now, and I also suspect that it is caused by the for loops. Have you solved the problem by removing the for loops?