buttercutter opened this issue 2 years ago
Sorry for the difficulty. I'm not sure I understand what a "Node number" is in this case. Can you tell me more? From your screenshot it looks like you are referring to the number above each small oval. Each oval is an op, and the number there is the "name" that was given by the underlying framework.
Is it possible that in the conversion pytorch is not giving the ops user-readable names?
> Is it possible that in the conversion pytorch is not giving the ops user-readable names?
@bileschi Do you have any idea what settings I should pass into torch.utils.tensorboard for the purpose quoted above?
Unfortunately, no. That library is provided by the PyTorch maintainers, not the TensorBoard team. Is it possible to explore the graph programmatically before passing it to the summary writer? Are you sure the graph has op names before it is serialized to disk?
> Is it possible to explore the graph programmatically before passing it to the summary writer?
@bileschi See https://github.com/promach/gdas/blob/main/gdas.py#L321-L497
I'm sorry, what I meant was to inspect the in-memory graph object, after its creation and just before serialization, to determine the status of the node names. This will tell us whether those node names were lost by torch or by TensorBoard. You may be able to do this with, e.g., pdb. You may also be able to get help, if you need it, from the torch community. At this point it looks like TensorBoard is showing back exactly what was put into it, i.e., a graph without node names.
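For example, something along these lines (a minimal sketch with a stand-in model, since I don't know your exact setup) would print the node names straight from the traced in-memory graph, before anything reaches TensorBoard:

```python
import torch
import torch.nn as nn

# Stand-in model; substitute the actual network from the gdas repo here.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
example_input = torch.randn(1, 4)

# Trace the forward pass and look at the in-memory graph directly,
# before any summary writer serializes it to an event file.
traced = torch.jit.trace(model, example_input)
for node in traced.graph.nodes():
    # kind() is the op type (an aten:: name); scopeName() carries the
    # module hierarchy, which is what gives readable names in TensorBoard.
    print(node.kind(), node.scopeName())
```

If scopeName() is already empty here, the names are being lost on the torch side rather than in TensorBoard.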
@bileschi However, the torch community (https://github.com/torch/torch7#need-help) is close to non-existent
Try here? https://discuss.pytorch.org/
Looks like PyTorch devs from Facebook AI are active in that community.
@bileschi Both the PyTorch forum and the PyTorch GitHub issues seem to show that debugging TensorBoard support is not really a top priority for PyTorch, at least that is my impression.
Maybe I have to debug this code by eye instead of with the TensorBoard visualization tool in this case.
@bileschi Debugging by eye is not really working for a graph with this many branches.
I really need to get this TensorBoard support for node names working with PyTorch.
Any advice or suggestions?
I'm sorry, but I don't have any great suggestions. It sounds like this is a problem with the PyTorch implementation of the summary writer not including the names of the nodes. TensorBoard only shows the names that are given to it. It's PyTorch's responsibility to name the nodes.
I see from your discussion here that you are using a custom forward function: https://discuss.pytorch.org/t/tensorboard-issue-with-self-defined-forward-function/140628?u=promach . Is there something you need to do there to ensure the names are associated with the nodes of the computational graph?
Maybe you can get a PyTorch expert to help on StackOverflow?
> Is there something you need to do there to ensure the names are associated with the nodes of the computational graph?
@bileschi This is something I am not sure about at all, given that all the nodes are objects under hierarchical classes, and the use of these nodes inside `for` loops might have confused the PyTorch implementation of the summary writer?
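For example (a minimal sketch, not my actual code), if the cells were built inside a `for` loop but kept in a plain Python list instead of an `nn.ModuleList`, they would never be registered on the parent module, so the tracer would have no module scope to attach to those ops and would fall back to generated names:

```python
import torch
import torch.nn as nn

class Cell(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, x):
        return self.conv(x)

class Network(nn.Module):
    def __init__(self, num_cells=4):
        super().__init__()
        # self.cells = [Cell() for _ in range(num_cells)]               # unregistered: anonymous scopes
        self.cells = nn.ModuleList(Cell() for _ in range(num_cells))    # registered: cells.0, cells.1, ...

    def forward(self, x):
        for cell in self.cells:  # looping itself is fine once the modules are registered
            x = cell(x)
        return x

traced = torch.jit.trace(Network(), torch.randn(1, 3, 8, 8))
for node in traced.graph.nodes():
    print(node.kind(), node.scopeName())
```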
@bileschi I tried `from torch.utils.tensorboard import SummaryWriter` and `from tensorboardX import SummaryWriter`, but both are still giving non-user-readable node and cell names in the output graph.
I suspect the issue might be due to the multiple nested `for` loops inside the forward function.
Note: I have removed `for epoch in range(NUM_EPOCHS):` from the forward function while testing and debugging this issue.
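For reference, this is roughly how I am logging the graph in both attempts (using a placeholder model and input here rather than the gdas network). Note that `add_graph` traces the forward pass, so any Python-level loop in `forward` is unrolled and the same op then appears once per iteration with a generated name, which may be part of why the graph is unreadable:

```python
import torch
from torch.utils.tensorboard import SummaryWriter  # tensorboardX exposes the same add_graph API

# Placeholder model and input; the real code uses the gdas network.
model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2))
example_input = torch.randn(1, 4)

writer = SummaryWriter(log_dir="runs/graph_debug")
# add_graph traces the forward pass and writes the resulting graph to the event file.
writer.add_graph(model, example_input)
writer.close()
```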
I have the same problem now, and I also suspect that it is caused by the for loops. Have you solved the problem by removing the for loops?
I am trying to trace the tensorboard graph for https://github.com/promach/gdas
However, from what I can observe so far, the TensorBoard graph does not really show user-understandable node names and cell names, which makes it very difficult to track down the connections within the graph.
Any suggestions?