What are you trying to view?
The first 10 values in the layer tensor.
Inside the loop, you can add
debug(layer, prefix_ + "_l" + std::to_string(i) + "_self");
When you then run the code, you will see a numpy-like view of the tensor.
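For orientation, here is a rough sketch of where such a call sits; the loop and the names applyEncoderLayer/encDepth are hypothetical stand-ins, not the actual transformer.h code, and only the debug() line is the one suggested above:
// sketch: loop over encoder layers and mark each layer's output for printing
for(int i = 0; i < encDepth; ++i) {                               // encDepth: hypothetical layer count
  layer = applyEncoderLayer(i, layer);                            // hypothetical helper for the real per-layer code
  debug(layer, prefix_ + "_l" + std::to_string(i) + "_self");     // prints a numpy-like view when the graph runs
}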
Ok, that worked, thank you!
For the other noobs: you need to run make -j4 after adding debug statements.
Yes, you need to rebuild.
Is there a simple way to make rebuild faster by leveraging the fact that I'm only changing transformer.h?
Add -DUSE_CCACHE=on. You need to have ccache installed for that; it is a tool that caches compiled objects so that unchanged files do not need to be recompiled, which should speed up building after the project has been built once.
cmake .. -DUSE_CCACHE=on
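For example, the full cycle from the build directory might look like this (assuming the usual out-of-source build setup; both commands are the ones mentioned above):
cmake .. -DUSE_CCACHE=on   # configure once with ccache enabled (ccache must be installed)
make -j4                   # rebuild after editing transformer.h; compilations whose inputs did not change are served from ccache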
I've isolated at least one issue, positional embeddings are different.
Is there a way to see the signal variable in https://github.com/marian-nmt/marian/blob/master/src/models/transformer.h#L87 ?
The same way: add debug(signal, "PosEmbs"); in that function.
I think I remember there is an off-by-one difference for position embeddings between Marian and Fairseq; maybe it's the same here.
I got to the bottom of the positional embeddings differences (it's off by 256 :)), and now the encoders return the same thing. For the decoder, I am trying to understand what the first tensor being sent to the decoder is, but the following debugging statement fails:
debug(startStates, "startStates") // L648, reference to non-static member function must be called
Another command that fails that I'll probably need is
debug(logits, "logits"); // ~line 820 of transformer.h
Thanks again for your help!
Neither of them is a tensor; these are classes on which you cannot call debug.
Sometimes my debug statements don't show up where I would expect them to. For example, in LayerAttention I added two debugging statements:
debug(output, "output before postprocess");
auto opsPost = opt<std::string>("transformer-postprocess");
output = postProcess(prefix + "_Wo", opsPost, output, input, dropProb);
debug(output, "output after postprocess");
The first message is always written to standard error, but the second one only shows up when it is called from DecoderLayerSelfAttention or from the encoder, not for cross-attention. The program just keeps going and terminates successfully; there is no error.
Have you seen that sort of behavior?
The debug statements mark nodes to output their values when they are passed through. If the node is passed through only once, you will see only one message, regardless of how many debug statements you insert for that node.
Can I make a copy of it to see the new value?
No, it also means there is no new value, otherwise there would be a new node.
It seems to me the call to postProcess is not really doing anything in your config, just returning the same identical node?
To summarize. Computation in Marian is never in-place. If you expect a value to change, then there has to be a new node. If there are two distinct nodes, the debug statements will appear twice. If you call the debug statement multiple times on the same node, it will only appear once.
Sometimes nodes are collapsed already during construction. Say that for a given a and b you do
Expr x = a + b;
Expr y = a + b;
debug(x, "Message1");
debug(y, "Message2");
You will only see one message, because the graph knows that x is the same as y: y was never created, it was just replaced by x during construction. The message also gets overwritten, so you will only see Message2.
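One way to confirm such collapsing is to print the underlying node addresses (a sketch using Expr::get(), the same pointer trick that comes up later in this thread; x and y as in the example above):
// if the two expressions were collapsed into a single node, the two printed addresses are identical
std::cerr << (size_t)x.get() << " " << (size_t)y.get() << std::endl;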
My config has "transformer-postprocess": "dan", and I get debug statements from the encoder's calls. Is postprocessing somehow disabled during the decoder?
Full Config Here
No. That should do something. Check whether the pointer values of output before/after are different. Something like
std::cerr << (size_t)output << std::endl;
should work.
That doesn't compile:
transformer.h:335:18: error: cannot convert
'IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >' to
'size_t' (aka 'unsigned long') without a conversion operator
Sorry, try this
std::cerr << (size_t)output.get() << std::endl;
postProcess gets called and the pointer changes, but debug still does not output the tensor. Logging statements (the lines with timestamps are generated inside postProcess; the ones without timestamps are from std::cerr << (size_t)output.get() << std::endl;):
[2020-04-22 10:54:48] decoder_l1_selfabout to call postprocess
4417515776
[2020-04-22 10:54:48] decoder_l1_self_Wo: postProcess ops: dan
4417516288
[2020-04-22 10:54:48] decoder_l1_selfdone postprocess
[2020-04-22 10:54:48] decoder_l1_contextabout to call postprocess
4417519360
[2020-04-22 10:54:48] decoder_l1_context_Wo: postProcess ops: dan
4417519872
However, even though the pointer changes, the debugging statement associated with the final value does not display. See the following output from debug statements:
[2020-04-22 10:54:48] Debug: decoder_l1_self output before postprocess op=affine
[2020-04-22 10:54:48] shape=1x1x1x512 size=512 type=float32 device=cpu0 ptr=4526488576 bytes=2048
min: -2.17710781 max: 6.77481270 l2-norm: 13.08215678
[[[[ -0.23997170 0.15916049 0.06509697 -0.12403495 -0.60352713 ... -0.07692891 -0.16747165 -0.24933907 -0.19263674 0.01830016 ]]]]
[2020-04-22 10:54:48] Debug: decoder_l1_selfoutput after postprocess op=layer_normalization
[2020-04-22 10:54:48] shape=1x1x1x512 size=512 type=float32 device=cpu0 ptr=4526488576 bytes=2048
min: -2.98920894 max: 13.56639862 l2-norm: 25.95007426
[[[[ -0.71452057 -0.08474327 -0.73883373 -0.75963336 -1.36101186 ... 0.39763063 0.20678757 0.34616321 0.83726954 0.84396362 ]]]]
[2020-04-22 10:54:48] Debug: decoder_l1_context output before postprocess op=affine
[2020-04-22 10:54:48] shape=1x1x1x512 size=512 type=float32 device=cpu0 ptr=4526490624 bytes=2048
min: -1.78802609 max: 1.57692146 l2-norm: 6.58460730
[[[[ -0.44128269 -0.01378652 0.29056573 -0.67267472 0.37242836 ... 0.51235217 -0.13973767 0.17449422 -0.19592692 -0.11995535 ]]]]
// ALL GOOD UNTIL HERE, then missing "after postprocess" statement
[2020-04-22 10:54:48] Debug: OUTPUT:decoder_l op=layer_normalization
[2020-04-22 10:54:48] shape=1x1x1x512 size=512 type=float32 device=cpu0 ptr=4526488576 bytes=2048
min: -2.09566784 max: 7.83572817 l2-norm: 23.84594046
[[[[ -0.66445428 0.34865451 -0.46758631 -1.03975117 -0.78854591 ... 0.80517358 -0.05014769 0.57723463 1.16044199 0.96584964 ]]]]
Nope. All as expected. Your very last debug is called on the same node as your last output and you overwrote the debug message. As I said, you will only see one debug message per node. I could create a list of debug messages in the future, so these do not get overwritten, but accumulated.
Updated the comment above with the x = a + b example, adding the information that messages get overwritten with multiple debug calls.
The list of debug messages would be wonderful!
Fixed the bug: I had context_Wo_ln_scale and context_Wo_ln_bias flipped. Thank you so much for all the help!
Closing then. Feel free to reopen if there are more issues.
I am a C++ noob and trying to diagnose/debug why my Python port of a transformer produces different results than marian-decoder on the same tokens (the problem is that my Python model generates the same token over and over again).
I've been trying to set breakpoints in vscode at https://github.com/marian-nmt/marian/blob/master/src/models/transformer.h#L530, but when I hit them and run
-exec fr variable layer
(the intermediate outputs), I don't get the tensor values I'm looking for. Is there a better/suggested way to print out intermediate outputs for debugging purposes?
Thanks in advance and sorry if the question is dumb.