gante opened this issue 2 years ago
@gante do you require any help with this issue? Happy to contribute
Hi @anmolsjoshi 👋
If you are comfortable with debugging XLA, absolutely :) My recommendation would be to pick a model from "Models failing complex tests" (the others might require significant architecture changes) and to start debugging. The number one suspect is always the position embeddings, which may not be handling the case where `past` is padded. Let me know if you are up for it, and which model you would like to take!
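As loose context for that hint, here is a minimal sketch of the mask-based position-id pattern several TF models in transformers use; this is an illustration of the idea, not the library's actual code:

```python
import tensorflow as tf

# Sketch: with XLA, `past` is padded to a fixed length, so position ids
# derived from the past's length drift. Deriving them from the attention
# mask counts only real tokens, so padding in `past` no longer matters.
def positions_from_attention_mask(attention_mask: tf.Tensor) -> tf.Tensor:
    position_ids = tf.cumsum(attention_mask, axis=-1) - 1
    # Padding positions would be -1; clamp so embedding lookups stay in range.
    return tf.maximum(position_ids, 0)

# e.g. mask [[0, 0, 1, 1, 1]] (two left-padding tokens) -> [[0, 0, 0, 1, 2]]
print(positions_from_attention_mask(tf.constant([[0, 0, 1, 1, 1]])))
```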
Hi @gante, I had a bit of a poke around. I think the complex tests all fail for the same reason: those models have a `max_position_embeddings` setting that defaults to 20 during testing, which is too short for the "slow" tests. Here's a simple fix for those: https://github.com/dsuess/transformers/commit/4a3e27164ae941fcd649b8565d7d92a4552d689f. I'll give the other ones a shot now.
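To illustrate the idea (the config class and the value 100 are stand-ins, mirroring the linked commit's intent rather than its exact diff):

```python
from transformers import OPTConfig

# The slow test's beam search generates more than 20 tokens, so a test config
# with the tiny default max_position_embeddings=20 runs out of position
# embeddings. Raising the limit in the test config is the whole fix; the
# value 100 here is illustrative, not the exact number from the commit.
config = OPTConfig(max_position_embeddings=100)
```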
Hello @gante, may I ask if there is anything I can contribute?
Hi JuheonChu 👋 Actually yes! I have a few unchecked models at the top, but I wouldn't recommend spending time there unless you plan to use those architectures -- they are infrequently used.
However, two popular models are currently failing their XLA tests with beam search:
You can see the failing test if you install from `main` (`pip install --upgrade git+https://github.com/huggingface/transformers.git`) and run it, e.g. for OPT: `NVIDIA_TF32_OVERRIDE=0 RUN_SLOW=1 py.test -vv tests/models/opt/test_modeling_tf_opt.py::TFOPTModelTest::test_xla_generate_slow`
I haven't dived in yet, so I don't know the cause of the failure. You'll have to hop into debug mode and see what is breaking :)
Can @katiele47 and I try working on them?
@JuheonChu of course!
> @JuheonChu of course!

@gante Are we figuring out the cause of the testing failures based on the following clues?
@JuheonChu yes. My suggestion would be to attempt to find where the numerical differences start (between the XLA and the non-XLA version), using a debugger. Please note that you can't print variables with `jit_compile=True`, so you should set it to `False`. From there, the root cause is typically apparent.
Be warned, these sorts of tasks can be very time-consuming to complete :)
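For a concrete starting point, here is a minimal sketch of that comparison (the checkpoint and generation length are arbitrary picks, not from the thread):

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM

# Compare XLA and non-XLA generation on the same inputs; once they disagree,
# drop into a debugger inside the model code -- with jit_compile=False, since
# compiled functions cannot print intermediate tensors.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = TFAutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Debugging XLA generation", return_tensors="tf")

xla_generate = tf.function(model.generate, jit_compile=True)
eager_ids = model.generate(**inputs, max_new_tokens=16)
xla_ids = xla_generate(**inputs, max_new_tokens=16)

# The first mismatching position is where numerical drift changed a token.
print(tf.reduce_all(eager_ids == xla_ids).numpy())
```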
Thank you very much for your valuable guidance! We will try and keep you updated!
Hi @gante, I've attempted to reproduce the failing XLA test on the OPT model using your suggested commands. The error I got was somewhat different from @JuheonChu's. Would you be able to verify whether the following is the expected failing-test output? If not, I assume it could be due to my local repo. Thanks!
@gante working on XLNet
This issue is used to track TensorFlow XLA generation issues, arising from #17857. There are three categories of issues, sorted in descending order by severity:
Key model issues
These are heavily-used models, whose quality should be prioritized.
`max_length`. See here.

Models failing basic tests
These models are failing `test_xla_generate_fast` -- a short greedy generation.

Models failing complex tests
These models are failing `test_xla_generate_slow` -- a long beam search generation.
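As a rough sketch of what the two test flavors exercise (an approximation of the tests' structure, not their exact code), both compare compiled and eager generation:

```python
import tensorflow as tf

def check_xla_generate(model, inputs, **generate_kwargs):
    # Both flavors share this shape: generate with and without XLA and
    # require identical outputs.
    xla_generate = tf.function(model.generate, jit_compile=True)
    eager = model.generate(**inputs, **generate_kwargs)
    compiled = xla_generate(**inputs, **generate_kwargs)
    tf.debugging.assert_equal(eager, compiled)

# "fast"/basic flavor: short greedy generation
# check_xla_generate(model, inputs, do_sample=False, max_new_tokens=8)
# "slow"/complex flavor: long beam search (values here are illustrative)
# check_xla_generate(model, inputs, num_beams=4, max_new_tokens=64)
```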