Closed HalflingWizard closed 2 years ago
An error occurs when I want to backpropagate.
Do you have detailed error logs?
Here are the errors:
But I guess they are not what you were asking for... I should try again using debug mode, right?
Is there any way to get the IPython stuff out of the way and just use the regular Python prompt, and/or run it inside gdb, as in `gdb --args python args` followed by `(gdb) r`? I feel like IPython might be a confounding factor. Who knows what it is adding into the mix.
I used the regular Python prompt as you suggested, and here is the stack trace for get_arc_post:
[F] /home/runner/work/k2/k2/k2/python/csrc/torch/torch_util.h:124:k2::Array1<U> k2::FromTorch(at::Tensor) [with T = float] Check failed: tensor.strides()[0] == 1 (0 vs. 1) Expected stride: 1. Given: 0
[ Stack-Trace: ]
/usr/local/lib/python3.7/dist-packages/libk2_log.so(k2::internal::GetStackTrace()+0x4c) [0x7f7aeb1fe45c]
/usr/local/lib/python3.7/dist-packages/_k2.cpython-37m-x86_64-linux-gnu.so(+0x28f8a) [0x7f7aef2b7f8a]
/usr/local/lib/python3.7/dist-packages/_k2.cpython-37m-x86_64-linux-gnu.so(+0x3e3db) [0x7f7aef2cd3db]
/usr/local/lib/python3.7/dist-packages/_k2.cpython-37m-x86_64-linux-gnu.so(+0x5a967) [0x7f7aef2e9967]
/usr/local/lib/python3.7/dist-packages/_k2.cpython-37m-x86_64-linux-gnu.so(+0x206dc) [0x7f7aef2af6dc]
python3(_PyMethodDef_RawFastCallKeywords+0x264) [0x593784]
python3(_PyEval_EvalFrameDefault+0x3cf4) [0x515244]
python3(_PyFunction_FastCallDict+0x15a) [0x4bc98a]
python3(_PyEval_EvalFrameDefault+0x1f56) [0x5134a6]
python3(_PyEval_EvalCodeWithName+0x346) [0x549576]
python3(_PyFunction_FastCallDict+0x2e9) [0x4bcb19]
python3() [0x59c019]
python3(PyObject_Call+0x66) [0x595ef6]
/usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_python.so(torch::autograd::PyNode::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&)+0x183) [0x7f7b4778faa3]
/usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_cpu.so(+0x2bfed70) [0x7f7b39080d70]
/usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_cpu.so(torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&)+0x14e0) [0x7f7b3907ca60]
/usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_cpu.so(torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&)+0x4a0) [0x7f7b3907d670]
/usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_cpu.so(torch::autograd::Engine::execute_with_graph_task(std::shared_ptr<torch::autograd::GraphTask> const&, std::shared_ptr<torch::autograd::Node>)+0x490) [0x7f7b3907b310]
/usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_python.so(torch::autograd::python::PythonEngine::execute_with_graph_task(std::shared_ptr<torch::autograd::GraphTask> const&, std::shared_ptr<torch::autograd::Node>)+0x3c) [0x7f7b4778723c]
/usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_cpu.so(torch::autograd::Engine::execute(std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&, std::vector<at::Tensor, std::allocator<at::Tensor> > const&, bool, bool, std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&)+0xacd) [0x7f7b3907a36d]
/usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_python.so(torch::autograd::python::PythonEngine::execute(std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&, std::vector<at::Tensor, std::allocator<at::Tensor> > const&, bool, bool, std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&)+0x4e) [0x7f7b4778703e]
/usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_python.so(THPEngine_run_backward(THPEngine*, _object*, _object*)+0xe3f) [0x7f7b4778810f]
python3(_PyMethodDef_RawFastCallKeywords+0x315) [0x593835]
python3() [0x548c51]
python3(_PyEval_EvalFrameDefault+0x12a1) [0x5127f1]
python3(_PyEval_EvalCodeWithName+0x346) [0x549576]
python3(_PyFunction_FastCallKeywords+0x37e) [0x593fce]
python3() [0x548ae9]
python3(_PyEval_EvalFrameDefault+0x411f) [0x51566f]
python3(_PyEval_EvalCodeWithName+0x346) [0x549576]
python3(_PyFunction_FastCallKeywords+0x37e) [0x593fce]
python3(_PyEval_EvalFrameDefault+0x8dc) [0x511e2c]
python3(_PyEval_EvalCodeWithName+0x346) [0x549576]
python3(PyEval_EvalCode+0x23) [0x604173]
python3() [0x5f5506]
python3(PyRun_FileExFlags+0x9c) [0x5f8c6c]
python3(PyRun_SimpleFileExFlags+0x196) [0x5f9206]
python3() [0x64faf2]
python3(_Py_UnixMain+0x2e) [0x64fc4e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f7b4bdfac87]
python3(_start+0x2a) [0x5b621a]
Traceback (most recent call last):
File "/content/lm_wfst/code.py", line 211, in <module>
(-score).backward()
File "/usr/local/lib/python3.7/dist-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 132, in backward
allow_unreachable=True) # allow_unreachable flag
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/function.py", line 89, in apply
return self._forward_cls.backward(self, *args) # type: ignore
File "/usr/local/lib/python3.7/dist-packages/k2/autograd.py", line 336, in backward
fsas.arcs, incoming_arcs, arc_post_grad)
RuntimeError:
Some bad things happened. Please read the above error messages and stack
trace. If you are using Python, the following command may be helpful:
gdb --args python /path/to/your/code.py
(You can use `gdb` to debug the code. Please consider compiling
a debug version of k2.).
If you are unable to fix it, please open an issue at:
https://github.com/k2-fsa/k2/issues/new
I hope it is as you wanted.
I see what the issue is.
RuntimeError Traceback (most recent call last)
[<ipython-input-40-78c710214bf8>](https://localhost:8080/#) in <module>()
1 score = ap.sum()
----> 2 (-score).backward()
[F] /home/runner/work/k2/k2/k2/python/csrc/torch/torch_util.h:124:k2::Array1<U> k2::FromTorch(at::Tensor) [with T = float] Check failed: tensor.strides()[0] == 1 (0 vs. 1) Expected stride: 1. Given: 0
PyTorch sets the stride of the gradient of ap to 0: it uses a single scalar to represent the whole tensor, which k2's 1-D array cannot handle because it expects a stride of 1.
Will make a PR to fix it.
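To illustrate the stride-0 situation described above, here is a minimal sketch in plain PyTorch, without k2. It assumes that expanding a scalar to a 1-D shape is a fair stand-in for the gradient autograd hands back after `ap.sum()`. The `.contiguous()` call is only one possible way to get a stride-1 tensor and is not necessarily what the fix in the PR does.

```python
import torch

# A 0-dim tensor expanded to length 5: no data is copied, so every index
# reads the same single element and the stride is 0.
grad = torch.tensor(1.0).expand(5)
expanded_stride = grad.stride()

# .contiguous() materializes the values into real memory with stride 1,
# which is the layout the failed check (strides()[0] == 1) expects.
dense = grad.contiguous()
dense_stride = dense.stride()

print(expanded_stride, dense_stride)
```

The values of the two tensors are identical; only the memory layout differs, which is why the check fails on the first and passes on the second.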
@HalflingWizard Please try #970.
You can replace /usr/local/lib/python3.7/dist-packages/k2/autograd.py with the file from #970.
Thanks, it works now.
The FSA I'm working on has these properties:
"Valid|Nonempty|TopSorted|TopSortedAndAcyclic|ArcSorted|ArcSortedAndDeterministic|MaybeAccessible|MaybeCoaccessible"
When I use get_tot_scores to get the scores of the model, I can call the backward method with no problems. But when I call get_arc_post, get_backward_scores, or get_forward_scores (to get all paths, not only the best one), an error occurs when I want to backpropagate. For example, the following is the stack trace when I use get_arc_post:
I would appreciate it if anyone could suggest a solution to this problem.