avoraTT opened 4 weeks ago
@aliuTT @cglagovichTT
I'll take a look. Can you paste the test command and the commit/branch you're on?
ssh 10.230.36.208 (sjc-snva-t3002)
You can pull on main. The command is: pytest -svv models/experimental/llama2_70b/tests/test_llama_model_t3000.py
In models/experimental/llama2_70b/tests/test_llama_model_t3000.py, comment out the ("llama2") parametrization and only use ("llama3") to speed things up.
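For reference, the parametrization being referred to likely looks something like the sketch below. This is a hypothetical reconstruction; the actual decorator, argument names, and fixtures in test_llama_model_t3000.py may differ.

```python
import pytest

# Hypothetical sketch of the model-version parametrization in the test file.
# Commenting out the "llama2" entry restricts the run to llama3 only.
@pytest.mark.parametrize(
    "llama_version",
    (
        # ("llama2"),  # commented out to speed up the run
        ("llama3"),
    ),
)
def test_llama_model(llama_version):
    # Placeholder body; the real test builds and runs the model.
    assert llama_version == "llama3"
```

With the "llama2" entry commented out, pytest collects and runs only the llama3 case.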
updated screenshot:
Try this commit: e47709ed0. I wasn't able to get segfaults on sjc-snva-t3002 locally. Also, I'm done with the machine.
Thanks! We will stress test this locally as well.
After cherry-picking commit e47709ed0 above, we see this segfault locally on sjc-snva-t3002:
#0 0x0000000007b25a70 in ?? ()
#1 0x00007fff888244dc in tt::WorkExecutor::push_work(std::shared_ptr<std::function<void ()> >, bool) () from /home/avora/tt-metal/build/lib/libtt_metal.so
#2 0x00007fff8881af5c in tt::tt_metal::Device::push_work(std::shared_ptr<std::function<void ()> >, bool) () from /home/avora/tt-metal/build/lib/libtt_metal.so
#3 0x00007fff88e13fc9 in std::__detail::__variant::__gen_vtable_impl<true, std::__detail::__variant::_Multi_array<void (*)(tt::tt_metal::Tensor::deallocate(bool)::$_0&&, std::variant<tt::tt_metal::OwnedStorage, tt::tt_metal::DeviceStorage, tt::tt_metal::BorrowedStorage, tt::tt_metal::MultiDeviceHostStorage, tt::tt_metal::MultiDeviceStorage>&)>, std::tuple<std::variant<tt::tt_metal::OwnedStorage, tt::tt_metal::DeviceStorage, tt::tt_metal::BorrowedStorage, tt::tt_metal::MultiDeviceHostStorage, tt::tt_metal::MultiDeviceStorage>&>, std::integer_sequence<unsigned long, 4ul> >::__visit_invoke(tt::tt_metal::Tensor::deallocate(bool)::$_0&&, std::variant<tt::tt_metal::OwnedStorage, tt::tt_metal::DeviceStorage, tt::tt_metal::BorrowedStorage, tt::tt_metal::MultiDeviceHostStorage, tt::tt_metal::MultiDeviceStorage>&) () from /home/avora/tt-metal/build/lib/libtt_eager.so
#4 0x00007fff88e0c022 in tt::tt_metal::Tensor::~Tensor() () from /home/avora/tt-metal/build/lib/libtt_eager.so
#5 0x00007fff89229212 in pybind11::class_<tt::tt_metal::Tensor>::dealloc(pybind11::detail::value_and_holder&) () from /home/avora/tt-metal/tt_eager/tt_lib/_C.so
#6 0x00007fff8910d59b in pybind11::detail::clear_instance(_object*) () from /home/avora/tt-metal/tt_eager/tt_lib/_C.so
#7 0x00007fff8910d154 in pybind11_object_dealloc () from /home/avora/tt-metal/tt_eager/tt_lib/_C.so
#8 0x00000000005b030c in ?? ()
#9 0x000000000058738d in ?? ()
#10 0x00000000005b030c in ?? ()
#11 0x000000000058738d in ?? ()
#12 0x00000000005cc0cb in ?? ()
#13 0x00000000005b030c in ?? ()
#14 0x00000000005835c2 in ?? ()
#15 0x00000000004c518f in ?? ()
#16 0x00000000005dca27 in ?? ()
#17 0x0000000000515e6a in _PyObject_GC_New ()
#18 0x00000000006b0403 in ?? ()
#19 0x00000000004e9618 in PyObject_GetIter ()
#20 0x00007fff30e65217 in pyo3::types::iterator::PyIterator::from_object () from /home/avora/tt-metal/python_env/lib/python3.8/site-packages/tiktoken/_tiktoken.cpython-38-x86_64-linux-gnu.so
#21 0x00007fff30e6c1aa in pyo3::types::any::PyAny::iter () from /home/avora/tt-metal/python_env/lib/python3.8/site-packages/tiktoken/_tiktoken.cpython-38-x86_64-linux-gnu.so
#22 0x00007fff30e6060f in pyo3::types::sequence::extract_sequence () from /home/avora/tt-metal/python_env/lib/python3.8/site-packages/tiktoken/_tiktoken.cpython-38-x86_64-linux-gnu.so
#23 0x00007fff30e5eff3 in pyo3::conversions::std::map::<impl pyo3::conversion::FromPyObject for std::collections::hash::map::HashMap<K,V,S>>::extract () from /home/avora/tt-metal/python_env/lib/python3.8/site-packages/tiktoken/_tiktoken.cpython-38-x86_64-linux-gnu.so
#24 0x00007fff30e5139b in _tiktoken::_::<impl pyo3::impl_::pyclass::PyMethods<_tiktoken::CoreBPE> for pyo3::impl_::pyclass::PyClassImplCollector<_tiktoken::CoreBPE>>::py_methods::ITEMS::trampoline ()
from /home/avora/tt-metal/python_env/lib/python3.8/site-packages/tiktoken/_tiktoken.cpython-38-x86_64-linux-gnu.so
@mikevin920 @avoraTT are you still seeing this segfault locally?
When running test_llama_model_t3000.py with the llama3 pytest parameter and the test order "prefill_128", "decode", "prefill_2k", the decode test results in a segfault when the tiktoken module calls _PyObject_GC_New(), which in turn triggers a Tensor::deallocate call on device (machine: sjc-snva-t3002). Although the segfault takes place in the tokenizer, the gdb trace (gdb_trace_llama3_segfault_decode.txt) shows that there is a deallocation issue somewhere. We thought this was a similar issue to #8965; however, after applying the tensor deallocation fix by @aliuTT, our issue still persists.
Note: if we switch the order of the tests from "prefill_128", "decode", "prefill_2k" to "decode", "prefill_128", "prefill_2k", the segfault goes away. This may point to an issue with state being maintained between different pytest cases?
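To illustrate the class of bug being hypothesized (not the actual failure mechanism in tt-metal), order-dependent test failures typically come from module- or session-level state that one test mutates and a later test implicitly depends on. A contrived, self-contained sketch:

```python
# Contrived illustration of an order-dependent failure: a module-level
# cache that a "prefill" test populates and a "decode" test implicitly
# assumes is empty. This is NOT the real tt-metal code, only an example
# of how test ordering can mask or expose shared-state bugs.
_cached_buffers = []

def run_prefill():
    _cached_buffers.append("prefill_buffer")  # state leaks across tests
    return "prefill ok"

def run_decode():
    # decode only behaves correctly when no stale buffers are present
    if _cached_buffers:
        raise RuntimeError("stale buffer left over from a previous test")
    return "decode ok"

# Order A ("prefill_128", "decode"): decode hits the stale state and fails.
# Order B ("decode", "prefill_128"): decode runs first and passes.
```

If the real segfault follows this pattern, the prefill tests may be leaving device-side tensors alive that the decode path then deallocates (or garbage-collects) in an unexpected order.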