mihai-dinculescu / tapo

Unofficial Tapo API Client. Works with TP-Link Tapo smart devices. Tested with light bulbs (L510, L520, L530, L610, L630), light strips (L900, L920, L930), plugs (P100, P105, P110, P115, P300), hubs (H100), switches (S200B) and sensors (KE100, T100, T110, T300, T310, T315).
MIT License

Sporadic segfaults in a python module when using ApiClient to query a Tapo P110 #228

Closed gsaviane closed 1 month ago

gsaviane commented 3 months ago

I get random segfaults when executing a Python module that uses ApiClient to query a P110 device. Unfortunately the Python stack frame is not available when the process receives the signal, but I was able to capture a core dump and analyze it with gdb. This is what it says:

0 pthread_kill_implementation (threadid=548354912640, signo=signo@entry=11, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
1 0x0000007faeab0a64 in __pthread_kill_internal (signo=11, threadid=) at ./nptl/pthread_kill.c:78
2 0x0000007faea6a76c in GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
3
4 0x0000000000490d30 in ?? ()
5 0x0000007fad41f054 in pyo3::types::any::PyAny::call_method::hacc5388e8a698dd3 () from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
6 0x0000007fad41cd00 in pyo3_asyncio::call_soon_threadsafe::h189597b182986bc9 () from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
7 0x0000007fad41b164 in pyo3_asyncio::generic::set_result::h943446c3e13924f1 () from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
8 0x0000007fad2d5e08 in ::spawn::{{closure}}::h4499f884fd2416d9 () from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
9 0x0000007fad2c7184 in tokio::runtime::task::core::Core<T,S>::poll::h6f19112f45830e8f () from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
10 0x0000007fad2827f8 in tokio::runtime::task::harness::Harness<T,S>::poll::h75be9602525f0e15 () from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
11 0x0000007fad4314a8 in tokio::runtime::scheduler::multi_thread::worker::Context::run_task::h75332304a442adb5 () from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
12 0x0000007fad4306f0 in tokio::runtime::scheduler::multi_thread::worker::Context::run::ha0d088c158a7571f () from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
13 0x0000007fad42e714 in tokio::runtime::context::set_scheduler::hb924c7a4ab654997 () from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
14 0x0000007fad429174 in tokio::runtime::context::runtime::enter_runtime::h2dc922c95f430ff4 () from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
15 0x0000007fad430558 in tokio::runtime::scheduler::multi_thread::worker::run::hcd92fda4015a4913 () from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
16 0x0000007fad424638 in <tokio::runtime::blocking::task::BlockingTask as core::future::future::Future>::poll::hf97cccf4b76a37d5 () from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
17 0x0000007fad42b478 in tokio::runtime::task::core::Core<T,S>::poll::h66ccbdb9dab0dc88 () from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
18 0x0000007fad42b9b8 in tokio::runtime::task::harness::Harness<T,S>::poll::hc66c5f1784947dac () from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
19 0x0000007fad424884 in std::sys_common::backtrace::__rust_begin_short_backtrace::h51368bf6bad8d526 () from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
20 0x0000007fad436388 in core::ops::function::FnOnce::call_once{{vtable.shim}}::h5d7b156d45e1195b () from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
21 0x0000007fad7b6d30 in alloc::boxed::{impl#47}::call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:2020
22 alloc::boxed::{impl#47}::call_once<(), alloc::boxed::Box<dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:2020
23 std::sys::pal::unix::thread::{impl#2}::new::thread_start () at library/std/src/sys/pal/unix/thread.rs:108
24 0x0000007faeaaee58 in start_thread (arg=0x7fe90c4f57) at ./nptl/pthread_create.c:442
25 0x0000007faeb17f9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79

I can provide the core dump file if needed. Seen on versions 0.2.1 and 0.3.0.

mihai-dinculescu commented 3 months ago

Thank you for raising the issue. Debugging this is going to be fun :) Are you able to isolate the Python code that's causing the issue and share it?

gsaviane commented 3 months ago

Unfortunately, stderr only receives this from the dying process:

Thread 0x0000007fa31164c0 (most recent call first):
(no Python frame)
Fatal Python error: Segmentation fault

The code causing the segfault is attached here [removed]

gsaviane commented 3 months ago

OK, I might have found the cause. This little piece of Python code runs as a Telegraf input plugin that collects data from the Tapos and forwards it to an MQTT topic. The plugin was set to run every 5 seconds with a 4-second timeout. It normally completes in under 1 second, but on some occasions (network lag, device not ready) it runs past the timeout, and Telegraf terminates it with a SIGTERM. After increasing the timeout the segfault has not happened again, and the problem is reproducible: if you have a Tapo P1XX device, run the script as a normal Python program and send it a SIGTERM before it exits (you would probably need a sleep()). The library may lack graceful disposal of the threads created by tokio when a SIGTERM arrives.
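
For anyone who wants to try the reproduction described above, here is a minimal sketch of a harness that starts a command, sends it a SIGTERM after a delay, and reports how the process ended. The helper names `terminate_after` and `describe` are made up for illustration and are not part of tapo or Telegraf. It relies on the POSIX convention that `subprocess` reports a child killed by signal N with return code -N, so a clean SIGTERM death shows up as -15 and a segfault as -11.

```python
import signal
import subprocess
import sys
import time


def terminate_after(cmd: list[str], delay: float) -> int:
    """Start cmd, send it SIGTERM after `delay` seconds, return its exit status."""
    proc = subprocess.Popen(cmd)
    time.sleep(delay)
    # send_signal() does nothing if the child has already exited.
    proc.send_signal(signal.SIGTERM)
    # On POSIX, a negative return code -N means the child died from signal N.
    return proc.wait(timeout=30)


def describe(returncode: int) -> str:
    """Turn a subprocess return code into a readable label (hypothetical helper)."""
    if returncode < 0:
        return signal.Signals(-returncode).name
    return f"exit {returncode}"
```

Something like `describe(terminate_after([sys.executable, "your_script.py"], 1.0))` would then return `SIGTERM` for a normal termination and `SIGSEGV` if the process crashed on the way out; the script name is a placeholder for the plugin script.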

mihai-dinculescu commented 3 months ago

Thank you for the update.

I was not able to replicate the issue without TaskGroups. Were you? Have you tried handling the SIGTERM on the Python side and cancelling the TaskGroups?

PS: I removed the zip file you've attached because it looked like it contained your Tapo password. You might want to change it.

gsaviane commented 3 months ago

Good catch! What an airhead I am, I changed it right away. Tomorrow I will try what you suggested

Thanks!

gsaviane commented 3 months ago

Hi, I tried using a SIGTERM handler to cancel the tasks, but it fails in the same way. However, I can now log more details about the error:

Thread 0x0000007f9df514c0 (most recent call first):

Signal 15 received. Cancelling tasks ...
Error unhandled errors in a TaskGroup (1 sub-exception)
Traceback (most recent call last):
  File "/usr/local/bin/tapo-telegraf-multi-async.py", line 149, in
    asyncio.run(main())
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/bin/tapo-telegraf-multi-async.py", line 143, in main
    if task.result() == "":
       ^^^^^^^^^^^^^
  File "/usr/local/bin/tapo-telegraf-multi-async.py", line 109, in gen_influx_str_from_ip
    p110 = await init_p110(ip)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/bin/tapo-telegraf-multi-async.py", line 73, in init_p110
    p110 = await client.p110(ip)
           ^^^^^^^^^^^^^^^^^^^^^
Exception: Tapo(Unknown(1003))
Signal 15 received. Cancelling tasks ...
thread 'tokio-runtime-worker' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.3/src/gil.rs:199:21:
assertion `left != right` failed: The Python interpreter is not initialized and the `auto-initialize` feature is not enabled.
Consider calling `pyo3::prepare_freethreaded_python()` before attempting to use Python APIs.
  left: 0
 right: 0
stack backtrace:
   0: rust_begin_unwind
             at ./rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at ./rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:72:14
   2: core::panicking::assert_failed_inner
   3: core::panicking::assert_failed
   4: parking_lot::once::Once::call_once_force::{{closure}}
   5: parking_lot::once::Once::call_once_slow
   6: pyo3::gil::GILGuard::acquire
   7: ::spawn::{{closure}}
   8: tokio::runtime::task::core::Core::poll
   9: tokio::runtime::task::harness::Harness::poll
  10: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
  11: tokio::runtime::scheduler::multi_thread::worker::Context::run
  12: tokio::runtime::context::set_scheduler
  13: tokio::runtime::context::runtime::enter_runtime
  14: tokio::runtime::scheduler::multi_thread::worker::run
  15: as core::future::future::Future>::poll
  16: tokio::runtime::task::core::Core::poll
Signal 15 received. Cancelling tasks ...
Signal 15 received. Cancelling tasks ...
Fatal Python error: Segmentation fault

mihai-dinculescu commented 3 months ago

Are you able to try out the suggested solution of using the pyo3 auto-initialize feature?

gsaviane commented 3 months ago

Are you suggesting that I rebuild the Python package with that PyO3 feature enabled? If so, I need some guidance. By the way, I upgraded the tapo Python package to 0.3.1, and now my script occasionally hangs and needs a SIGTERM to exit. I had to revert to 0.3.0.

mihai-dinculescu commented 1 month ago

Closing this as a duplicate of https://github.com/mihai-dinculescu/tapo/issues/245.