wez / govee2mqtt

Govee2MQTT: Connect Govee lights and devices to Home Assistant
MIT License

addon cannot start when there is a network outage if configured to use Platform and/or IoT APIs #76

Open wez opened 9 months ago

wez commented 9 months ago

Here's the log from trying to start with no internet connectivity:

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
GOVEE_API_KEY=REDACTED
GOVEE_MQTT_HOST=core-mosquitto
GOVEE_EMAIL=REDACTED
GOVEE_PASSWORD=REDACTED
GOVEE_MQTT_PASSWORD=REDACTED
GOVEE_MQTT_USER=addons
GOVEE_MQTT_PORT=1883
++ cd /app
++ exec /app/govee serve
[2024-01-15T17:25:05 INFO  govee::commands::serve] Starting service. version 2024.01.13-b7277f05
[2024-01-15T17:25:05 INFO  govee::commands::serve] Querying platform API for device list
[2024-01-15T17:25:10 WARN  govee::cache] error sending request for url (https://openapi.api.govee.com/router/api/v1/user/devices): error trying to connect: dns error: failed to lookup address information: Try again: error trying to connect: dns error: failed to lookup address information: Try again: dns error: failed to lookup address information: Try again: failed to lookup address information: Try again, will use prior results
Error: error sending request for url (https://openapi.api.govee.com/router/api/v1/user/devices): error trying to connect: dns error: failed to lookup address information: Try again: error trying to connect: dns error: failed to lookup address information: Try again: dns error: failed to lookup address information: Try again: failed to lookup address information: Try again

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
   1: anyhow::__private::format_err
   2: govee::cache::CacheResult<T>::into_result
   3: govee::platform_api::GoveeApiClient::get_devices::{{closure}}
   4: govee::commands::serve::ServeCommand::run::{{closure}}
   5: govee::Args::run::{{closure}}
   6: tokio::runtime::park::CachedParkThread::block_on
   7: tokio::runtime::runtime::Runtime::block_on
   8: govee::main
   9: std::sys_common::backtrace::__rust_begin_short_backtrace
  10: std::rt::lang_start::{{closure}}
  11: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:284:13
  12: std::panicking::try::do_call
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
  13: std::panicking::try
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
  14: std::panic::catch_unwind
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
  15: std::rt::lang_start_internal::{{closure}}
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:148:48
  16: std::panicking::try::do_call
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
  17: std::panicking::try
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
  18: std::panic::catch_unwind
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
  19: std::rt::lang_start_internal
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:148:20
  20: std::rt::lang_start
s6-rc: info: service legacy-services: stopping
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped

Originally posted by @tankdeer in https://github.com/wez/govee2mqtt/issues/30#issuecomment-1892941079

wez commented 9 months ago

The workaround is to remove the API key and the Govee email/password, but it's not a great workaround.

tankdeer commented 9 months ago

Not sure how plausible it is, but I think an ideal scenario would be for the add-on to start in a sort of offline mode if there's an outage, rather than throwing an exception. That would still allow LAN-controlled devices to function. And perhaps at some interval (every 5 minutes? 10 minutes?), it would check whether connectivity has been restored and, if so, restore the rest of the functionality.

I am sure this is no small task, just thinking best case scenario.
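A minimal sketch of the "start offline, retry on an interval" idea described above, using only the Rust standard library. Everything here is hypothetical and for illustration: `Mode`, `start_with_retry`, and the `probe` closure are not govee2mqtt types or functions, and a real service would presumably retry in an async task rather than a blocking loop.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical connectivity mode; not a type from govee2mqtt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    /// Cloud APIs unreachable: only LAN-controlled devices are served.
    LanOnly,
    /// Cloud APIs reachable: full functionality.
    Full,
}

/// Probe the cloud once at startup. If that fails, start in `LanOnly`
/// instead of returning an error, then re-probe every `interval` up to
/// `max_retries` times. Returns the sequence of modes the service was in.
fn start_with_retry<F>(mut probe: F, interval: Duration, max_retries: usize) -> Vec<Mode>
where
    F: FnMut() -> Result<(), String>,
{
    let mut mode = if probe().is_ok() { Mode::Full } else { Mode::LanOnly };
    let mut history = vec![mode];

    let mut retries = 0;
    while mode == Mode::LanOnly && retries < max_retries {
        thread::sleep(interval);
        retries += 1;
        if probe().is_ok() {
            // Connectivity is back: switch to full functionality.
            mode = Mode::Full;
            history.push(mode);
        }
    }
    history
}

fn main() {
    // Simulated outage: the first two probes fail, the third succeeds.
    let mut calls = 0;
    let probe = || {
        calls += 1;
        if calls < 3 {
            Err("dns error: failed to lookup address information".to_string())
        } else {
            Ok(())
        }
    };

    // A short interval keeps the demo fast; the comment above suggests
    // something like 5 or 10 minutes in practice.
    let history = start_with_retry(probe, Duration::from_millis(10), 5);
    println!("{:?}", history);
}
```

The point of the sketch is only that a failed probe selects a degraded mode rather than aborting startup; it does not address the naming and persistence concerns raised later in the thread.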

wez commented 9 months ago

The challenge is this: when first starting up, the names and rooms are retrieved from the APIs, and those affect how the entities get created in Home Assistant. So we're deliberately gating startup on successfully reaching Govee, because the first-run experience is degraded and higher effort if there is some kind of connectivity problem when users are setting things up. There are a decent number of users that are doing things with their network access in the name of security that will be impacted by this.

We could make changes to do this gating only on the first start, but we'd then need to have some kind of persistent memory, and right now we are stateless, except for cache data, which can be discarded.

So there's some more general work to be done here to manage persistent state (eg: there'd need to be ways to see/edit/delete it) before we can make this particular scenario nicer.
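To make the "gate only the first start" idea concrete, here is a rough sketch that uses a marker file as the persistent memory. It is an assumption-laden illustration, not how govee2mqtt works: `Startup`, `decide_startup`, and the marker file path are all hypothetical, and it ignores the see/edit/delete management that the comment above says would be needed.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical startup outcomes; not types from govee2mqtt.
#[derive(Debug, PartialEq, Eq)]
enum Startup {
    /// Cloud reachable: proceed normally and remember that setup succeeded.
    Online,
    /// Cloud unreachable, but a previous start succeeded: run degraded.
    OfflineWithSavedState,
    /// Cloud unreachable and no prior successful start: keep gating.
    RefuseFirstRun,
}

/// Decide how to start, given whether the cloud probe succeeded and
/// whether a marker file from an earlier successful start exists.
fn decide_startup(marker: &Path, cloud_ok: bool) -> io::Result<Startup> {
    if cloud_ok {
        // Persist the fact that first-run setup has completed.
        fs::write(marker, b"setup-complete\n")?;
        return Ok(Startup::Online);
    }
    if marker.exists() {
        Ok(Startup::OfflineWithSavedState)
    } else {
        Ok(Startup::RefuseFirstRun)
    }
}

fn main() -> io::Result<()> {
    // Hypothetical location; a real add-on would use its own data directory.
    let marker = std::env::temp_dir()
        .join(format!("govee-sketch-{}.marker", std::process::id()));
    let _ = fs::remove_file(&marker); // start from a clean slate

    println!("{:?}", decide_startup(&marker, false)?); // first run, offline
    println!("{:?}", decide_startup(&marker, true)?); // first run, online
    println!("{:?}", decide_startup(&marker, false)?); // later start, offline

    fs::remove_file(&marker)
}
```

A real implementation would also need to persist the device names and rooms themselves, since those are what drive entity creation in Home Assistant.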

tankdeer commented 9 months ago

Does that only affect discovery of new devices, or is that necessary for existing devices as well? It makes sense to me that attempting to configure new devices without cloud connectivity would be problematic.

wez commented 9 months ago

If there is no internet connection, then you won't be able to use either the IoT or Platform APIs, so you won't be able to discover devices or talk to them through those APIs. For the LAN API, we don't need internet access to discover or control devices, but the LAN API has no naming metadata, so if we discover a device on the LAN and don't have internet access, it will end up with a name like SKU_SHORTID instead of something more meaningful.
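The naming fallback described above might look roughly like the sketch below. The function `display_name` is hypothetical, and the way the short ID is derived (the last four hex digits of the device ID) is an assumption made for illustration; the thread only says the result looks like SKU_SHORTID.

```rust
/// Pick a display name for a device. Prefer the name from the cloud
/// APIs; if it is missing or blank, fall back to "SKU_SHORTID".
///
/// Assumption: the short ID is the last four hex digits of the device
/// ID, upper-cased. The actual derivation may differ.
fn display_name(sku: &str, device_id: &str, cloud_name: Option<&str>) -> String {
    if let Some(name) = cloud_name.map(str::trim).filter(|n| !n.is_empty()) {
        return name.to_string();
    }

    // Keep only hex digits so separators like ':' are ignored.
    let hex: Vec<char> = device_id
        .chars()
        .filter(|c| c.is_ascii_hexdigit())
        .collect();
    let start = hex.len().saturating_sub(4);
    let short: String = hex[start..].iter().collect();

    format!("{}_{}", sku, short.to_uppercase())
}

fn main() {
    // The SKU and device ID here are made-up example values.
    let id = "AA:BB:CC:DD:EE:FF:12:3f";
    println!("{}", display_name("H6072", id, Some("Living Room Lamp")));
    println!("{}", display_name("H6072", id, None));
}
```

With the made-up values in `main`, the first call yields the cloud-provided name and the second yields `H6072_123F`.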

tankdeer commented 9 months ago

Of course I can't speak for everybody, but personally I have no problem with saying, "internet access required for initial setup of ALL devices, even LAN-controlled ones (if you want meaningful names)".

But as long as the addon is still able to start & run in those scenarios, then we can still control LAN devices while offline, and resume full control once the outage is resolved.

I am thinking of temporary outages, not fully isolated networks. I don't see the point of the latter here, since so much functionality relies on the various cloud-based APIs.

wez commented 9 months ago

I'm in agreement, however, it doesn't change the amount of work required for this to be realized.

tankdeer commented 9 months ago

> it doesn't change the amount of work required for this to be realized.

Absolutely, I understand it's no small task. Thanks again