nerves-hub / nerves_hub_link

Connect devices to NervesHub via a Phoenix channel
https://hex.pm/packages/nerves_hub_link
Apache License 2.0
36 stars 18 forks source link

NervesHubLink Fails to Start #52

Closed psteininger closed 3 years ago

psteininger commented 3 years ago

I was trying to get a certificate set up on the ATECC608A today, and I ran into the following error:

15:33:26.670 [info]  Application nerves_hub_link exited: exited in: NervesHubLink.Application.start(:normal, [])
    ** (EXIT) exited in: GenServer.call(ATECC508A.Transport.I2CSupervisor, {:start_child, {{ATECC508A.Transport.I2CServer, :start_link, [["i2c-1", 96, :"Elixir.ATECC508A.Transport.I2C.i2c-1.96"]]}, :permanent, 5000, :worker, [ATECC508A.Transport.I2CServer]}}, :infinity)
        ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started

I also get this when I run this from ssh console on the device:

iex(2)> Application.ensure_all_started(:thirsty)
{:error,
 {:nerves_hub_link,
  {:bad_return,
   {{NervesHubLink.Application, :start, [:normal, []]},
    {:EXIT,
     {:noproc,
      {GenServer, :call,
       [
         ATECC508A.Transport.I2CSupervisor,
         {:start_child,
          {{ATECC508A.Transport.I2CServer, :start_link,
            [["i2c-1", 96, :"Elixir.ATECC508A.Transport.I2C.i2c-1.96"]]},
           :permanent, 5000, :worker, [ATECC508A.Transport.I2CServer]}},
         :infinity
       ]}}}}}}}

Trying to establish connection to ATEC608A results in similar error:

iex(3)> {:ok, i2c} = ATECC508A.Transport.I2C.init(bus_name: "i2c-1")
** (exit) exited in: GenServer.call(ATECC508A.Transport.I2CSupervisor, {:start_child, {{ATECC508A.Transport.I2CServer, :start_link, [["i2c-1", 96, :"Elixir.ATECC508A.Transport.I2C.i2c-1.96"]]}, :permanent, 5000, :worker, [ATECC508A.Transport.I2CServer]}}, :infinity)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (elixir 1.11.2) lib/gen_server.ex:1017: GenServer.call/3
    lib/atecc508a/transport/i2c.ex:43: ATECC508A.Transport.I2C.init/1

I am pretty stumped as to what is wrong. This runs on a Raspberry Pi Zero W rev 1.2, I really appreciate some guidance on this. These are my current dependencies:

defp deps do
    [
      # Dependencies for all targets
      {:nerves, "~> 1.7.0", runtime: false},
      {:nerves_hub_cli, "~> 0.10.4", runtime: false},
      {:nerves_hub_link, "~> 0.9.2"},
      {:nerves_time, "~> 0.2"},
      {:nerves_key, "~> 0.5"},

      {:toolshed, "~> 0.2.13"},
      {:shoehorn, "~> 0.7"},
      {:ring_logger, "~> 0.8.1"},
      {:circuits_i2c, "~> 0.3.7"},
      {:circuits_gpio, "~> 0.4.6"},
      {:ads1115, "~> 0.1"},
      {:gen_rmq, "~> 3.0"},
      {:jason, "~> 1.2"},
      {:wafer, git: "https://gitlab.com/jimsy/wafer"},

      # Dependencies for all targets except :host
      {:nerves_runtime, "~> 0.11.3"},
      {:nerves_pack, "~> 0.4.0", targets: @all_targets},
      {:busybox, "~> 0.1", targets: @all_targets},

      # Dependencies for specific targets
      {:nerves_system_rpi, "~> 1.13", runtime: false, targets: :rpi},
      {:nerves_system_rpi0, "~> 1.13", runtime: false, targets: :rpi0},
      {:nerves_system_rpi2, "~> 1.13", runtime: false, targets: :rpi2},
      {:nerves_system_rpi3, "~> 1.13", runtime: false, targets: :rpi3},
      {:nerves_system_rpi3a, "~> 1.13", runtime: false, targets: :rpi3a},
      {:nerves_system_rpi4, "~> 1.13", runtime: false, targets: :rpi4},
      {:nerves_system_bbb, "~> 2.6", runtime: false, targets: :bbb},
      {:nerves_system_x86_64, "~> 1.13", runtime: false, targets: :x86_64},
    ]
  end
fhunleth commented 3 years ago

What device are you trying this on? I'm inferring that it's not a Raspberry Pi Zero based on your comment.

fhunleth commented 3 years ago

Oh, I think that I misread your comment. You're saying that it is running on a Raspberry Pi Zero.

fhunleth commented 3 years ago

Could you run Application.ensure_all_started(:nerves_key) and then run Application.ensure_all_started(:thirsty)?

nerves_hub_link has an optional dependency on :nerves_key. Optional dependencies aren't started by Application.ensure_all_started/1.

psteininger commented 3 years ago

Application.ensure_all_started(:nerves_key)

{:error,
{:atecc508a,
{{:shutdown,
{:failed_to_start_child, ATECC508A.Transport.I2CSupervisor,
{:already_started, #PID<0.1836.0>}}},
{ATECC508A.Application, :start, [:normal, []]}}}}
psteininger commented 3 years ago

@fhunleth thanks for replying so quickly :). I want to add one more point. this happens before wlan0 is enabled (it becomes active about 5 seconds later).

I was able to start the supervisor manually, that's why the above error is confusing.

Here's what I got after reboot:

iex(1)> Application.ensure_all_started(:nerves_key)
{:ok, [:x509, :circuits_i2c, :atecc508a, :nerves_key_pkcs11, :nerves_key]}
psteininger commented 3 years ago

Hmmm... this is a bit confusing. Now I am able to:

iex(3)> {:ok, i2c} = ATECC508A.Transport.I2C.init([])
{:ok, {ATECC508A.Transport.I2C, :"Elixir.ATECC508A.Transport.I2C.i2c-1.96"}}
psteininger commented 3 years ago

I do however, still see this:

iex(6)> Application.ensure_all_started(:nerves_hub_link)       
{:error,
 {:nerves_hub_link,
  {:bad_return,
   {{NervesHubLink.Application, :start, [:normal, []]},
    {:EXIT,
     {{:badmatch, {:error, :execution_error}},
      [
        {NervesKey, :device_cert, 2, [file: 'lib/nerves_key.ex', line: 97]},
        {NervesHubLink.Configurator.NervesKey, :"-maybe_add_cert/3-fun-0-", 2,
         [file: 'lib/nerves_hub_link/configurators/nerves_key.ex', line: 28]},
        {Keyword, :put_new_lazy, 3, [file: 'lib/keyword.ex', line: 629]},
        {NervesHubLink.Configurator.NervesKey, :build, 1,
         [file: 'lib/nerves_hub_link/configurators/nerves_key.ex', line: 16]},
        {NervesHubLink.Configurator, :build, 0,
         [file: 'lib/nerves_hub_link/configurator.ex', line: 21]},
        {NervesHubLink.Application, :start, 2,
         [file: 'lib/nerves_hub_link/application.ex', line: 14]},
        {:application_master, :start_it_old, 4,
         [file: 'application_master.erl', line: 277]}
      ]}}}}}}

and I only have this in config.exs for nerves_hub_link:

config :nerves_hub_link,
       fwup_public_keys: [:devkey]
jjcarstens commented 3 years ago

This last error is because NervesHubLink tries to use the NervesKey automatically if the dependency is detected. My guess is that it fails here because the ATECC508A is not yet provisioned, which causes the failure trying to read the necessary certs from it. Are you able to manually get the device and signer cert from your NervesKey?

psteininger commented 3 years ago

@jjcarstens I haven't provisioned the device yet. I was stuck unable to connect to it because ATECC508A.Transport.I2C.init(bus_name: "i2c-1") was erroring out. It seems to be working now. I will try to get the certs loaded up tomorrow (it's 5:30 pm here and thus dinner time ;) ). I will update here tomorrow.

Thank you both for responding so quickly :). It feels great to receive that kind of support.

psteininger commented 3 years ago

I tried to recreate my original condition and commenting out these 2 lines in config.exs did the trick:

config :nerves_hub_link,
       fwup_public_keys: [:devkey]

however, after putting them back in, rebuilding firmware and uploading I still get the original error:

iex(1)> {:ok, i2c} = ATECC508A.Transport.I2C.init(bus_name: "i2c-1")
** (exit) exited in: GenServer.call(ATECC508A.Transport.I2CSupervisor, {:start_child, {{ATECC508A.Transport.I2CServer, :start_link, [["i2c-1", 96, :"Elixir.ATECC508A.Transport.I2C.i2c-1.96"]]}, :permanent, 5000, :worker, [ATECC508A.Transport.I2CServer]}}, :infinity)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (elixir 1.11.2) lib/gen_server.ex:1017: GenServer.call/3
    lib/atecc508a/transport/i2c.ex:43: ATECC508A.Transport.I2C.init/1

This is what I get for an error:

iex(1)> Application.ensure_all_started(:nerves_hub_link)            
{:error,
 {:nerves_hub_link,
  {:bad_return,
   {{NervesHubLink.Application, :start, [:normal, []]},
    {:EXIT,
     {:noproc,
      {GenServer, :call,
       [
         ATECC508A.Transport.I2CSupervisor,
         {:start_child,
          {{ATECC508A.Transport.I2CServer, :start_link,
            [["i2c-1", 96, :"Elixir.ATECC508A.Transport.I2C.i2c-1.96"]]},
           :permanent, 5000, :worker, [ATECC508A.Transport.I2CServer]}},
         :infinity
       ]}}}}}}}

I will keep digging in more and hopefully I will figure this out

psteininger commented 3 years ago

Hmm... so if I run this:

iex(2)> Application.ensure_all_started(:nerves_key)                 
{:ok, [:x509, :circuits_i2c, :atecc508a, :nerves_key_pkcs11, :nerves_key]}

This then works:

{:ok, i2c} = ATECC508A.Transport.I2C.init([])
{:ok, {ATECC508A.Transport.I2C, :"Elixir.ATECC508A.Transport.I2C.i2c-1.96"}}

so how do I make sure this happens at startup?

psteininger commented 3 years ago

I was able to provision the chip with a signer cert. Side note here - I would suggest updating docs to specify --years-valid by default or set the default years to 10 or longer. Writing permanently a soon-to-expire cert into a chip is a bit of a trap.

So after updating config.exs to include:

config :nerves_hub_link,
       fwup_public_keys: [:devkey]

config :nerves_hub_link, :nerves_key,
       certificate_pair: :primary,
       i2c_bus: 1

I still experience the same issue upon reboot (with updated firmware)

iex(1)> Application.ensure_all_started(:nerves_hub_link)                                     
{:error,
 {:nerves_hub_link,
  {:bad_return,
   {{NervesHubLink.Application, :start, [:normal, []]},
    {:EXIT,
     {:noproc,
      {GenServer, :call,
       [
         ATECC508A.Transport.I2CSupervisor,
         {:start_child,
          {{ATECC508A.Transport.I2CServer, :start_link,
            [["i2c-1", 96, :"Elixir.ATECC508A.Transport.I2C.i2c-1.96"]]},
           :permanent, 5000, :worker, [ATECC508A.Transport.I2CServer]}},
         :infinity
       ]}}}}}}}

But... if I do the following:

iex(2)> Application.ensure_all_started(:nerves_key)     
{:ok, [:x509, :circuits_i2c, :atecc508a, :nerves_key_pkcs11, :nerves_key]}

Then this starts up ok:

iex(3)> Application.ensure_all_started(:nerves_hub_link)
{:ok,
 [:extty, :fwup, :unicode_util_compat, :idna, :mimerl, :certifi, :syntax_tools,
  :parse_trans, :ssl_verify_fun, :metrics, :hackney, :jason, :inets,
  :websocket_client, :phoenix_client, :nerves_hub_link]}

So, the question is... how to permanently fix it - is this something I need to address in my application.ex somehow or it's something that nerves_hub_link should deal with. It just seems that the :nerves_key application is not starting at all, or at least not before :nerves_hub_link.

psteininger commented 3 years ago

One last bit of data: Just running this:

Application.ensure_all_started(:atecc508a) 

is enough to allow ATECC508A.Transport.I2C.init([]) to work correctly. So, for some reason, the ATECC508A.Application and the ATECC508A.Transport.I2CSupervisor don't get started

fhunleth commented 3 years ago

@jjcarstens What do you think about putting Application.ensure_all_started(:nerves_key) in :nerves_hub_link to force it to start first. I don't know any other way around the optional dependency issue and I haven't seen much progress on accepting Jose's OTP PR in a while.

psteininger commented 3 years ago

@fhunleth where is the :nerves_key application defined? I can't see NervesKey.Application defined anywhere. How does that work?

fhunleth commented 3 years ago

It's not since :nerves_key is a pure library. However, if that were to ever change, then I didn't want to have to remember to update :nerves_hub_link.

I think that :nerves_hub_link only has an optional dependency on :nerves_key, so it seems wrong for it to call ensure_all_started on anything but its direct dependencies. Given that there's a call to ATECC508A.Transport.I2C.init/1, there's an argument that it should directly depend on :atecc508a too.

psteininger commented 3 years ago

Given that there's a call to ATECC508A.Transport.I2C.init/1, there's an argument that it should directly depend on :atecc508a too.

@fhunleth I think this makes sense.

In the meantime, is there any reliable way for me to make sure that the ATECC508A.Transport.I2CSupervisor starts up when my app starts up?

fhunleth commented 3 years ago

:shoehorn loads the :init list in order, so if you add :atecc508a to the front of it in your config, that should work.

psteininger commented 3 years ago

I got this in target.exs:

config :shoehorn,
  init: [:nerves_runtime, :nerves_pack, :atecc508a],
  app: Mix.Project.config()[:app]

but I still see these errors:

22:18:19.561 [info]  Application nerves_hub_link exited: exited in: NervesHubLink.Application.start(:normal, [])
    ** (EXIT) exited in: GenServer.call(ATECC508A.Transport.I2CSupervisor, {:start_child, {{ATECC508A.Transport.I2CServer, :start_link, [["i2c-1", 96, :"Elixir.ATECC508A.Transport.I2C.i2c-1.96"]]}, :permanent, 5000, :worker, [ATECC508A.Transport.I2CServer]}}, :infinity)
        ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started

22:18:19.564 [info]  Application x509 exited: :stopped

22:18:19.578 [info]  Application phoenix_client exited: :stopped

22:18:19.579 [info]  Application websocket_client exited: :stopped

22:18:19.593 [info]  Application inets exited: :stopped

22:18:19.594 [info]  Application jason exited: :stopped

22:18:19.606 [info]  Application hackney exited: :stopped

22:18:19.608 [info]  Application metrics exited: :stopped

22:18:19.609 [info]  Application ssl_verify_fun exited: :stopped

22:18:19.610 [info]  Application parse_trans exited: :stopped

22:18:19.611 [info]  Application syntax_tools exited: :stopped

22:18:19.613 [info]  Application certifi exited: :stopped

22:18:19.614 [info]  Application mimerl exited: :stopped

22:18:19.615 [info]  Application idna exited: :stopped

22:18:19.617 [info]  Application unicode_util_compat exited: :stopped

22:18:19.618 [info]  Application fwup exited: :stopped

22:18:19.619 [info]  Application extty exited: :stopped
psteininger commented 3 years ago

@fhunleth I think I know what's going on. It looks like the order of the dependencies matters. Here's what was causing problems:

iex(34)> cat "thirsty.app"
{application,thirsty,
             [{applications,[kernel,stdlib,elixir,logger,runtime_tools,
                             nerves_hub_link,nerves_time,nerves_key,toolshed,
                             shoehorn,ring_logger,circuits_i2c,circuits_gpio,
                             ads1115,gen_rmq,jason,wafer,nerves_runtime,
                             nerves_pack,busybox]},
              {description,"thirsty"},
             ...
              {registered,[]},
              {vsn,"0.1.0"},
              {mod,{'Elixir.Thirsty.Application',[]}}]}.

I have changed the dependency definitions to start like this:

 {:nerves, "~> 1.7.0", runtime: false},
      {:shoehorn, "~> 0.7"},
      {:circuits_i2c, "~> 0.3.7"},
      {:circuits_gpio, "~> 0.4.6"},
      {:nerves_key, "~> 0.5", optional: false},
      {:nerves_hub_cli, "~> 0.10.4", runtime: false},
      {:nerves_hub_link, "~> 0.9.2"},
      {:nerves_time, "~> 0.2"},
      {:toolshed, "~> 0.2.13"},

and for good measure added nerves_key to :extra_applications. It's hackey, but it did the trick.

psteininger commented 3 years ago

I think I just realized what was going on. It's pretty embarrassing, actually. It turns out that I was building the prod version of the app, but uploading the dev version. I believe the dev build was misconfigured. This is what I was doing:

MIX_TARGET=rpi0 MIX_ENV=prod mix firmware && ./upload.sh nerves-ext-rpi0.local

In my attempt to combine two steps into one I unfortunately created a problem. The upload script has rpi0 as default for MIX_TARGET, but dev for MIX_ENV. I didn't realize that the env vars weren't propagated to the upload script, only to the mix task. This only became apparent after I blew away the _build directory.

I am sorry for wasting your time and I really appreciate you both jumping in and trying to help me out.