Open ysbaddaden opened 1 year ago
I'm reproducing on my local: I have a Verboice channel configured in Surveda on my local environment and I configured Guisso for everything.
If I try to start Surveda before I start Guisso (i.e. which happens all the time, because I always forget to start Guisso), then Surveda crashes because it can't get a valid token on behalf of Verboice. Once I start Guisso, then Surveda won't crash anymore on start.
Verboice on the other hand doesn't have any issue: it simply reports the channel as Unauthorized.
What type of channel do you have configured? I setup an smpp channel and have tried many permutations of logging in, logging out, and loading pages with guisso not running and guisso running and I haven't got anything to crash so far. I mostly get redirected to a guisso url that doesn't load when guisso is down but surveda hasn't crashed from that.
@ysbaddaden will provide the steps to reproduce the crash.
I'll try to reproduce more exactly. It was either the QST simulator (SMS) or the Twilio simulator.
What's breaking is the internal OAuth tokens for channels, not the OAuth web flow for connecting the UI to Surveda.
Surveda is requesting guisso for an oauth token for talking with verboice or nuntium, then will regularly refresh it after some minutes (e.g. when restarting surveda after some time).
@nthiad I don't know if the latest comments by Julien, help you to reproduce the issue.
I've tried more things and I've been working on other patches but I still haven't been able to reproduce this issue either intentionally or by working on other tickets.
For future reference: I forgot to start Guisso on my local, and Surveda won't start properly. Here's the stacktrace:
surveda-app-1 | Compiling 188 files (.ex)
surveda-app-1 | Generated ask app
surveda-app-1 | 2023-05-05T10:10:09.172Z [info] SurveyBroker started. Default batch size=10000. Limit per minute=100.
surveda-app-1 | 2023-05-05T10:10:09.173Z [info] FLOIP pusher: started.
surveda-app-1 | 2023-05-05T10:10:09.222Z [info] ChannelStatusServer: started.
surveda-app-1 | 2023-05-05T10:10:09.223Z [info] QuestionnaireSimulator started with simulation_ttl: 5
surveda-app-1 | 2023-05-05T10:10:09.299Z [info] Running AskWeb.Endpoint with cowboy 2.9.0 at 0.0.0.0:80 (http)
surveda-app-1 | 2023-05-05T10:10:09.300Z [info] Access AskWeb.Endpoint at http://app.surveda.lvh.me
surveda-app-1 | 2023-05-05T10:10:09.302Z [info] Running Ask.MetricsEndpoint with cowboy 2.9.0 at 0.0.0.0:9980 (http)
surveda-app-1 | 2023-05-05T10:10:09.303Z [info] Access Ask.MetricsEndpoint at http://localhost:9980
surveda-app-1 | 2023-05-05T10:10:09.303Z [info] Start cancelling survey (id: )
surveda-app-1 | 2023-05-05T10:10:10.184Z [info] FLOIP pusher: Scheduling next run in 15 minutes
surveda-app-1 | 2023-05-05T10:10:10.298Z request_id=F1w3hjXYQh6NPIwAAEVB [info] POST /oauth2/token
surveda-app-1 | 2023-05-05T10:10:10.312Z [error] GenServer Ask.OAuthTokenServer terminating
surveda-app-1 | ** (FunctionClauseError) no function clause matching in DateTime.from_unix/3
surveda-app-1 | (elixir 1.10.4) lib/calendar/datetime.ex:122: DateTime.from_unix(nil, :second, Calendar.ISO)
surveda-app-1 | (elixir 1.10.4) lib/calendar/datetime.ex:174: DateTime.from_unix!/3
surveda-app-1 | (ask 0.31.0) lib/ask/o_auth_token.ex:24: Ask.OAuthToken.from_access_token/2
surveda-app-1 | (ask 0.31.0) lib/ask/o_auth_token_server.ex:44: Ask.OAuthTokenServer.refresh/3
surveda-app-1 | (ask 0.31.0) lib/ask/o_auth_token_server.ex:26: Ask.OAuthTokenServer.handle_call/3
surveda-app-1 | (stdlib 3.12.1.2) gen_server.erl:661: :gen_server.try_handle_call/4
surveda-app-1 | (stdlib 3.12.1.2) gen_server.erl:690: :gen_server.handle_msg/6
surveda-app-1 | (stdlib 3.12.1.2) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
surveda-app-1 | Last message (from #PID<0.1843.0>): {:get_token, "verboice", "http://web.verboice.lvh.me", 1}
surveda-app-1 | State: []
surveda-app-1 | Client #PID<0.1843.0> is alive
surveda-app-1 |
surveda-app-1 | (stdlib 3.12.1.2) gen.erl:167: :gen.do_call/4
surveda-app-1 | (elixir 1.10.4) lib/gen_server.ex:1020: GenServer.call/3
surveda-app-1 | (ask 0.31.0) lib/ask/runtime/verboice_channel.ex:145: Ask.Runtime.VerboiceChannel.create_client/2
surveda-app-1 | (ask 0.31.0) lib/ask/runtime/verboice_channel.ex:23: Ask.Runtime.VerboiceChannel.new/1
surveda-app-1 | (ask 0.31.0) lib/ask/runtime/session.ex:97: Ask.Runtime.Session.timeout/1
surveda-app-1 | (ask 0.31.0) lib/ask/runtime/survey_broker.ex:79: anonymous fn/1 in Ask.Runtime.SurveyBroker.retry_respondent/1
surveda-app-1 | (ecto_sql 3.9.2) lib/ecto/adapters/sql.ex:1203: anonymous fn/3 in Ecto.Adapters.SQL.checkout_or_transaction/4
surveda-app-1 | (db_connection 2.4.3) lib/db_connection.ex:1611: DBConnection.run_transaction/4
surveda-app-1 | lib/mut.ex:262: Mutex.apply_with_lock/3
surveda-app-1 | (elixir 1.10.4) lib/enum.ex:783: Enum."-each/2-lists^foreach/1-0-"/2
surveda-app-1 | (elixir 1.10.4) lib/enum.ex:783: Enum.each/2
surveda-app-1 | (elixir 1.10.4) lib/enum.ex:783: Enum."-each/2-lists^foreach/1-0-"/2
surveda-app-1 | (elixir 1.10.4) lib/enum.ex:783: Enum.each/2
surveda-app-1 | (ask 0.31.0) lib/ask/runtime/survey_broker.ex:51: Ask.Runtime.SurveyBroker.handle_info/3
surveda-app-1 | (stdlib 3.12.1.2) gen_server.erl:637: :gen_server.try_dispatch/4
surveda-app-1 | (stdlib 3.12.1.2) gen_server.erl:711: :gen_server.handle_msg/6
surveda-app-1 | (stdlib 3.12.1.2) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
The Survey Broker needs to communicate with verboice (to cancel some calls if I'm reading this correctly). For that it needs to refresh the access token from Guisso, but Guisso isn't started so it fails.
In my log, I have a second attempt, and then nothing. The Erlang process doesn't exit, but I can't reach Surveda from the Web. It's left in a zombie-like state.
When Guisso returns an invalid token (e.g.
expires_at
isnil
instead of an UNIX timestamp), the Broker crashes, taking down the whole Surveda application altogether.Surveda should be resilient to such issues. In the worst case, only the oauth
gen_server
should crash yet be restarted by its supervisor (and not crash the whole app). At best, failing to refresh an oauth token should be taken care of: consider the channel down and try again later (probably)? delete the token?Related to https://github.com/instedd/guisso/issues/83