Open brianmay opened 8 months ago
This still doesn't fix all the intermittent failures, but I now only see 1, not up to 7:
1) test resumes logging when starting a drive (TeslaMate.Vehicles.Vehicle.StreamingTest)
test/teslamate/vehicles/vehicle/streaming_test.exs:452
Assertion failed, no matching message after 300ms
The following variables were pinned:
car = %TeslaMate.Log.Car{__meta__: #Ecto.Schema.Metadata<:built, "cars">, id: 55174, name: nil, efficiency: nil, model: "3", trim_badging: nil, marketing_name: nil, exterior_color: nil, wheel_type: nil, spoiler_type: nil, eid: 0, vid: 1000, vin: "1000", settings_id: nil, settings: %TeslaMate.Settings.CarSettings{__meta__: #Ecto.Schema.Metadata<:built, "car_settings">, id: nil, suspend_min: 999999999, suspend_after_idle_min: 15, req_not_unlocked: true, free_supercharging: false, use_streaming_api: true, car: #Ecto.Association.NotLoaded<association :car is not loaded>}, charging_processes: #Ecto.Association.NotLoaded<association :charging_processes is not loaded>, positions: #Ecto.Association.NotLoaded<association :positions is not loaded>, drives: #Ecto.Association.NotLoaded<association :drives is not loaded>, inserted_at: nil, updated_at: nil}
The process mailbox is empty.
code: assert_receive {:start_drive, ^car}
stacktrace:
test/teslamate/vehicles/vehicle/streaming_test.exs:478: (test)
Just saw 2 more test failures.
I think the async flag needs to be turned off for all the tests in this directory.
diff --git a/test/teslamate/vehicles/vehicle/charging_test.exs b/test/teslamate/vehicles/vehicle/charging_test.exs
index 106d64df..26fb4f65 100644
--- a/test/teslamate/vehicles/vehicle/charging_test.exs
+++ b/test/teslamate/vehicles/vehicle/charging_test.exs
@@ -1,5 +1,5 @@
defmodule TeslaMate.Vehicles.Vehicle.ChargingTest do
- use TeslaMate.VehicleCase, async: true
+ use TeslaMate.VehicleCase, async: false
alias TeslaMate.Vehicles.Vehicle.Summary
alias TeslaMate.Log.ChargingProcess
diff --git a/test/teslamate/vehicles/vehicle/driving_test.exs b/test/teslamate/vehicles/vehicle/driving_test.exs
index 11ac512e..38078ccc 100644
--- a/test/teslamate/vehicles/vehicle/driving_test.exs
+++ b/test/teslamate/vehicles/vehicle/driving_test.exs
@@ -1,5 +1,5 @@
defmodule TeslaMate.Vehicles.Vehicle.DrivingTest do
- use TeslaMate.VehicleCase, async: true
+ use TeslaMate.VehicleCase, async: false
alias TeslaMate.Vehicles.Vehicle.Summary
alias TeslaMate.Log.Drive
diff --git a/test/teslamate/vehicles/vehicle/streaming_test.exs b/test/teslamate/vehicles/vehicle/streaming_test.exs
index 20f54723..c43db381 100644
--- a/test/teslamate/vehicles/vehicle/streaming_test.exs
+++ b/test/teslamate/vehicles/vehicle/streaming_test.exs
@@ -1,5 +1,5 @@
defmodule TeslaMate.Vehicles.Vehicle.StreamingTest do
- use TeslaMate.VehicleCase, async: true
+ use TeslaMate.VehicleCase, async: false
import ExUnit.CaptureLog
diff --git a/test/teslamate/vehicles/vehicle/suspend_logging_test.exs b/test/teslamate/vehicles/vehicle/suspend_logging_test.exs
index ae3d6677..f282debe 100644
--- a/test/teslamate/vehicles/vehicle/suspend_logging_test.exs
+++ b/test/teslamate/vehicles/vehicle/suspend_logging_test.exs
@@ -1,5 +1,5 @@
defmodule TeslaMate.Vehicles.Vehicle.SuspendLoggingTest do
- use TeslaMate.VehicleCase, async: true
+ use TeslaMate.VehicleCase, async: false
alias TeslaMate.Vehicles.Vehicle.Summary
alias TeslaMate.Vehicles.Vehicle
diff --git a/test/teslamate/vehicles/vehicle/suspend_test.exs b/test/teslamate/vehicles/vehicle/suspend_test.exs
index b89ee958..d8561f03 100644
--- a/test/teslamate/vehicles/vehicle/suspend_test.exs
+++ b/test/teslamate/vehicles/vehicle/suspend_test.exs
@@ -1,5 +1,5 @@
defmodule TeslaMate.Vehicles.Vehicle.SuspendTest do
- use TeslaMate.VehicleCase, async: true
+ use TeslaMate.VehicleCase, async: false
alias TeslaMate.Vehicles.Vehicle.Summary
diff --git a/test/teslamate/vehicles/vehicle/updating_test.exs b/test/teslamate/vehicles/vehicle/updating_test.exs
index d26cefb5..fc1a8fd6 100644
--- a/test/teslamate/vehicles/vehicle/updating_test.exs
+++ b/test/teslamate/vehicles/vehicle/updating_test.exs
@@ -1,5 +1,5 @@
defmodule TeslaMate.Vehicles.Vehicle.UpdatingTest do
- use TeslaMate.VehicleCase, async: true
+ use TeslaMate.VehicleCase, async: false
alias TeslaMate.Vehicles.Vehicle.Summary
Still getting the above error, but only rarely, but appears to be much better now.
Sometimes tests come up with the following message:
14:27:33.383 [error] Postgrex.Protocol (#PID<0.411.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.19679.0> exited
But this doesn't seem to matter.
Nice catch. Unreliable tests are a nightmare. I do see the asynchronous run approach to match reality therefore vote to turn asnyc off.
I pinned our CI to elixir-1.16.1-otp-26 as well and the tests run perfectly again, see #3733
But came back when running on master, so unreliable with async test run.
Yes, I upgraded my box to elixir 1.16, and I still get lots of test failures.
For some reason CI is more likely to have passing tests. Tests on my box never work without turning async off. Wonder if this is because the multi-tasking performance of my box is superior.
Except this one test, I sometimes get failures here still:
1) test resumes logging when starting a drive (TeslaMate.Vehicles.Vehicle.StreamingTest)
test/teslamate/vehicles/vehicle/streaming_test.exs:452
Assertion failed, no matching message after 300ms
The following variables were pinned:
car = %TeslaMate.Log.Car{__meta__: #Ecto.Schema.Metadata<:built, "cars">, id: 11376, name: nil, efficiency: nil, model: "3", trim_badging: nil, marketing_name: nil, exterior_color: nil, wheel_type: nil, spoiler_type: nil, eid: 0, vid: 1000, vin: "1000", settings_id: nil, settings: %TeslaMate.Settings.CarSettings{__meta__: #Ecto.Schema.Metadata<:built, "car_settings">, id: nil, suspend_min: 999999999, suspend_after_idle_min: 15, req_not_unlocked: true, free_supercharging: false, use_streaming_api: true, car: #Ecto.Association.NotLoaded<association :car is not loaded>}, charging_processes: #Ecto.Association.NotLoaded<association :charging_processes is not loaded>, positions: #Ecto.Association.NotLoaded<association :positions is not loaded>, drives: #Ecto.Association.NotLoaded<association :drives is not loaded>, inserted_at: nil, updated_at: nil}
The process mailbox is empty.
code: assert_receive {:start_drive, ^car}
stacktrace:
test/teslamate/vehicles/vehicle/streaming_test.exs:478: (test)
But it is always this one test, so wondering if that test is somehow broken or racey.
Wonder if that timeout is just too short. Would hope that 300ms is plenty. But will try to reproduce with a timeout of 1000ms.
Might be hard to verify, a successful run doesn't mean anything.
New theory: maybe tests are actually failing due to assert_receive timing out. This might explain "Assertion failed, no matching message after 300ms" message, but not the one "Unexpectedly received message" message.
But don't particularly want to update each and every assert_receive and refute_receive line by hand. If only we could set this globally. Oh, wait, we can.
ELIXIR_ASSERT_TIMEOUT=2000 mix test
But failures are persisting. Got 3-4 failures. And messages are showing new timeouts.
Just to be sure, I made the following change:
diff --git a/test/test_helper.exs b/test/test_helper.exs
index 58b9a493..3e6227c3 100644
--- a/test/test_helper.exs
+++ b/test/test_helper.exs
@@ -13,4 +13,4 @@ TeslaMate.Repo.start_link()
Phoenix.PubSub.Supervisor.start_link(name: TeslaMate.PubSub)
assert_timeout = String.to_integer(System.get_env("ELIXIR_ASSERT_TIMEOUT") || "300")
-ExUnit.start(assert_receive_timeout: assert_timeout)
+ExUnit.start(assert_receive_timeout: assert_timeout, refute_receive_timeout: assert_timeout)
As it bothered me that we set one, but not the other.
ELIXIR_ASSERT_TIMEOUT=2000 mix test
Now got 51+ test failures. Oops. I guess that was a bad change to make. But I think it does show that our tests are very timing sensitive, and perhaps this is not a good thing.
I don't think any of this helps, but documenting in case it gives anyone ideas.
I rewrote the above text when I realised that increasing refute_receive_timeout
increases the number of test failures.
But I think it does show that our tests are very timing sensitive, and perhaps this is not a good thing.
Total agree. Before looking deeper into it, I always thought our tests were great.
To be fair, writing effective and efficient tests when you have lots of processes interacting like this is... hard.
Not really what is considered good practise in fact. I just know how to criticise bad practise :-)
Still getting some errors. I think a lot less though then before.
This one I just saw is weird: https://github.com/teslamate-org/teslamate/actions/runs/8288206183/job/22682287560?pr=3485
** (UndefinedFunctionError) function System.convert_time_unit/3 is undefined (module System is not available)
(elixir 1.16.1) System.convert_time_unit(-576460[71](https://github.com/teslamate-org/teslamate/actions/runs/8288206183/job/22682287560?pr=3485#step:10:72)9104225362, :native, :millisecond)
(db_connection 2.6.0) lib/db_connection/connection_pool.ex:357: DBConnection.ConnectionPool.start_idle/2
** (EXIT from #PID<0.99.0>) shutdown
(db_connection 2.6.0) lib/db_connection/connection_pool.ex:205: DBConnection.ConnectionPool.drop_idle/6
(stdlib 5.2.1) gen_server.erl:1095: :gen_server.try_handle_info/3
(stdlib 5.2.1) gen_server.erl:1183: :gen_server.handle_msg/6
(stdlib 5.2.1) proc_lib.erl:241: :proc_lib.init_p_do_apply/3
Last message: {:timeout, #Reference<0.821106270.461373444.45001>, -576460[72](https://github.com/teslamate-org/teslamate/actions/runs/8288206183/job/22682287560?pr=3485#step:10:73)0104900933}
Either the function exists or it doesn't exist. It can't just not exist occasionally...
As expected, rebuild solved that.
In case it is not clear, I am still seeing one test failure. Occasionally. Not easy to reproduce on demand however.
On CI I do not see any failures
I think I saw this failure recently on CI. But can't find it right now.
This is the random failure:
1) test resumes logging when starting a drive (TeslaMate.Vehicles.Vehicle.StreamingTest)
test/teslamate/vehicles/vehicle/streaming_test.exs:452
Assertion failed, no matching message after 300ms
The following variables were pinned:
car = %TeslaMate.Log.Car{__meta__: #Ecto.Schema.Metadata<:built, "cars">, id: 49656, name: nil, efficiency: nil, model: "3", trim_badging: nil, marketing_name: nil, exterior_color: nil, wheel_type: nil, spoiler_type: nil, eid: 0, vid: 1000, vin: "1000", settings_id: nil, settings: %TeslaMate.Settings.CarSettings{__meta__: #Ecto.Schema.Metadata<:built, "car_settings">, id: nil, suspend_min: 999999999, suspend_after_idle_min: 15, req_not_unlocked: true, free_supercharging: false, use_streaming_api: true, car: #Ecto.Association.NotLoaded<association :car is not loaded>}, charging_processes: #Ecto.Association.NotLoaded<association :charging_processes is not loaded>, positions: #Ecto.Association.NotLoaded<association :positions is not loaded>, drives: #Ecto.Association.NotLoaded<association :drives is not loaded>, inserted_at: nil, updated_at: nil}
The process mailbox is empty.
code: assert_receive {:start_drive, ^car}
stacktrace:
test/teslamate/vehicles/vehicle/streaming_test.exs:478: (test)
CI got test failure as well here: https://github.com/teslamate-org/teslamate/actions/runs/8508853689/job/23303206117?pr=3800
Is there an existing issue for this?
What happened?
Noticed lots of test failures when running tests locally. The found I have to start tests with clean database. Oops.
But even then, was getting errors - up to 7. Then I found if I ran test files individually, was more like to work.
At a hunch I made the following change:
Now all the tests pass for me. I suspect we have race conditions with multiple tests interfering with the database at the same time. Yuck.
Not sure I like this, but I think as our tests access the database they are more like integration tests not unit tests, and as a result they really do need to be run sequentially.
Expected Behavior
Tests pass without failures.
Steps To Reproduce
MIX_ENV=test mix test
Relevant log output
Screenshots
No response
Additional data
No response
Type of installation
Docker
Version
master