nvim-neotest / neotest-python

MIT License

Use pytest to populate parameterized test instances #36

Closed · OddBloke closed this 8 months ago

OddBloke commented 1 year ago

(This is very rough, so I'm opening it as draft: this is the first Lua I've written that wasn't neovim configuration, so it will need substantial rework before it can land!)

These commits modify PythonNeotestAdapter.discover_positions(path) to shell out to pytest to collect the parameterized instances of each test and add them to the returned position tree as children of their parent test.

This would fix #35


It also includes one modification to get the UX to work correctly: individual test instances aren't tied to locations in pytest, as they can be generated by the cross-product of several decorators and/or fixtures. It follows that the cursor position should still select the parent test, and not the test instances. As far as I can tell, this isn't currently achievable in neotest: setting range = nil on the position results in all sorts of errors, and leaving it the same as the parent test results in the last test instance being focused.
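
To illustrate (the file, test name, and parameter ids here are invented), the parent test is the only position with a real source range; its instances only come into existence at pytest collection time:

local parent = {
  id = "tests/test_math.py::test_add",
  type = "test",
  name = "test_add",
  path = "tests/test_math.py",
  range = { 10, 0, 13, 40 }, -- the decorated function's location in the file
}

local instance = {
  id = "tests/test_math.py::test_add[1-2]",
  type = "test",
  name = "test_add[1-2]",
  path = "tests/test_math.py",
  -- no source range of its own: this instance only exists after pytest collection
}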

With this hack to neotest core, and 602a0ac1c9a92c5b498b9af4950272796e38605a, positions can declare that they should be excluded from focus (a better variable name would not go amiss, but it's getting late!):

--- a/lua/neotest/client/init.lua
+++ b/lua/neotest/client/init.lua
@@ -123,7 +123,9 @@ function NeotestClient:get_nearest(file_path, row, args)
   for _, pos in positions:iter_nodes() do
     local data = pos:data()
     if data.range and data.range[1] <= row then
-      nearest = pos
+      if not data.should_skip then
+          nearest = pos
+      end
     else
       return nearest, adapter_id
     end

This results in the parameters being displayed in the summary, while the parent test is selected: [screenshot: neotest-param-1]

`neotest.run.run` in a test definition will run all test instances: [screenshot: neotest-param-2]

And you can selectively run individual test instances from the summary window (the highlighting of the test name is from another plugin, oopsie): [screenshot: neotest-param-3]

OddBloke commented 1 year ago

One outstanding issue: the parent test in the summary window will have its result based on the most recent run: if only a passing set of instances were run, it will show green, even if there are failing instances from previous runs.

OddBloke commented 1 year ago

> With this hack to neotest core, and 602a0ac, positions can declare that they should be excluded from focus (a better variable name would not go amiss, but it's getting late!):
>
> --- a/lua/neotest/client/init.lua
> +++ b/lua/neotest/client/init.lua
> @@ -123,7 +123,9 @@ function NeotestClient:get_nearest(file_path, row, args)
>    for _, pos in positions:iter_nodes() do
>      local data = pos:data()
>      if data.range and data.range[1] <= row then
> -      nearest = pos
> +      if not data.should_skip then
> +          nearest = pos
> +      end
>      else
>        return nearest, adapter_id
>      end

I think a better non-hacky solution here would be for neotest core to support tests without a location: if range is nil, then traverse up the tree until a range is found, and use that. (I have a partial implementation of a single-parent version of this locally, so I think it's feasible.)

OddBloke commented 1 year ago

> I think a better non-hacky solution here would be for neotest core to support tests without a location: if range is nil, then traverse up the tree until a range is found, and use that. (I have a partial implementation of a single-parent version of this locally, so I think it's feasible.)

Here's a rough go at a more generic version of that (there are a bunch of other places that would also need to use this, this is just illustrative!):

--- a/lua/neotest/client/init.lua
+++ b/lua/neotest/client/init.lua
@@ -122,6 +122,10 @@ function NeotestClient:get_nearest(file_path, row, args)
   local nearest
   for _, pos in positions:iter_nodes() do
     local data = pos:data()
+    if not data.range then
+      pos = pos:closest_parent_with("range")
+      data = pos:data()
+    end
     if data.range and data.range[1] <= row then
       nearest = pos
     else
diff --git a/lua/neotest/types/tree.lua b/lua/neotest/types/tree.lua
index b0ae7af..d2dd470 100644
--- a/lua/neotest/types/tree.lua
+++ b/lua/neotest/types/tree.lua
@@ -144,6 +144,14 @@ function Tree:iter_parents()
   end
 end

+function Tree:closest_parent_with(data_attr)
+  for parent in self:iter_parents() do
+    if parent:data()[data_attr] then
+      return parent
+    end
+  end
+end
+
 ---@return neotest.Tree
 function Tree:root()
   local node = self
OddBloke commented 1 year ago

@rcarriga What do you think of this approach, of shelling out to pytest to get the test instances? If you're happy with it at a high level, I can work through this code to make it a bit more robust.

(No rush on a response, I totally understand that you have other things going on in your life!)

rcarriga commented 1 year ago

I think this approach makes total sense, I hadn't thought of the complexities of parameters in pytest.

I also like making the ranges optional, it's an elegant way of saying that a test doesn't exist statically :+1:

When I initially started writing this plugin, I was actually going to use pytest for discovery as that's what VSCode does, but I found it to be crazy slow for some projects, and unreliable in that if there was an error during discovery, the whole thing would fall over and nothing would be found. Your solution here solves the latter, since only parameters would be missing if it failed, which is great!

The performance is still a concern for me. I think there are some things we can do to address it though. Some ideas:

  1. Only run pytest discovery if parameters are discovered, no need to run it if there are none. That way people without them won't suffer the hit.
  2. We construct the positions in the pytest plugin, serialise them to JSON and so then in lua we just parse them and tack them on. We could do this in a build_positions function passed to treesitter.parse_positions before the tree is constructed as then it's just nested lists which would suit perfectly. This is not really for the performance but more because we're already integrated with pytest so might as well leverage that. Also the tree is not at all designed to be mutated after the initial build so that avoids that issue entirely. We could write the positions to a file to pass them but then we have to write to disk which will likely also affect performance (testing needed :sweat_smile:). Alternatively we could use stdout or stderr, if we can stop pytest using them. Not sure if that's possible/easy to do.
  3. (This is a bit of a stretch) We start a single, long running process to do the parsing rather than a process for each file. This would be pretty cool but I'd imagine also very complex, more research would be needed and I wouldn't want to impose that requirement for now.

This is just me spitballing right now though! I'd be interested to see performance as is.

I'll have a test myself when I get a chance but if you're interested, you can run the benchmark consumer (hopefully, I don't think anyone else has ever used it) which will tell you how long it took to parse a project. This is really only useful on large projects, I like to use the cpython repo.

nvim --headless -c 'lua require("neotest").benchmark.start()'
OddBloke commented 1 year ago

> I think this approach makes total sense, I hadn't thought of the complexities of parameters in pytest.

Excellent!

> I also like making the ranges optional, it's an elegant way of saying that a test doesn't exist statically :+1:

I've actually had second thoughts on this: I've opened https://github.com/nvim-neotest/neotest/issues/147 and will comment more over there in a minute.

> The performance is still a concern for me. I think there are some things we can do to address it though. Some ideas:

Agreed, the performance is definitely a problem at the moment!

> 1. Only run pytest discovery if parameters are discovered, no need to run it if there are none. That way people without them won't suffer the hit.

Yep, though I think I'd like to make this configurable: pytest fixtures can be parameterised, and do not have to be defined in the same test file (or even Python module) as the tests themselves, so we can't reliably determine even the existence of instances via TS. I think three states (names invented right now and subject to change) make sense: never run pytest discovery, only run it when a parametrize decorator is detected in the file, or always run it (to catch fixture-driven parameterisation defined elsewhere).

> 2. We construct the positions in the pytest plugin, serialise them to JSON and so then in lua we just parse them and tack them on. We could do this in a `build_positions` function passed to `treesitter.parse_positions` before the tree is constructed as then it's just nested lists which would suit perfectly. This is not really for the performance but more because we're already integrated with pytest so might as well leverage that. Also the tree is not at all designed to be mutated after the initial build so that avoids that issue entirely. We could write the positions to a file to pass them but then we have to write to disk which will likely also affect performance (testing needed :sweat_smile:). Alternatively we could use stdout or stderr, if we can stop pytest using them. Not sure if that's possible/easy to do.

I definitely like the idea of shifting more of this over into Python (and not only because I know Python much better than Lua :sweat_smile:).

However, instead of integrating those positions into the tree earlier, I was thinking that a good way of addressing performance issues might be to introduce two-phase position discovery: neotest core would call discover_positions, render the positions returned by that, then invoke an optional second plugin method (supplement_positions?) which could add positions discovered via slower means (e.g. pytest). This would allow the initial test structure to be displayed and ready for use sooner. It's particularly apropos for the pytest instances case: once we address https://github.com/nvim-neotest/neotest/issues/147, the initial test structure contains the only tests which neotest.run.run should select; it's only result processing and the summary view which need to know about test instances.

The code in this PR would be modified so that discover_positions would (a) kick off the pytest process (if appropriate), and (b) run the TS query and return the generated tree. supplement_positions would then wait for the pytest process to complete and return the merged tree.
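
Roughly, the adapter side of that split might look like this sketch (supplement_positions is a hypothetical method name; query stands for the adapter's existing treesitter query, and start_pytest_collection/merge_instances are invented placeholders):

local lib = require("neotest.lib")

-- Hypothetical two-phase discovery: phase one is the fast treesitter parse,
-- phase two merges in the pytest-collected instances.
local pending = {}

function PythonNeotestAdapter.discover_positions(path)
  -- (a) kick off pytest collection in the background, if appropriate for this file
  pending[path] = start_pytest_collection(path) -- invented helper returning a handle
  -- (b) run the TS query and return the generated tree immediately
  return lib.treesitter.parse_positions(path, query, {})
end

-- Hypothetical second phase, invoked by neotest core after the first render
function PythonNeotestAdapter.supplement_positions(path, tree)
  local instances = pending[path]:wait() -- block until the pytest process completes
  pending[path] = nil
  return merge_instances(tree, instances) -- invented helper producing the merged tree
end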

Does that make any sense with how the core/rendering is currently structured?

> 3. (This is a bit of a stretch) We start a single, long running process to do the parsing rather than a process for each file. This would be pretty cool but I'd imagine also very complex, more research would be needed and I wouldn't want to impose that requirement for now.

Yeah, this would be very cool! I reached out to #pytest on IRC, and they don't know of any existing project which does something like that: an interesting area for research!

> This is just me spitballing right now though! I'd be interested to see performance as is.

> I'll have a test myself when I get a chance but if you're interested, you can run the benchmark consumer (hopefully, I don't think anyone else has ever used it) which will tell you how long it took to parse a project. This is really only useful on large projects, I like to use the cpython repo.
>
> nvim --headless -c 'lua require("neotest").benchmark.start()'

Well..... running in the CPython repo kinda forkbombs my machine as currently implemented :sweat_smile:. I've added code to only invoke pytest if we're using the "pytest" runner, so startup time with pytest installed is 60s (vs. 5-6s without pytest installed). Once the heuristic is implemented, though, we should return to closer to the non-pytest values: CPython's test codebase doesn't have parameterized tests. In the pytest codebase, which does have some parametrized tests, it's ~21s vs. ~1s; though, again, a lot of their test files don't actually use parameterization, so that comparison will improve with the heuristic.

Currently, we invoke pytest per-test-file: there are 742 test_* files in the CPython repo, so we invoke pytest 742 times(!). If we batched our executions up somehow, we could do a lot better: running pytest --collect-only -q from the root of the CPython repo takes ~10s, which is effectively the floor for our performance when checking every test file, so adding on the ~5s that neotest normally takes, we're about 45s slower than we could be.
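
As a sketch of what batching could look like (not part of this PR): one --collect-only run at the project root prints one nodeid per line, which we could group by file afterwards:

-- Sketch only: collect the whole project once and group instance ids by file.
-- Real nodeids can also contain "::Class::" segments; the lazy match keeps just the file.
local function collect_project_instances(root)
  local output = vim.fn.systemlist("pytest --collect-only -q " .. vim.fn.shellescape(root))
  local by_file = {}
  for _, line in ipairs(output) do
    -- nodeids look like "path/to/test_file.py::test_name[param-id]"
    local file, test = line:match("^(.-%.py)::(.+)$")
    if file and test then
      by_file[file] = by_file[file] or {}
      table.insert(by_file[file], test)
    end
  end
  return by_file
end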

OddBloke commented 1 year ago

I've just pushed up a change which will only run pytest to gather test instances if it finds a parametrize decorator: to save copy/pasting querying code, this depends on https://github.com/nvim-neotest/neotest/pull/149 which refactors things to make reusing the generic querying code possible.

With this change, I'm seeing CPython benchmark times with pytest drop to ~5s, as we would expect. pytest (the codebase) benchmark times are ~5s: a grep suggests that there are 41 (of 108) test_*.py files with parametrize.

rcarriga commented 1 year ago

> Does that make any sense with how the core/rendering is currently structured?

Instead of adding a new adapter method for this, neotest core could accept an iterator as the return value of discover_positions. The first value returned would be the treesitter-parsed one and the next would be the result of pytest parsing. This would also work well if we went down the batched parsing route, as the second value can just wait for the batched parsing to finish (of course, using iterators would allow many updates, but I don't think that matters here). Not sure of the best way to flag that a batch should be parsed initially, but I'm sure we could figure something out that does the job.

> Well..... running in the CPython repo kinda forkbombs my machine as currently implemented :sweat_smile:.

Fantastic :laughing:

> With this change, I'm seeing CPython benchmark times with pytest drop to ~5s, as we would expect. pytest (the codebase) benchmark times are ~5s: a grep suggests that there are 41 (of 108) test_*.py files with parametrize.

Brilliant. Not impacting people that don't need this is my top concern; once we can allow users to not be affected by performance problems, it becomes much easier to do this iteratively.

OddBloke commented 1 year ago

> > Does that make any sense with how the core/rendering is currently structured?

> Instead of adding a new adapter method for this, neotest core could accept an iterator as the return value of discover_positions. The first value returned would be the treesitter-parsed one and the next would be the result of pytest parsing. This would also work well if we went down the batched parsing route, as the second value can just wait for the batched parsing to finish (of course, using iterators would allow many updates, but I don't think that matters here). Not sure of the best way to flag that a batch should be parsed initially, but I'm sure we could figure something out that does the job.

I'm experimenting with this now:

--- a/lua/neotest/client/init.lua
+++ b/lua/neotest/client/init.lua
@@ -261,13 +261,19 @@ function NeotestClient:_update_positions(path, args)
       self:_parse_files(adapter_id, path, files)
     else
       logger.info("Parsing", path)
-      local positions = adapter.discover_positions(path)
-      if not positions then
+      local found = false
+      for positions in adapter.discover_positions(path) do
+        if positions then
+          found = true
+          logger.debug("Found", positions)
+          logger.info("FOUND", path)
+          self._state:update_positions(adapter_id, positions)
+          logger.info("POST-UPDATE", path)
+        end
+      end
+      if not found then
         logger.info("No positions found in", path)
-        return
       end
-      logger.debug("Found", positions)
-      self._state:update_positions(adapter_id, positions)
     end
   end)
   if not success then

With corresponding local neotest-python changes to (a) correctly return an iterator with two steps, and (b) sleep 5s in the pytest execution to make it easier to see if we're getting two-phase updates. I see multiple calls to update_positions, with an expected 5s gap, but the tree doesn't appear updated until after both sets of positions have been found and processed (i.e. after the 5s). Specifically, it seems like the tree gets updated at the same time that the "Initialisation finished" message is emitted.
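
The adapter-side change for (a) looks something like this sketch (parse_with_treesitter and parse_with_pytest_instances are placeholders for the adapter's existing treesitter parse and the slower pytest-backed parse):

local async = require("neotest.async")

-- Sketch of (a): discover_positions returns a two-step iterator. The first call
-- yields the fast treesitter tree, the second the tree with pytest instances merged in.
function PythonNeotestAdapter.discover_positions(path)
  local steps = {
    function()
      return parse_with_treesitter(path) -- placeholder for the existing TS parse
    end,
    function()
      async.util.sleep(5000) -- (b) artificial delay, only to make the two phases visible
      return parse_with_pytest_instances(path) -- placeholder for the pytest-backed parse
    end,
  }
  local i = 0
  return function()
    i = i + 1
    return steps[i] and steps[i]()
  end
end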

After a bit more experimentation: adding self._state:update_focused_file(adapter_id, async.fn.expand("%:p")) to the body of the above for-loop (in the current code this happens after all adapters have completed discovery; this change is just for testing), and reducing the async.util.sleep call in the async summary rendering code to 1ms (or removing it altogether), got the desired behaviour: the tree was updated twice, first without instances and then with them. So I think there are a couple of issues: (a) we need to focus the file earlier than we do currently, and (b) we need to figure out why that 100ms sleep seems to take much longer than 100ms (https://github.com/neovim/neovim/issues/10018 might be related?).

rcarriga commented 1 year ago

I was thinking that, instead of waiting for all of the parsing to complete before returning from _update_positions, we just send off an async function to go through the rest of the updates. Something like this:

      local positions_iter = adapter.discover_positions(path)
      local positions = positions_iter()

      if not positions then
        logger.info("No positions found in", path)
        return
      end
      logger.debug("Found", positions)
      self._state:update_positions(adapter_id, positions)
      async.run(function()
        local positions = positions_iter()
        while positions do
          self._state:update_positions(adapter_id, positions)
          positions = positions_iter()
        end
      end)

We might need some sort of way of stopping the async function so that it doesn't overwrite newer results
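
For example (a sketch only, not existing neotest code), each discovery of a path could be tagged with a generation number so the background task can drop anything stale:

local generations = {}

local function consume_remaining(self, adapter_id, path, positions_iter)
  generations[path] = (generations[path] or 0) + 1
  local my_generation = generations[path]
  async.run(function()
    for positions in positions_iter do
      if generations[path] ~= my_generation then
        return -- a newer discovery run for this path has started; drop stale results
      end
      self._state:update_positions(adapter_id, positions)
    end
  end)
end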

OddBloke commented 1 year ago

> Brilliant. Not impacting people that don't need this is my top concern; once we can allow users to not be affected by performance problems, it becomes much easier to do this iteratively.

In this spirit: I've just pushed up removal of logging lines, fixes for indentation, and a config option to enable this behaviour (defaulting to disabled): unless they opt in, users should see no changes in behaviour or performance. I think, once I teach neotest core how to handle range = nil properly, this should be ready to land. Is there anything else you'd like to see before it can land?

(I'll happily continue experimenting with a result iterator: landing this work will make it easier to review what I'm changing for that!)

rcarriga commented 1 year ago

I'm happy enough with this going in once we've got it disabled by default :+1: My main ask would be to just put all of this new code into a separate module and just expose what's needed. The init.lua is already a mess so I'd like to avoid adding more to it :sweat_smile: Been meaning to clean it up to remove all the module level state

OddBloke commented 1 year ago

> I'm happy enough with this going in once we've got it disabled by default :+1: My main ask would be to just put all of this new code into a separate module and just expose what's needed. The init.lua is already a mess so I'd like to avoid adding more to it :sweat_smile: Been meaning to clean it up to remove all the module level state

Just pushed this change up! This is using module-level state (in the new module), so if you'd prefer to handle that differently, let me know what you'd like to see.

rcarriga commented 1 year ago

That's fine, it's relatively simple so it'll be easy to remove :+1: I'll give a review when I get a chance :smile:

rcarriga commented 1 year ago

Hey @OddBloke just wanted to check if you're still interested in continuing this PR :smile: It'd be great to get this functionality in so if you're unable, I'd be happy to take it on.

urmzd commented 8 months ago

@rcarriga I too would love to have this functionality in -- if needed, I can continue where @OddBloke left off (if they're still busy).

OddBloke commented 8 months ago

Hey, this ping was timed well: this is a hack week at work, so let me see if I can get my head back into this and push it over the line!

OddBloke commented 8 months ago

I've merged @rcarriga's suggested changes PR and done some manual testing and, with https://github.com/nvim-neotest/neotest/pull/304, this seems to be working well!

There is still something funky going on with status markers: only the last-defined of the child tests which have run updates the marker. So if you run the entire set of children, the last result will be the only one reflected; if you then run any child but the last one, the status marker does not even update to reflect that the test is running. If you run individual children (from the summary window), then the status marker is updated iff you haven't run any later-listed children.

OddBloke commented 8 months ago

(That doesn't feel like it necessarily needs to block landing this, so I'm marking this ready for review.)

OddBloke commented 8 months ago

I've filed https://github.com/nvim-neotest/neotest/issues/305 for the aforementioned funkiness.

rcarriga commented 8 months ago

Sorry, I haven't had much time to work on personal projects, so I haven't had a chance to look at this. Thank you for picking this back up, and for the additional fixes! This all looks great and works perfectly AFAICT, so time to merge!

duncanam commented 8 months ago

@rcarriga Hm, this change seems to be showing all Python parametrize as failed now :(

tku137 commented 7 months ago

> @rcarriga Hm, this change seems to be showing all Python parametrize as failed now :(

Can confirm. Parametrized tests run perfectly, but are shown as failed.

duncanam commented 7 months ago

> > @rcarriga Hm, this change seems to be showing all Python parametrize as failed now :(
>
> Can confirm. Parametrized tests run perfectly, but are shown as failed.

This fixed it for me:

pytest_discover_instances = true

There's a snippet on this at the bottom of the readme on this repo. The downside is testing is not as snappy.
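
For reference, the opt-in boils down to something like this in a Neovim config (option names per this repo's README):

require("neotest").setup({
  adapters = {
    require("neotest-python")({
      runner = "pytest",
      -- shell out to pytest to discover parameterized instances (slower discovery)
      pytest_discover_instances = true,
    }),
  },
})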

tku137 commented 7 months ago

> > > @rcarriga Hm, this change seems to be showing all Python parametrize as failed now :(
> >
> > Can confirm. Parametrized tests run perfectly, but are shown as failed.
>
> This fixed it for me:
>
> pytest_discover_instances = true
>
> There's a snippet on this at the bottom of the readme on this repo. The downside is testing is not as snappy.

Thanks for the hint! Right in plain sight, sorry for not seeing this obvious fix :)

rcarriga commented 7 months ago

There was a bug that has now been fixed, so you can set pytest_discover_instances = false and parameterized tests will show as failed only when one of them actually failed.