Netman does not warn about missing docker installation

darrott commented 1 year ago

I tried the same config on a Linux machine with docker installed, and Neovim with Netman (1.1) opened instantly. On my Mac (Air M2) it took 5 to 6 seconds to open. Thanks to logs.txt I found out that Netman was trying, unsuccessfully, to find docker, but once Neovim finished loading no error or warning was shown.

Platform: macOS
Netman version: 1.1
Neotree installed and configured to work with Netman

miversen33 commented 1 year ago

To clarify here, what took 5-6 seconds to open on your mac? Was it Neovim itself? Or the Netman neotree extension?

I need to pin down where in the process netman hangs on this.

darrott commented 1 year ago

Okay well, more details! With the main branch version of Netman I had no problems. With the 1.1 version and the Neotree extension I had to wait 5 to 6 seconds before it was ready to go. When I discovered in the logs that Netman was searching for docker, I installed docker with brew, and after that Neovim opened instantly. I don't know what is faulty; I just know that the current release doesn't stall on docker availability, which v1.1 does with the neotree extension active.

No error or warning about the missing docker was printed in Neovim on either the main branch or the 1.1 version.

luxus commented 1 year ago

I can confirm the issue: once I add netman to neo-tree, I get a freeze as soon as I try to open neo-tree. (Yes, I don't have docker installed either.)

miversen33 commented 1 year ago

That is interesting. When you open Neotree, the neotree driver reaches out to the API and fetches all available providers (seen here)

I will investigate this a bit more to see if I can figure out why the API is returning docker when docker shouldn't be valid (in your case).

Thank you both of you :)

miversen33 commented 1 year ago

Ok so good news and more good news. I can recreate this issue using the below minimal configuration

-- Minimal configuration
-- mini.lua
require("packer").startup(function(use)
  use "wbthomason/packer.nvim"
  use {
    "nvim-neo-tree/neo-tree.nvim",
    branch = "v2.x",
    requires = {
      "nvim-lua/plenary.nvim",
      "nvim-tree/nvim-web-devicons", -- not strictly required, but recommended
      "MunifTanjim/nui.nvim",
    }
  }
  use {
    "miversen33/netman.nvim",
    branch = "v1.1"
  }
end)

vim.g.netman_log_level = 0

require("neo-tree").setup({
    sources = {
        "netman.ui.neo-tree"
    }
})

Also attaching the logs for this. After a quick peek in, you can see the following lines in the logs

[2023-01-04 17:00:34] [SID: bvvgkqsenvarhfn] [Level: TRACE]  -- ...packer/start/netman.nvim/lua/netman/providers/docker.lua:nil:1799    {
  cmd_pieces = { "docker", "-v" },
  command = "docker -v",
  opts = {
    STDERR_JOIN = "",
    STDOUT_JOIN = ""
  },
  stderr = { "JOB TIMEOUT" },
  stdout = {}
}
[2023-01-04 17:00:34] [SID: bvvgkqsenvarhfn] [Level: WARN]   -- ...packer/start/netman.nvim/lua/netman/providers/docker.lua:nil:1802    Unable to verify docker is available to run     {
  stderr = { "JOB TIMEOUT" }
}
[2023-01-04 17:00:34] [SID: bvvgkqsenvarhfn] [Level: WARN]   -- ...im/site/pack/packer/start/netman.nvim/lua/netman/api.lua:load_provider:849   netman.providers.docker:0.2 refused to initialize. Discarding   false
[2023-01-04 17:00:34] [SID: bvvgkqsenvarhfn] [Level: INFO]   -- ...im/site/pack/packer/start/netman.nvim/lua/netman/api.lua:unload_provider:751 Attempting to unload provider: netman.providers.docker
[2023-01-04 17:00:34] [SID: bvvgkqsenvarhfn] [Level: INFO]   -- ...im/site/pack/packer/start/netman.nvim/lua/netman/api.lua:unload_provider:759 Disassociating Protocol Patterns and Autocommands with provider: netman.providers.docker

So everything is "working" as expected (docker indeed is failing to be found), and when you open neotree after the eternity it takes Netman to fail, you indeed do not see docker listed as a provider.

The main issue, it seems, is that instead of docker -v failing with a command-not-found error, it hangs until the command times out, which is then treated as a failure. Obviously not great, but at least that tells me where the issue is happening.
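
For illustration only, here is one way a caller could fail fast before ever spawning docker -v. This is a sketch, not what netman does: the helper name command_exists is hypothetical, and it assumes Neovim's vim.fn.executable, which returns 1 when a command is found on the PATH. The check function can be injected so the helper also runs outside Neovim.

```lua
-- Hypothetical helper (not part of netman): decide whether a command is
-- available before spending time on a spawn plus timeout.
-- `is_executable` defaults to Neovim's vim.fn.executable when running inside
-- Neovim; any function returning 1 for "found" can be passed instead.
local function command_exists(cmd, is_executable)
  is_executable = is_executable or (vim and vim.fn and vim.fn.executable)
  if not is_executable then
    -- No way to check in this environment; let the caller decide what to do
    return nil, "no executable check available"
  end
  return is_executable(cmd) == 1
end

-- Inside Neovim this consults the real PATH; elsewhere it is skipped.
if vim and vim.fn then
  if not command_exists("docker") then
    print("docker not found on PATH, skipping docker provider")
  end
end
```

The trade-off is that a PATH lookup only tells you the binary exists, not that the docker daemon is actually usable, so it would complement the docker -v check rather than replace it.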

logs.txt

luxus commented 1 year ago

Can you maybe check whether the docker socket is connectable or something? And btw, podman support would be nice too :D (it's a drop-in replacement)

miversen33 commented 1 year ago

@luxus

I am (basically) doing that, albeit via docker -v. I actually have an issue open for researching using the docker socket instead of using cli (#65). I am certain that would be much faster than parsing and talking over cli. I am however not super experienced with using the socket for docker, so it is not something I will be doing in v1.1. I think I have this pinned in backlog, though it will probably drop into v1.2.
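
For anyone curious what a socket check might look like, here is a rough sketch using libuv pipes. It is not netman code and not the plan for #65: the helper name socket_reachable is hypothetical, and the path /var/run/docker.sock is an assumption (a common default on Linux, but it varies by platform and setup). It only tests that something accepts a connection on the socket, not that it speaks the docker API.

```lua
-- Hypothetical helper (not part of netman): report whether a unix socket
-- accepts connections. `uv` is a libuv binding (vim.loop / vim.uv in Neovim,
-- or the luv module); `on_result` receives (reachable, err).
local function socket_reachable(uv, path, on_result)
  local pipe = uv.new_pipe(false)
  pipe:connect(path, function(err)
    -- err is nil on success, otherwise an error string such as "ENOENT: ..."
    if not pipe:is_closing() then
      pipe:close()
    end
    on_result(err == nil, err)
  end)
end

-- Inside Neovim the event loop is already running, so the callback fires on
-- its own. The socket path below is an assumed default, not a guarantee.
local uv = vim and (vim.uv or vim.loop)
if uv then
  socket_reachable(uv, "/var/run/docker.sock", function(ok, err)
    print("docker socket reachable:", ok, err or "")
  end)
end
```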

I thought podman has an alias for docker so I shouldn't have to do anything at all. If I am wrong, drop some doc into a new issue and I can check it out, but IIRC podman creates a docker command alias so you literally don't have to do anything and can use docker while actually using podman.

miversen33 commented 1 year ago

Ok so I figured out what is going on here.

The lua luv documentation states that spawn is supposed to return the following 2 items

uv_process_t userdata, integer

As such, I grab both of these in the shell wrapper after calling spawn and then add some safeguards against jobs running forever when they aren't supposed to. There is no check to ensure that we actually get a valid pid back.

So where are you going with this miversen33?

If you check the docker provider init code, you will see that we treat a non-zero exit code as meaning docker is not installed; i.e., an exit code of 0 means docker is available. That logic is working correctly; the issue is a bit upstream. Because I never check whether the PID returned by uv.spawn is valid, shell just waits for the job's on-exit function to run. That function is always called when a job finishes, regardless of how it finished. In this case, the job "finishes" when the timeout is reached, at which point we gracefully kill the job.

Except the job never started in the first place

This can be proven with the following bit of simple code to verify

local SHELL = require("netman.tools.shell")
local command = {"garbage_command_that_doesnt_exist"}
local shell = SHELL:new(command)
log({output = shell:run(200), pid = shell._pid})

This outputs the following log

{
  output = {
    cmd_pieces = { "garbage_command_that_doesnt_exist" },
    command = "garbage_command_that_doesnt_exist",
    opts = {},
    stderr = { "JOB TIMEOUT" },
    stdout = {}
  },
  pid = "ENOENT: no such file or directory"
}

Notice that pid is not valid here. So the fix should be simple enough: let's verify that the pid is valid instead of assuming it is. I will look to get a patch into v1.1 sometime this evening, as this is a very big problem.
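
A minimal sketch of that kind of guard, assuming only what the log above shows: when the process cannot be started, the first value returned by uv.spawn is nil and the second is an error string rather than a pid. The helper name check_spawn_result and the shape of the returned tables are hypothetical; this is not the actual netman.tools.shell patch.

```lua
-- Hypothetical helper (not netman's real code): classify the two values
-- returned by uv.spawn. On success they are (handle, pid); on failure the
-- handle is nil and the second value is an error string such as
-- "ENOENT: no such file or directory".
local function check_spawn_result(handle, pid_or_err)
  if handle == nil or type(pid_or_err) ~= "number" then
    -- Fail immediately instead of waiting for a timeout on a job that
    -- never started.
    return false, {
      stderr = { "MISSING JOB HANDLE" },
      reason = tostring(pid_or_err),
    }
  end
  return true, { pid = pid_or_err }
end

-- Try a real spawn only when a libuv binding is available
-- (vim.uv / vim.loop inside Neovim, or the luv module elsewhere).
local uv = vim and (vim.uv or vim.loop)
if not uv then
  local loaded, luv = pcall(require, "luv")
  if loaded then uv = luv end
end

if uv then
  local handle, pid_or_err = uv.spawn(
    "garbage_command_that_doesnt_exist",
    { args = {} },
    function() end -- on-exit callback, unused in this sketch
  )
  local ok, info = check_spawn_result(handle, pid_or_err)
  print(ok, info.reason or info.pid)
  if handle then handle:close() end
end
```

With a check like this in place, a missing binary is reported right away and the timeout path is reserved for jobs that really did start.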

miversen33 commented 1 year ago

Tagging @darrott and @luxus. Can you both try out the 113-netman-does-not-warn-about-missing-docker-installation branch and see if you experience the same issues with docker being weird when launching neovim?

darrott commented 1 year ago

I tried the new branch you put up on my WSL Ubuntu with no docker installed. Same configuration (Packer, Netman, Neotree). It opens up fast, problem solved. Below are the relevant lines from logs.txt, reporting that docker is absent from the machine (running which docker also prints nothing).

[2023-01-04 21:49:19] [SID: tgfjnnpcaulhdhj] [Level: WARN]   -- ...packer/start/netman.nvim/lua/netman/providers/docker.lua:nil:1802    Unable to verify docker is available to run     {
  stderr = { "MISSING JOB HANDLE" }
}
[2023-01-04 21:49:19] [SID: tgfjnnpcaulhdhj] [Level: WARN]   -- ...im/site/pack/packer/start/netman.nvim/lua/netman/api.lua:load_provider:849   netman.providers.docker:0.2 refused to initialize. Discarding   false
[2023-01-04 21:49:24] [SID: dqxyqeqmkblpzho] [Level: WARN]   -- ...packer/start/netman.nvim/lua/netman/providers/docker.lua:nil:1802    Unable to verify docker is available to run     {
  stderr = { "MISSING JOB HANDLE" }
}
[2023-01-04 21:49:24] [SID: dqxyqeqmkblpzho] [Level: WARN]   -- ...im/site/pack/packer/start/netman.nvim/lua/netman/api.lua:load_provider:849   netman.providers.docker:0.2 refused to initialize. Discarding   false

miversen33 commented 1 year ago

It's interesting that the log appears to run that check twice within about 5 seconds; that seems wrong. However, I will consider that "ok" for the time being. I will also clean up the docker code a bit more to accept MISSING JOB HANDLE (or something to that tune) as a valid state and not complain every time (which should reduce the amount of log output at the higher log levels).

All in though, I think this solves it, I am going to merge the PR into v1.1 and then begin the fun of merge conflict resolution between my various branches on v1.1 :upside_down_face: :joy:

miversen33 commented 1 year ago

I have merged into v1.1 and deleted the test branch. @darrott be sure to remove that branch and switch back to whatever you were using before :)

luxus commented 1 year ago

that was quick, thanks mike!

miversen33 / netman.nvim

Netman does not warn about missing docker installation #113