Better support for monorepos

bellini666 commented 3 years ago

Issues

[X] I have checked existing issues and there are no existing ones with the same request.

Feature description

Hi,

Where I work we use a lot of monorepos, and those doesn't even contain the same language (e.g. in a project the frontend directory will be a typescript project while the backend will be python)

The way null-ls works has 2 issues for monorepos like those:

1) Some linters/formatters needs to run with their cwd set to the directory where the configuration is defined. For example, inside the backend folder we have a .flake8 file which configures flake8, but if null-ls runs it from the root (the directory above, where .git is) it will not use those configs.

I could get around this by passing a custom cwd function, but I'm not sure it is the proper way (I'll explain why bellow)

2) The major issue: We need to run the specific installed linters/formatters inside those projects. For example, we have some flake8 plugins installed in the project which will not be included in the global flake8 binary. Idially the flake8 installed in the project should be used.

To workaround that I did the something like this:

null_ls.config({
    sources = {
        null_ls.builtins.diagnostics.flake8.with({
            command = function()
                if ROOT_DIRS.python ~= nil then
                    local match = vim.fn.glob(path.join(ROOT_DIRS.python, ".venv", "bin", "flake8"))
                    if match ~= "" then
                        return match
                    end
                end
                return "flake8"
            end,
            cwd = function(params)
                return ROOT_DIRS.python or params.cwd
            end,
        }),
    }
})

What is the problem there? That ROOT_DIRS.python is defined like this:

ROOT_DIRS = {
    python = nil,
    node = nil,
}
nvim_lsp.pyright.setup({
    before_init = function(_, config)
        ROOT_DIRS.python = config.root_dir
    end,
})

So, I'm depending that the python LSP runs first because it will find the properly root dir for that specific file in the buffer, so that when null_ls is going to retrieve the command, I already have the python root dir.

I cannot simply set the root dir to something else because each one will have its own root and its own formatters/linters (in my example, I want to run flake8, isort and black iside the backend dir and prettier and eslint inside the frontend dir). I did the same for each source, I'm not showing everything here for simplicity =P

In summary, this works and I got everything running. But I'm afraid of that "if null-ls asks for the cmd first?" and also it doesn't seem to be a clean and elegand solution.

Maybe there's a way to improve that kind of workflow?

Help

No

Implementation help

No response

jose-elias-alvarez commented 3 years ago

For cwd, a good solution might be to use the same function nvim-lspconfig uses to resolve root_dir:

local sources = {
    null_ls.builtins.diagnostics.flake8.with({
        cwd = function(params)
            return require("lspconfig")["pyright"].get_root_dir(params.bufname)
        end,
    }),
}

This would remove the dependency on pyright launching, and you could reuse the same function for multiple sources for the same language (or even make a factory to handle different language servers).

Prioritizing project executables is a bit tricky. You can see here the logic I used to do this for Node in a separate plugin that integrates with null-ls. The idea was to traverse the current file's parents and check if node_modules/.bin/$CMD is executable at each parent. I took a stab at making that logic a bit more generic and came up with this:

local lsputil = require("lspconfig.util")
-- for node, the prefix would be "node_modules/.bin"
local resolve_local_executable = function(cmd, prefix)
    cmd = prefix and lsputil.path.join(prefix, cmd) or cmd

    return function(params)
        local client = require("null-ls.utils").get_client()
        local cwd = client and client.root_dir or vim.fn.getcwd()

        local local_bin
        lsputil.path.traverse_parents(params.bufname, function(dir)
            local_bin = lsputil.path.join(dir, cmd)
            if vim.fn.executable(local_bin) > 0 then
                return true
            end

            local_bin = nil
            -- use cwd as a stopping point to avoid scanning the entire file system
            if dir == cwd then
                return true
            end
        end)

        return local_bin or cmd
    end
end

If that seems useful (it solved things for Node, at least) we could streamline it into a config option to avoid having to fill in the command, ideally something like this:

local sources = {
    -- no prefix
    null_ls.builtins.diagnostics.flake8.with({
        prefer_local = true,
    }),
    -- prefix? unless this is confusing
    null_ls.builtins.diagnostics.prettier.with({
        prefer_local = "node_modules/.bin",
    }),
}

I don't work with monorepos, so I'm not sure if this solves your issue. I'm open to other ideas to improve support here as long as they aren't language-specific (in which case I think they belong in a separate plugin).

bellini666 commented 3 years ago

Hey @jose-elias-alvarez (btw, by your name it looks like to me you are brazilian. If so, cheers from another brazilian myself here :) )

Thanks for the explanation. I still have to test it here, but if I understood correctly the code, params.bufname is the absolute path to the file currently opened at the buffer right? So it would transverse starting from it and upwards, until it finds the my custom local bin.

If so, then yes, I think it totally works. And the implementation with prefer_local = ".venv/flake8" would be very easy to configure and handle a lot of cases.

Also, even though the solution you proposed for the cwd works fine, could there be a way to integrate it with prefer_local. For example, if you have prefer_local = "node_modules/.bin" that means that it is very likely that you will have a .prettierc in that same directory (or in flake8's case, a .flake8 file). In that case, since null-ls already found the local bin, the path where that local bin is should be the same as the cwd that we want null-ls to run from.

So, we could think of 3 possibilities to solve the whole problem with only the prefer_local option:

1) If prefer_local is defined and it is found, it will default the cwd to the path where local_bin was found (e.g. somefolder/node_modules/.bin the cwd is somefolder, otherwhise is the default one. Obviously one can still pass a cwd function to customize it.

2) Don't force the cwd by prefer_local, but pass as a param to the custom cwd function that the local bin was found in that specific dir (which could default to root_dir if it wasn't found or be set to nil). Then it could choose to return that.

I like the first approach more, but both are suggestions :)

What do you think?

jose-elias-alvarez commented 3 years ago

Haha, not Brazilian but Venezuelan! Your understanding is correct. One small thing that I didn't mention is that when command is a function, it's only called the first time that the source runs. (This happens when a file is first opened for diagnostics sources and when a file is first formatted for formatting sources.) This means that if a project contains multiple local copies of an executable, command will stay "stuck" on the first copy it finds. Again, I don't work with monorepos, so do you think this case is common enough to be an issue?

I also like the first approach where we automatically set cwd based on the path of the local executable. From what this user mentioned on a related issue, it wouldn't handle every case, but as you said we can use it as a default only if the user doesn't manually define a cwd function.

Let me know once you're able to test out the code and verify that it solves your case, and let me know about the command issue above. It should be fairly straightforward to implement this, so I can start working on it once I have your confirmation (unless we need to make command completely dynamic, which would be tricky).

bellini666 commented 3 years ago

Hey @jose-elias-alvarez ,

Venezuelan, close enough hermano :)

Just tested and the solution works fine! I would say that, if implementing that prefer_local, it would be nice if it could receive a list of possibilities, so it would check each one, starting from the first one, up until the ls-null root and, if none are found, than the default one is returned.

Now, regarding the command issue... It is not a problem for the projects I'm working on now but I can see it being a problem in some cases. E.g. I created another directory with a python file in my repository and, if I open that file first, since it is not a descendant of the main python directory, it will stick with the default flake8 and use it for when I open other python files from that main directory.

Maybe each buffer should have its own command, just like cwd? And based on the suggestion on linking both, where cwd would use the one found by prefer_local, it actually makes sense, or else it would stick to that too.

Let me know of what you think and if you need any help with that. I'm not very proficient with lua (just ported my 10 years old vim config to lua 3 days ago, never used it before =P), but it is a very easy language to work with

jose-elias-alvarez commented 3 years ago

Thanks for the feedback! I can see that having a dynamic command could be important for some monorepo structures.

I think a good starting point would be to avoid changing the current command field and add a new field (something like dynamic_command) that runs every time the source runs. Setting prefer_local will automatically create a dynamic_command to find the nearest copy of command in the project. Also, if the user hasn't specifically defined a custom cwd, it'll also cause cwd to inherit the base path of the resolved executable.

The time it takes to run the search is small, but in the future we can improve performance by saving the last resolved command and cwd to a table indexed by buffer number. In cases where the scan finds multiple possibilities, we could then ask the user which one they'd like to use for the current buffer and save that to the table to avoid having to ask each time.

How does that sound?

bellini666 commented 3 years ago

Well, this seems fine to me! :)

I'll commend on some things that I notice, they might or might not make sense (I'm just looking through the code, I'm no expert in it yet), so take them with a grain of salt =P

1) I saw in this line that the cwd function gets called at the exact moment when the source is going to run. Wouldn't just changing the command to run there too basically fix the issue?

Yeah, I know that you probably don't want to do too much every time the source is going to run for performance reasons, which leads me to my next point:

2) I think it is safe to argue that although command and cwd can be different for each buff (the whole idea of this issue), once you find them for that buff, they should not change.

So, considering that even cwd gets called everytime I type something at the same buff (I tested here by putting a print("foobar") inside my custom cwd function), both could see a good performance improvement if they get called once for the buff.

So, if my lsp-config understand is correct, the on_attach function gets called once (and only once) everytime a new buffer gets opened so the language server can "connect" to that buffer. This seems like a good opportunity to call the cwd function and the command function and cache them for that buff in question. That way, no need to even call cwd everytime the source is going to run

For example, I do that for pyright to set my custom python interpreter when it inits:

nvim_lsp.pyright.setup({
    before_init = function(_, config)
        local p
        if vim.env.VIRTUAL_ENV then
            p = lsp_util.path.join(vim.env.VIRTUAL_ENV, "bin", "python3")
        else
            p = find_cmd("python3", ".venv/bin", config.root_dir)
        end
        config.settings.python.pythonPath = p
    end,
})

I actually use before_init instead of on_attach because I want to set the pythonPath before it starts, but the idea is the same. It still gets called once per buffer. For example, with that if I open a file in backend/foo.py and it finds python at backend/.venv/bin/python3 it will use that for that buffer. If I open another one, for example another/bar.py and it finds another venv (or even don't find one), it will use that other one and start a separated ls for it.

obs. for curiosity, that find_cmd is a slightly modified version of your function that used to there. It is defined here in my dotfiles. The one I pass to null-ls is this one, which wraps that find_cmd

So, if that all makes sense, by doing that caching in before_init or on_attach we could just define the prefer_local (as a list of possibilities would be awesome) which would get called right away to retrieve the command and a better cwd (if it find the local bin, of course)

What do you think? Like I said above, I'm no expert in this and you know your code much better than I ever will. Maybe my thoughts doesn't apply to null-ls for some reason that I can't see without further investigation.

jose-elias-alvarez commented 3 years ago

Really appreciate the thoughtful feedback. For your first point, the issue that I ran into when first working on this is that command is displayed and serialized in a few different places, so we need it to be a string once the source runs. Also, the current structure lets us validate that the resolved command is actually executable, which in my experience is a very common cause of issues. That's why I think using a different config key for dynamic commands might be the right call.

For your second point, wrapping before_init or on_attach to do buffer-specific things is a really interesting idea that I hadn't considered before, but I think that assuming that command and cwd are static for the lifetime of the buffer might be too inflexible. For example, I can imagine wanting to use a different command based on the presence of a config file or something else about the project's state that might change while the buffer is loaded. I think it's a sensible default for the prefer_local option, though, and I agree that it could improve performance, especially for diagnostic sources that are spawned frequently. We could keep the current structure and maintain a cache table in the built-in's closure to achieve a similar result.

bellini666 commented 3 years ago

I can imagine wanting to use a different command based on the presence of a config file or something else about the project's state that might change while the buffer is loaded

Is this something common? Like, even in my case, I would understand opening a file and flake8 sticking to the system's binary even after I create a virtualenv inside the main dir, and to get it to use it I would need to "reload" neovim, or at least reload the LSP. Just like neovim, after I open it, if I modify my init.lua file and add a mapping there I don't expect that it will register it unless I reload that source or close and open it again, but there might be some specific case that you have in mind that I'm not familiar with.

Having said that, either going full cache on before_init/on_attach or giving the possibility of those being changed on each run, if that means that each file can have its own command/cwd then the problem this issue is bringing to the table is totally solved!

Just to give my some more thoughts about the issue, maybe the caching thing can be decided like this (just some thoughts):

1) Someone uses the builtin directly, without providing a function to command or cwd. Then those can be cached at before_init/on_attach (cwd will basically be the found root_dir) and null-ls don't ever need to ask for them again

2) Someone uses the builtin but asks to prefer_local as a list of possibilities. At before_init/on_attach null-ls checks for those and, if any of those match (giving priority to the order of the list) then that bin is used as command and the cwd where it was found is used as the cwd. Those are cached and null-ls don't ever need to ask for them again

Note here that 1 and 2 should, AFAIK, cover most user's use cases, including all that I pointed in this issue. And they should be very performatic since null-ls should not have to ask for them ever again.

Now for the extra cases:

3) If the user provides a cwd function it will be used in preference to the root_dir cwd and also the prefer_local cwd. null-ls asks for it at before_init/in_attach and don't ever need to ask for it again.

4) Just like for cwd, providing a custom command would do the same as above for it, caching it too.

5) Now, here comes the part where everything changes: A new option, cache_options (or something like that), which could be initially set to true (I believe that most would want the cache) where, if one needs cwd or even command to be asked every time, than he could set this to false and the function passed to one or both cwd and/or command would get called at all runs.

Obviously the command one would need more time to really be implemented, like you said, but the cwd one is just flagging it as it already acts like cache_options=false when there's a function there.

That cache could also be separated into parameters for each one, or even be a table mapping each kind (e.g. cache = {command: true, cwd: false})

Again, just some ideas! Feel free to ignore them. At the end of the day, adding the prefer_local and retrieving the command and cwd (when it is not a function provided by the user) for it for each different file/buffer, than this issue is totally solved (with the small gain in performance by caching cwd at before_init/on_attach time). The rest are improvements that solves other issues and improves the overall solution, but doesn't prevent it from starting as simple as this.

Please tell me if you need any help testing or even with the development of this!

yorickpeterse commented 3 years ago

If it's of any use, I have a similar ish case (which overlaps with https://github.com/jose-elias-alvarez/null-ls.nvim/issues/270):

If a Ruby project has a Gemfile.lock, I want to run RuboCop with bundle exec but only if the Gemfile.lock includes RuboCop. If it doesn't include it, I don't want to run RuboCop at all.

I sort of managed to achieve it like this:

nls.builtins.diagnostics.rubocop.with({
  command = 'ruby-exec',
  args = vim.list_extend(
    { 'rubocop' },
    nls.builtins.diagnostics.rubocop._opts.args
  ),
  runtime_condition = function(params)
    return not rubocop_with_bundler(params.client_id)
  end,
}),

nls.builtins.diagnostics.rubocop.with({
  command = 'ruby-exec',
  args = vim.list_extend(
    { 'bundle', 'exec', 'rubocop' },
    nls.builtins.diagnostics.rubocop._opts.args
  ),
  runtime_condition = function(params)
    return rubocop_with_bundler(params.client_id)
  end,
}),

Here rubocop_with_bundler is a function that takes the LSP client ID (to get the root directory), then determines if bundle exec should be used. The ruby-exec command is just a small shell wrapper I use to ensure the correct Ruby version is used.

Since runtime_condition is called every time the linter is invoked, I do make sure the result of the function is cached. This does lead to a bit of boilerplate, but it's manageable.

Probably the most annoying part is having to list the RuboCop linter twice in the sources list, rather than once with a dynamic command and arguments.

jose-elias-alvarez commented 3 years ago

@bellini666 Thank you again for the feedback! I don't have a specific case for dynamic commands in mind, but working on this project has taught me that there is an incredible variety of use cases and project structures out there, so I prefer to keep things flexible and leave it to users / plugins to solve language-specific issues.

For now, I think it's best to keep things simple, and the simplest option I can think of is to have prefer_local always cache command and cwd per buffer on the first run. Fundamentally, this is just a wrapper for functionality that users can already implement, so I don't think we need to handle every possible config option, but I'm always open to improving the user experience and we can definitely keep working on this in the future.

@YorickPeterse Maybe I'm misunderstanding, but would setting args dynamically solve the issue?

nls.builtins.diagnostics.rubocop.with({
    command = "ruby-exec",
    args = function(params)
        return vim.list_extend(
            rubocop_with_bundler(params.client_id) and { "bundle", "exec", "rubocop" } or { "rubocop" },
            nls.builtins.diagnostics.rubocop._opts.args
        )
    end,
})

bellini666 commented 3 years ago

@jose-elias-alvarez hahaha yea, I know what you mean :)

And that option seems like the best one for me too. It solves this issue and also improves the overall performance by caching command/cwd for people that have those defined.

yorickpeterse commented 3 years ago

@jose-elias-alvarez Dynamic arguments is something I found out about after writing my comment. It indeed cleans things up. However, it only works because my command is a static wrapper. Without that, I'd need a dynamic command as well.

jose-elias-alvarez commented 3 years ago

Ah, I see now. That case should be handled by dynamic_command (or whatever we end up calling it), which will also be added in the process of handling everything else mentioned here.

mjlbach commented 2 years ago

FYI lspconfig defaults to cwd == resolved_root for servers now

jose-elias-alvarez commented 2 years ago

@bellini666 Could you (and anyone else interested in this) try #349?

bellini666 commented 2 years ago

Hey @jose-elias-alvarez ,

Just tested here and the selection of the local bin worked! But not the cwd.

To give a proper explanation, I have this structure here:

root_dir:
- .git
- backend/
- .venv
  - bin
  - flake8
  - isort
  - black
- dir1/
  - some_file.py

So, when I set prefer_local = ".venv/bin" on flake8/black/isort they were retrieved fine from the .venv dir and they were used to run the commands instead of the global ones on some_file.py. This worked perfectly!

Now, the cwd they ran from was the root_dir and not the backend dir, which was the expected one and where the config files for those bins are.

So, for me that is the only thing missing! When the prefer_local bin is found, use that cwd where it was found as the cwd to run the command. In that case, since prefer_local is set to .venv/bin the cwd would be backend/.

jose-elias-alvarez commented 2 years ago

@bellini666 Whoops, totally forgot about the cwd issue. I just pushed another commit to the PR that should take care of it, I think. Let me know.

bellini666 commented 2 years ago

Hey @jose-elias-alvarez

Now it worked! :)

I have a small corner case, which will not affect me, but want to make sure to comment here. Take this as an example:

root_dir:
- .git
- foo/
- bar.py
- backend/
- .venv
  - bin
  - flake8
  - isort
  - black
- dir1/
  - some_file.py

When I opened some_file.py now it worked perfectly! cwd as set to backend and it used flake8 from .venv/bin directory.

Then I opened at another buffer the bar.py file. It also worked perfectly as it used my global flake8 and the root_dir as cwd.

When exchanging buffers, each one uses the correct found binary (the global one for bar.py and the local one for some_file), which is correct! But after opening bar.py, the cwd changed not only for it, but also for from some_file.py, so it started to use the wrong cwd too.

Like I said, it is a very specific corner case that doesn't actually affect me, but you might want to workaround that too.

Other than that, everything working really well! :)

jose-elias-alvarez commented 2 years ago

@bellini666 Right, I see that we should also index cwd by bufnr to handle cases like this. I just pushed a commit that should help with that, too. Could you let me know if that takes care of it?

bellini666 commented 2 years ago

@jose-elias-alvarez when trying the last change I got this error (don't know how to get it as plain text):

Screenshot from 2021-11-16 15-36-36

jose-elias-alvarez commented 2 years ago

@bellini666 Sorry, there was a typo when returning the cached command. Could you try again?

bellini666 commented 2 years ago

@jose-elias-alvarez np! :)

Now it is perfect! No remaining issues.

bellini666 commented 2 years ago

Hey @jose-elias-alvarez ,

Eagerly waiting for this to be released! Anything else that you need to test that I can help with?

jose-elias-alvarez commented 2 years ago

@bellini666 Just merged it, thanks again for all the feedback and ideas! Let me know if you run into any issues.

bellini666 commented 2 years ago

@jose-elias-alvarez awesome, it is me that thank you for the attention :)

I'll surely let you know if I run into any issues

bellini666 commented 2 years ago

@jose-elias-alvarez btw, the change to prefer_local instead of having it running my custom function everytime made everything a lot faster. Like, to the point that I can notice it linting and formatting faster than before. Thank you very much for this!

jose-elias-alvarez commented 2 years ago

@bellini666 Awesome, great to hear!

nyngwang commented 2 years ago

To @jose-elias-alvarez:

Hi, I'm using PDM. It will install pyright, autopep8, flake8 locally inside the folder myproject/__pypackages__/, and I will have to prepend commands like pdm run autopep8 ...(i.e. with pdm run ...) to use these local packages. How should I set this up with null-ls? My current idea is that I will have to modify some field of the setup { } function to prepend the pdm run to the cmd(e.g. as indicated in the nvim-lspconfig), but I found it very unclear on how to do so currently :/

nyngwang commented 2 years ago

[...] and I will have to prepend commands like pdm run autopep8 ...(i.e. with pdm run ...) to use these local packages.

good news: I just resolved this with these settings, e.g. for flake8:

diagnostics.flake8.with({
  command = 'pdm',
  args = { 'run', 'flake8', '--stdin-display-name', '$FILENAME', '-' },
  timeout = 10000
}),

bad news: I have to save twice after the async linting:

I did try BufWritePre and BufWritePost, the former seems to be faster. But I still expect that the save will happen after everything is done.

null_ls.setup {
  sources = sources,
  on_attach = function (client, bufnr)
    require('lsp_config.common_setup')(client, bufnr)
    -- TODO: have to save twice for async formatting, since it's too slow.
    vim.cmd("autocmd BufWritePre <buffer> lua vim.lsp.buf.formatting()")
  end
}

jose-elias-alvarez commented 2 years ago

@nyngwang By default vim.lsp.buf.formatting() will not save the file after it runs, so what you are seeing is expected behavior, regardless of whether you're using BufWritePre or BufWritePost. You'd either have to override the handler or use something like this custom function to apply results and save.

nyngwang commented 2 years ago

@jose-elias-alvarez: I'm using tsserver and eslint(setup via nvim-lspconfig), and their on-save behaviors are very smooth(it looks like there is only one save, but I'm not sure) without additional settings. Is it possible for null-ls to be as fast like them in the future? (Or at least the linked setup could be somehow simplified.)

jose-elias-alvarez commented 2 years ago

@jose-elias-alvarez: I'm using tsserver and eslint(setup via nvim-lspconfig), and their on-save behaviors are very smooth(it looks like there is only one save, but I'm not sure) without additional settings. Is it possible for null-ls to be as fast like them in the future? (Or at least the linked setup could be somehow simplified.)

It's likely that something else in your config is affecting this, because absent other factors, the way you set up formatting in the snippet above should require saving twice if the file is actually formatted (once to trigger vim.lsp.buf.formatting() and again to write any changes made). It's not really a speed issue, and I also don't plan on significantly deviating from the LSP formatting API.

Anyways, this is getting away from the original issue, so if we want to continue this discussion I think we should move it elsewhere.

jose-elias-alvarez / null-ls.nvim