Use XDG Base Directories for storing data on Freedesktop-compatible systems (e.g. Linux)

jmacmahon commented 4 years ago

User data is currently stored in $HOME/.config/coc. This is non-standard; $XDG_DATA_HOME should be used instead.

Is your feature request related to a problem? Please describe.

The GNOME project has a good list of the problems caused by not using the standard XDG base directories.

Describe the solution you'd like

User data stored in $XDG_DATA_HOME on supporting systems.

Describe alternatives you've considered

User data could be stored somewhere under .vim. This seems like a less standards-compliant approach, and the data could be conflated with manually written configuration files.

Additional context

Reference specification
Summary of XDG base directories on the Arch Linux Wiki
GNOME project's usage guide

fannheyward commented 4 years ago

Currently, we use $XDG_CONFIG_HOME for config home.

samhh commented 4 years ago

@fannheyward There's a lot of stuff in there that belongs in XDG_DATA_HOME as the original poster said. A concrete example is history.json - this is not in any sense configuration.

jmacmahon commented 4 years ago

Just noticed that my patch for this was nonsense, so fixed the commit.

@fannheyward re: your point, this is about user data, not config. There is a separation of these concepts within coc.nvim -- the functions coc#util#get_data_home vs. coc#util#get_config_home are separate.

As the code stands, we use $XDG_CONFIG_HOME for user data as well as config, which is non-standard. This patch fixes this by using $XDG_DATA_HOME, which is standard.

Please re-open.

chemzqm commented 4 years ago

It is non-standard on your system, but this change would leave many current users with extensions that do not work at all. We chose XDG_CONFIG_HOME because the files are not only data, but also extensions. You can use let g:coc_data_home = $XDG_DATA_HOME."/coc" in your vimrc.

jmacmahon commented 4 years ago

Extensions managed by coc.nvim are user data. If an application is storing an entire checkout of a git repository, it cannot be considered simply config. This is user data.

This is not just non-standard on my system, but rather on all Freedesktop-compatible systems.

On Thu, 20 Aug 2020 at 12:14, Qiming zhao notifications@github.com wrote:

Closed #2245 https://github.com/neoclide/coc.nvim/issues/2245.


oblitum commented 4 years ago

IMO, the request is understandable, but it seems a rather breaking/problematic change for all current coc.nvim users, who matter more than sticking to some not-so-relevant directory pattern (few users are currently bothered by that). So maybe this is worth considering for a breaking-change release, in some remote future?

chris-morgan commented 10 months ago

Actually, installed extensions are neither config nor data: they’re something else again that clearly belongs in yet another location: ~/.local/lib^🏹.

First of all, there are four relevant locations (reference: file-hierarchy(7)):

Thing	System location	User location	XDG environment variable	Loose Windows equivalent
Data	/usr/share	~/.local/share	XDG_DATA_HOME	AppData\Roaming
State	/var/lib	~/.local/state	XDG_STATE_HOME	AppData\Local
Config	/etc	~/.config	XDG_CONFIG_HOME	AppData\Roaming
Code/libraries	/usr/lib^🏹	~/.local/lib^🏹	None defined	AppData\Local

(Honestly I’m not sold on the /usr/share ∶ ~/.local/share analogy—in common practice, /usr/share is exclusively read-only and controlled by the package manager, though occasionally /usr/local/share may be granted a little leeway; whereas ~/.local/share is mostly used for mutable application data. In real-world usage, ~/.local/share is treated more like /var/lib, even though the XDG Base Directory Specification calls STATE a place for stuff “not important enough or not portable enough” for DATA.)
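As a rough illustration of the user-level rows in the table, here is a minimal shell sketch of resolving the three XDG variables with their specification defaults. The function names (xdg_dir, xdg_data_home and so on) are made up for this example and are not part of coc.nvim. The sketch also ignores relative values, which the XDG Base Directory Specification says should be treated as invalid.

```shell
#!/bin/sh
# xdg_dir VALUE DEFAULT: print VALUE if it is an absolute path, else DEFAULT.
# An empty or relative VALUE is treated as unset.
xdg_dir() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;
    *)  printf '%s\n' "$2" ;;
  esac
}

# Hypothetical helpers, one per table row that has an XDG variable.
xdg_data_home()   { xdg_dir "${XDG_DATA_HOME:-}"   "$HOME/.local/share"; }
xdg_state_home()  { xdg_dir "${XDG_STATE_HOME:-}"  "$HOME/.local/state"; }
xdg_config_home() { xdg_dir "${XDG_CONFIG_HOME:-}" "$HOME/.config"; }
```

There is no equivalent helper for the code/libraries row, because no XDG variable is defined for it.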

If you installed coc extensions through your system’s package manager, I think all reasonable people would agree that within /usr/lib^🏹 would be the only correct location. I contend that when as a user you install an extension with :CocInstall, it should go into the corresponding per-user directory, which is ~/.local/lib^🏹; and that this is clearly the most correct answer.

Although not much software uses this location (very little software needs it, and most of what does predates it or doesn’t care about XDG conventions), Python uses it, and pip install --user will install to site.USER_SITE, which defaults on these platforms to ~/.local/lib/pythonX.Y/site-packages.

The package.json which records which extensions are installed could reasonably be considered config (or even data, but I think config fits better); perhaps the nicest technique is to store it in config and copy it across to lib, and detect on startup if the two are out of sync and suggest an automatic reconciliation (copy from config to lib and install/uninstall/whatever).

🏹 Architecture-independence

Now to explain the 🏹 I’ve been using, which represents archery. This is the bit that makes it so clear why code belongs in a different place from data, even if you end up deciding you don’t care to support a currently-esoteric use case.

I’ve talked of /usr/lib and ~/.local/lib, but actually they’re both split into architecture-neutral and architecture-dependent stuff:

$ systemd-path system-library-private
/usr/lib
$ systemd-path system-library-arch
/usr/lib
$ systemd-path user-library-private
/home/chris/.local/lib
$ systemd-path user-library-arch
/home/chris/.local/lib/x86_64-linux-gnu

(Though as you see, the architecture-specific system library directory is normally the same as the architecture-neutral one, for convenience, historical and compatibility reasons.)

Native code is the reason for this split, because it’s architecture-specific.

For coc.nvim: extensions (or their transitive dependencies) might contain or compile native code, and I don’t think there’s any way of detecting that statically (otherwise, ones not containing native code can go in user-library-private). So the most correct way of doing it is to use the user-library-arch directory. That way a home directory that is shared between machines of different architectures will still work: each will have its own set of installed extensions with any native code compiled for its own architecture. For the usual case of a single-machine home directory, the only cost is that everything’s one level deeper, but that’s not a problem, users aren’t expected to interact with these directories at all.

True shared home directories aren’t common these days, but partial or even near-complete synchronisation is fairly common (“dotfiles” being the most obvious example in tech circles). Also mixing architectures has been rare for quite a long time, but is becoming less rare as ARM gets used for more “serious devices”, while x86_64 is still the dominant ISA for work.

(I note, however, that Python ignores the arch-id bit in its ~/.local/lib usage, making it not portable in the presence of native code. It does actually have a way of signalling architecture-dependence, so it’d be better if it installed -neutral ones to ~/.local/lib and -dependent ones to ~/.local/lib/arch-id. But it doesn’t.)

My recommendations

Firstly, please reopen this issue. The problem still exists, and you haven’t decided not to do it (just “maybe later”), and in fact there are perfectly reasonable ways of making it happen in a non-breaking way, so it shouldn’t be closed. Closing issues inappropriately makes life difficult in issue trackers. It’s an issue tracker, not a work tracker.

Then, the rest should ideally be shipped all at once, though some parts can be split up, and there are multiple ways of doing things.

When it comes to moving things, I see three main approaches:

Support the old and new paths. This was my first inclination in many cases, but it tends to become messy.

Move things from old to new path on a case by case basis, on use. This can be done with functions like this:

" Get the path of a data file that used to live in config. Moves it first if necessary.
function! GetFormerlyConfigDataPath(name) abort
  " Vim requires global user function names to start with an uppercase letter,
  " and arguments are read through the a: scope.
  let l:old_path = coc#util#get_config_home() . "/" . a:name
  let l:path = coc#util#get_data_home() . "/" . a:name
  if empty(glob(l:path)) && !empty(glob(l:old_path))
    " Old path exists, new path doesn’t: try to rename it.
    if rename(l:old_path, l:path) != 0
      " Failed, keep using the old path.
      return l:old_path
    endif
  endif
  return l:path
endfunction

Frontload a migration on startup. Something like, define a new coc setting coc.fsSchemaVersion = 2, and shuffle everything around with a simple migration script if the setting isn’t found, and maybe baulk if it’s set to some value other than 2. If migration fails, tell the user what they might need to do manually. I suggested a coc setting; not sure if that’s what’s best or not. If coc.nvim always created some file in a location that was going to change, you could use that as the trigger, but I don’t think it’s so. I also note there’s a small compatibility hazard with this approach, if you have a partially-synchronised home directory. But all up, this is my recommendation now, after several iterations of contemplation and leaving it for a week or two.
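A minimal shell sketch of the frontloaded approach follows. Several things in it are assumptions. It uses a marker file named fs-schema-version in the data directory rather than a coc setting. The list of files to move is only illustrative, using the two file names mentioned in this thread. The function name migrate_if_needed is hypothetical.

```shell
#!/bin/sh
# Target layout version; the value 2 follows the suggestion above.
SCHEMA_VERSION=2

# migrate_if_needed CONFIG_DIR DATA_DIR
# Returns 0 when the layout is current (possibly after migrating),
# 1 when a move failed, 2 when an unknown schema version is found.
migrate_if_needed() {
  config_dir=$1
  data_dir=$2
  marker="$data_dir/fs-schema-version"   # hypothetical marker file

  if [ -f "$marker" ]; then
    found=$(cat "$marker")
    if [ "$found" = "$SCHEMA_VERSION" ]; then
      return 0
    fi
    echo "unsupported schema version: $found" >&2
    return 2
  fi

  mkdir -p "$data_dir" || return 1
  # Illustrative file list only; a real migration would enumerate every file.
  for name in history.json memos.json; do
    if [ -e "$config_dir/$name" ] && [ ! -e "$data_dir/$name" ]; then
      if ! mv "$config_dir/$name" "$data_dir/$name"; then
        echo "could not move $name; please move it to $data_dir manually" >&2
        return 1
      fi
    fi
  done
  printf '%s\n' "$SCHEMA_VERSION" > "$marker"
}
```

Writing the marker only after every move succeeds means a failed run is retried on the next startup.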

Define a new location called lib:

On freedesktop.org platforms, its default value should be the result of something like system("2>/dev/null systemd-path user-library-arch || echo ~/.local/lib")->trim() (Vimscript, hopefully a better way exists!).
On other platforms, it should default to the data path unless investigation suggests there’s something more suitable. (Aside: eww, NeoVim uses local appdata for its config under Windows (see also https://github.com/neovim/neovim/issues/24009)? Ick. Config should definitely be roaming appdata. Data is more subjective, but I suspect roaming is more appropriate. Now lib, that’s local appdata, for basically the same reasons as the arch bit in lib under XDG: local appdata is for things specific to the current machine that shouldn’t be synchronised between machines, whereas roaming is designed to be. Windows tends to support shared home directories a bit better than Linux. As for macOS, I have no idea.)
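The default for lib on freedesktop.org platforms could be sketched in plain shell as below. The function name user_lib_arch is hypothetical. The only external command used is systemd-path user-library-arch, as shown earlier, and the fallback is ~/.local/lib when that command is missing or fails.

```shell
#!/bin/sh
# Print the per-user, architecture-specific library directory.
# Falls back to ~/.local/lib when systemd-path is unavailable or fails.
user_lib_arch() {
  dir=
  if command -v systemd-path >/dev/null 2>&1; then
    dir=$(systemd-path user-library-arch 2>/dev/null) || dir=
  fi
  printf '%s\n' "${dir:-$HOME/.local/lib}"
}
```

Checking for the command first keeps the fallback quiet on systems without systemd.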

A suitable migration trigger (if not frontloading the lot) would be lib/extensions not existing but config/extensions existing. Migration actions required are to move config/extensions to lib/extensions and copy its package.json to config/extensions.json.
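The trigger and the two migration actions could look roughly like this in shell. The function name migrate_extensions is hypothetical. Both directories are passed as arguments so the sketch assumes nothing about where coc.nvim actually keeps them.

```shell
#!/bin/sh
# migrate_extensions CONFIG_DIR LIB_DIR
# Trigger: LIB_DIR/extensions is missing but CONFIG_DIR/extensions exists.
migrate_extensions() {
  config_dir=$1
  lib_dir=$2
  if [ -e "$lib_dir/extensions" ] || [ ! -d "$config_dir/extensions" ]; then
    return 0   # nothing to migrate
  fi
  mkdir -p "$lib_dir" || return 1
  # Action 1: move the installed extensions to the lib location.
  mv "$config_dir/extensions" "$lib_dir/extensions" || return 1
  # Action 2: keep a copy of the manifest in config.
  if [ -f "$lib_dir/extensions/package.json" ]; then
    cp "$lib_dir/extensions/package.json" "$config_dir/extensions.json" || return 1
  fi
}
```

Calling it a second time is a no-op, because lib/extensions then exists.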

Subsequently, when installing something, add it to both config/extensions.json and lib/extensions/package.json, which two files should be identical.

On startup, compare config/extensions.json with lib/extensions/package.json, and if they differ, prompt the user to reconcile them by applying what’s in config to lib.
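A minimal sketch of the startup comparison, with hypothetical function names. It treats the two manifests as in sync only when they are byte-identical, which is stricter than comparing the parsed JSON. The reconcile step copies only the manifest; the actual install or uninstall of extensions is left out.

```shell
#!/bin/sh
# extensions_in_sync CONFIG_DIR LIB_DIR
# Succeeds only if both manifests exist and are byte-identical.
extensions_in_sync() {
  cmp -s "$1/extensions.json" "$2/extensions/package.json"
}

# reconcile_extensions CONFIG_DIR LIB_DIR
# Applies config to lib by overwriting the lib manifest.
reconcile_extensions() {
  mkdir -p "$2/extensions" && cp "$1/extensions.json" "$2/extensions/package.json"
}
```

A caller would prompt the user when extensions_in_sync fails and run reconcile_extensions only after confirmation.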

Define a new location called state. It should default to $XDG_STATE_HOME or ~/.local/state on freedesktop.org platforms, or to the data path on other platforms unless there’s something more suitable. (My recommendation for Windows would be that data should be moved to roaming appdata, and state local appdata. End result is that on Linux, state will be moved, but on Windows, data will be moved!)

Consider the location of every file and folder, whether it needs to be changed, and add migration steps as necessary. Things like history should move to state. I haven’t thought much further.

How does this sound as a general concept? I might be willing to aid in the implementation, though it’d probably need some assistance for macOS and Windows and NeoVim.

fannheyward commented 10 months ago

@chris-morgan very clear explanation, thank you!

First of all, let's ignore system locations: coc.nvim's extensions should not be installed by the system package manager, although you can install them with npm, e.g. npm install -g coc-json. An extension may download data for its own use, and putting that in system locations raises another issue: users, including us, don't want coc.nvim and its extensions adding unexpected things to system library folders.

I had never heard of the State location. It looks similar to the Data location, but for data that is not important enough. I'll treat it the same as Data.

For the user-library-arch location, I think the main advantage is that each machine will have its own set of installed extensions, with any native code compiled for its own architecture. But:

not every OS provides this: Windows and macOS have no systemd-path and no XDG environment variables
data downloaded and used by extensions mostly addresses the architecture issue already, since only binaries matching the architecture are downloaded
so let's ignore arch-id and fall back to ~/.local/lib

Now, between ~/.local/lib and ~/.local/share, I prefer ~/.local/share, because the XDG_DATA_HOME variable exists as a standard for it.

For the Data and Config locations, I agree that extensions, memos.json, etc. are user data and should go to XDG_DATA_HOME. A migration is needed to avoid a breaking change.

let g:coc_data_home = $XDG_DATA_HOME."/coc" is a solution, though not a perfect one.
