richardlehane / siegfried

signature-based file format identification
http://www.itforarchivists.com/siegfried
Apache License 2.0
214 stars 30 forks source link

Add XDG support for siegfried home directory #221

Closed prettybits closed 1 year ago

prettybits commented 1 year ago

This adds support for the XDG base directory specification to put the siegfried home directory in the data directory by default, which I believe is the right choice for the kind of data put there. I was thinking whether it could even be classified as a cache directory but I don't think that would fit.

As the golang team has decided to only support cache, config and home user directory lookup in the os module (see here) I added github.com/adrg/xdg as a new dependency for mapping the right data directory per platform.

The new default siegfried home directory would now be located in:

Linux: ~/$XDG_DATA_HOME/siegfried (or ~/.local/share/siegfried) macOS: ~/Library/Application Support/siegfried Windows: %LOCALAPPDATA%

For backwards compatibility a check for an already existing siegfried home directory is retained to continue using it if present.

Also, a SIEGFRIED_HOME environment variable will now be used to override the default location. The command line option still has the highest priority.

Closes #216

PS: While at it, I decided to also remove usage of the deprecated ioutils package and moved all checking against error type to the newer errors.Is method to be consistent in the codebase. I hope that's alright with you, otherwise I can also separate that part out of this PR again.

richardlehane commented 1 year ago

thanks Bernhard! This looks nice and a lot of helpful info. I try to avoid non-std lib dependencies as far as possible. The golang issue link you includes this bit:

"The only disadvantage I see with adding this API is that one could say we'd be encouraging users to store data files into XDG_CONFIG_HOME instead of XDG_DATA_HOME on XDG systems. However, those few programs that wish to differentiate their data directory from their config directory could simply do:"

func userDataDir() (string, error)
    if dir := os.Getenv("XDG_DATA_HOME"); dir != "" {
        return dir, nil
    }
    // fall back to something sane on all platforms
    return os.UserConfigDir()
}

would that be an option to default to data directory but avoid the additional dependencies?

prettybits commented 1 year ago

Hm, I could try to add a local implementation specifically for getting the data directory instead. The advantage of a common library is that it already includes the respective mappings for special folders on common operating systems, even if the go library is called xdg which makes this look more specific than e.g. the equivalent directories Rust crate. In your original issue you mentioned that you'd like to support AppData on Windows too, so I thought having that covered that way would be good.

The suggested snippet would only cover Linux and similar userspaces and only checks for the environment variable without a fallback to the default XDG data home location if not set. If I were to make this more flexible in a local implementation I'd have to look into how to branch the behaviour across operating systems. The xdg package does this via conditional compilation (if that's the term used in go?) as far as I can see.

richardlehane commented 1 year ago

yes, conditional compilation sounds like the right approach: default_darwin.go, default_windows.go and default_unix.go + a default.go (maybe using the snippet above) as fallback for plan 9 users and other non-conformists

prettybits commented 1 year ago

I'll see what I can do, but I'm still wondering about what your concerns are with including non-stdlib packages or this one in particular? I also generally try to keep external libraries in check, but integration libraries like this are one of the things I personally strongly prefer to have shared and reused. As I said, just interested in your reasons.

richardlehane commented 1 year ago

Fair point! It's a general aversion ... and possibly unhealthy. Having had no/few external dependencies (depending how you classify golang.org/x libs) has helped keep the maintenance burden low over the last years and I'd like to keep that going unless there is a compelling reason not to.

adrg/xdg looks like a good library but we only need the data directory from it, so should be able to reproduce that functionality with a limited amount of code? Cherry picking also reduces the amount of stuff happening at startup (those init functions in adrg/xdg do a bit of work we can avoid checking env vars, existence directories, populating slices etc.)

prettybits commented 1 year ago

Please have a look at my latest commit, I tried to implement the minimum needed for finding a fitting user data directory without loss of potential destinations from the proposed library. I'm especially unsure about the build tags as so far I didn't have much of any practical exposure to Go and I'm not sure how they combine with the tags you already had there.

prettybits commented 1 year ago

RE: testing, this does seem persnickety enough a PR to maybe warrant unit tests? (An overhead of locally maintained packages.) but maybe not. If you are interested in testing you can adopt the same pattern, default_{distro}_test.go and then run it using go test -tags windows ./... (or similar, e.g. excluding the glob pattern).

I'll have to look into testing on Go soon, but won't get to it right away, so I'll wait for further feedback whether you think it is necessary (and reliable?) if that's alright and will leave the CI question to Richard. :)

richardlehane commented 1 year ago

Interesting addition to the code. I have been following along and looking forward to it. I added a comment on Windows below. Also I am not sure why the GitHub actions aren't kicking in for this PR, is it because it is from another remote?

the GitHub action didn't fire because it was waiting approval (I guess because first PR from a new committer?)