Implementing filtering on output

shoffmeister commented 1 year ago

I am trying to use gfold to answer a question for a (very large) tree of git repositories: "Is there anything that has not been pushed to the authoritative origin?"

The default output of gfold allows me to determine an answer ("unclean", "unpushed") - but, alas, the default output is very large.

I am unable to locate a built-in means to filter gfold output such that only those repositories are listed where the status is not "clean". I do see an option to emit JSON and then filter on that using, say,

gfold --display-mode json | jq ' .[] | select( .status | contains("Clean")==false ) '

but the output from that is JSON and ... I am still human and prefer nice(r) output.

Is there any means to accomplish natively within gfold what the above tries to do?

nickgerace commented 1 year ago

This is a great idea. To answer your question...

Is there any means to accomplish natively within gfold what the above tries to do?

No. That being said, that's partially why I added the JSON feature: to give folks a workaround in the event that I didn't have time to implement a display-related feature right away.

I think an ability to filter would be great! Do you have a preference on how the flag would/could work?

shoffmeister commented 1 year ago

My use case is quite straightforward: I want to find all those repos where I "messed up", where there is a whiff of unfinished business in that something is "here" that should probably be remote (too).

I believe that, generally, this is what gfold will locate for me.

At first look, and purely based on instinct, this would suggest a flag "--ignore-clean" or "--skip-clean", possibly "--ignore-clean-repos"?

Similarly, I might also be interested in local repositories which contain stashes (which, I believe, is not covered by gfold); there should be breathing room in the naming to cover for such a conceivable future feature - say --ignore-stash or --skip-stash

So a nicely named "--ignore" flag to remove data from the results could perhaps work best? I'd agree that for more complex use cases filtering through jq would be the way to go.

I am not sure whether it should be more complex than that?

nickgerace commented 1 year ago

Thanks for the detailed explanation. The context for why you want it is helpful too.

I could see individual flags working, but I'm curious if we could get something ergonomic for a variety of inputs too.

Would something like this work for you too?

-f/--filter    <comma-separated-list-of-statuses>    Possible values: unpushed, unclean, clean, bare, etc.

So then you could run the following to see only unclean and unpushed results...

gfold --filter "unclean,unpushed"
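As a rough illustration of how such a comma-separated filter could be parsed and applied, here is a minimal Rust sketch. It is not gfold's implementation: the `Status` enum below is a stand-in with variants assumed from the statuses named in this thread, and `parse_filter` and `apply_filter` are hypothetical helper names.

```rust
use std::collections::HashSet;
use std::str::FromStr;

/// Stand-in for gfold's `Status` enum (assumption: the real enum may differ).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Status {
    Bare,
    Clean,
    Unclean,
    Unpushed,
}

impl FromStr for Status {
    type Err = String;

    /// Accepts status names case-insensitively, ignoring surrounding whitespace.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "bare" => Ok(Status::Bare),
            "clean" => Ok(Status::Clean),
            "unclean" => Ok(Status::Unclean),
            "unpushed" => Ok(Status::Unpushed),
            other => Err(format!("unknown status: {other:?}")),
        }
    }
}

/// Hypothetical helper: parses a list such as "unclean,unpushed" into a set.
/// Empty segments are skipped; the first unknown name aborts with an error.
pub fn parse_filter(raw: &str) -> Result<HashSet<Status>, String> {
    raw.split(',')
        .filter(|part| !part.trim().is_empty())
        .map(|part| part.parse::<Status>())
        .collect()
}

/// Hypothetical helper: keeps the names of repositories whose status is wanted.
pub fn apply_filter<'a>(
    repos: &[(&'a str, Status)],
    wanted: &HashSet<Status>,
) -> Vec<&'a str> {
    repos
        .iter()
        .filter(|(_, status)| wanted.contains(status))
        .map(|(name, _)| *name)
        .collect()
}
```

With this sketch, `parse_filter("unclean,unpushed")` yields a set of two statuses, and `apply_filter` preserves the input order of the repositories that match it.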

Thoughts?

P.S. the stashes idea is also a great one. It can likely work with whatever ignore/filter feature is added, though we'll need to investigate working with stashes in general independently of this issue. Filed #241 for it.

shoffmeister commented 1 year ago

Many thanks for making the stashes idea trackable!

I see where you are coming from, with having a generic --filter on the (specific known-in-advance) set of Status enum members. This will work from a technical implementation point of view.

I'd like to approach these decisions from a - my - usability point of view, too:

I don't want to remember, I want to be offered
I don't want to type, I want the machine to read my mind

With that, I can see adding clap_complete support, then injecting source <(gfold --complete), then as a user mindlessly gfold --<TAB> to minimize cognitive load, allowing me to be lazy. I am not sure whether the generic filtering approach is compatible with that?

What I am looking for is a way to convert the local state of working copies (AKA "my mess") into something actionable - be it (re-)reviewing, discarding, upstreaming, forking, ... Here, I am not all that interested in the specific status, more in the resulting action, e.g.

"Unclean" implies "discard or commit"
"Unpushed" implies "push".
"Bare" does not really mean anything to me - different tooling would fill that from remote
...
"Stashes Present" implies ... more troublesome? ... than "Unclean"? But still with the same "discard or commit" implications?

The ultimate goal would always be to not have things linger locally. It must go somewhere, eventually, either remote or into the dustbin.

So, basically, I see two outcome-based actions,

"fix-it"
"push-it"

that I'd want to filter on, but I don't know how well my actions translate generally into everybody's individual workflow?
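The status-to-action mapping described above could be sketched in Rust roughly as follows. This is only an illustration under assumptions: `Status` is a stand-in enum with variants taken from this thread, while `Action`, `action_for`, and `repos_needing` are hypothetical names that do not exist in gfold. Stashes are left out because gfold does not report them.

```rust
/// Stand-in for gfold's `Status` enum (assumption: the real enum may differ).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Bare,
    Clean,
    Unclean,
    Unpushed,
}

/// Hypothetical outcome-based actions, named after "fix-it" and "push-it".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    /// "Unclean" implies "discard or commit".
    FixIt,
    /// "Unpushed" implies "push".
    PushIt,
}

/// Maps a status to the action it implies, or `None` if nothing needs doing.
pub fn action_for(status: Status) -> Option<Action> {
    match status {
        Status::Unclean => Some(Action::FixIt),
        Status::Unpushed => Some(Action::PushIt),
        Status::Bare | Status::Clean => None,
    }
}

/// Returns the names of repositories that call for the given action.
pub fn repos_needing<'a>(repos: &[(&'a str, Status)], action: Action) -> Vec<&'a str> {
    repos
        .iter()
        .filter(|(_, status)| action_for(*status) == Some(action))
        .map(|(name, _)| *name)
        .collect()
}
```

Filtering on the action rather than the status keeps the user-facing vocabulary small, at the cost of baking one particular workflow into the tool.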

An aside

Of course this is all within the constraints currently imposed by

pub struct RepositoryView {
    /// The directory name of the Git repository.
    pub name: String,
    /// The name of the current, open branch.
    pub branch: String,
    /// The [`Status`] of the working tree.
    pub status: Status,

    /// The parent directory of the `path` field. The value will be `None` if a parent is not found.
    pub parent: Option<String>,
    /// The remote origin URL. The value will be `None` if the URL cannot be found.
    pub url: Option<String>,

    pub email: Option<String>,
    pub submodules: Vec<SubmoduleView>,
}

In any future iteration of gfold the current flattening into this structure could be worth reviewing, e.g.

allow for multiple remotes (github forking sets up origin and upstream by default)
remove the parent (which can be immediately derived from name AFAICS?)
have a status also per local branch - things may be off from fully fetched remotes, I may have forgotten to push some branch

This would also still be within local discovery, with no expectation for gfold to ever work with remotes (see for instance ghorg for a specialized tool to do aggressive fetching)

With such an extension into a vNext, gfold would be transformed into more of a select * from all_reachable_git_repositories, where the * is what RepositoryView would be mutating into. That then would also require the default UI to make some representation decisions, but the real value could be in the JSON'ified RepositoryView output.
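To make the suggested direction concrete, here is a speculative Rust sketch of what a less flattened view might look like. None of these types exist in gfold or libgfold: `RepositoryViewNext`, `BranchView`, and `unpushed_branches` are hypothetical names, and `Status` is a stand-in enum with variants assumed from this thread.

```rust
use std::collections::BTreeMap;

/// Stand-in for gfold's `Status` enum (assumption: the real enum may differ).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Bare,
    Clean,
    Unclean,
    Unpushed,
}

/// Hypothetical per-branch view, so that a forgotten push on any local branch shows up.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BranchView {
    pub name: String,
    pub status: Status,
}

/// Hypothetical successor to `RepositoryView`: several remotes, no `parent` field,
/// and one status per local branch instead of a single status.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RepositoryViewNext {
    pub name: String,
    /// The branch that is currently checked out.
    pub current_branch: String,
    /// Remote name (e.g. "origin", "upstream") mapped to its URL.
    pub remotes: BTreeMap<String, String>,
    pub branches: Vec<BranchView>,
}

impl RepositoryViewNext {
    /// Names of local branches that still have something to push.
    pub fn unpushed_branches(&self) -> Vec<&str> {
        self.branches
            .iter()
            .filter(|branch| branch.status == Status::Unpushed)
            .map(|branch| branch.name.as_str())
            .collect()
    }
}
```

A `BTreeMap` is used for the remotes only so that any serialized output lists them in a stable order; that choice is an assumption of this sketch, not a requirement.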

Having filters built into gfold on that JSON'ified output would be too much, IMHO; this is where I'd expect users to switch to jq (or similar) as a data processor.

So, in such a future end, gfold would serve two very similar purposes:

immediate user-actionable user output with very limited filtering capabilities focused on 80/20 workflow
data pipeline source, generating machine-readable output for convenient processing

nickgerace commented 1 year ago

@shoffmeister no problem on making the stashes idea trackable.

I don't want to remember, I want to be offered
I don't want to type, I want the machine to read my mind

Those really help me get a grasp on what is desired here.

I've never written tab completion logic, to be honest. However, I'd be open to investigate! Sounds like a fun dive. It might not be part of solving this specific issue though.

As far as the "stashes present" and "bare" statuses go, the former is something that would be considered as part of the other issue and the latter is purely a reflection of a state that libgit2 returns. I can convert it into something else that's more useful to folks though.

Totally agree that using gfold to figure out "what do I need to do?" is a great choice. It's not the only reason to use gfold, but it's likely the primary one.

This would also still be within local discovery

That's 100% the mentality of gfold: it's purposefully "local and fast" and keeps a narrow focus on purpose. Yes, RepositoryView is a bit "happy path-y" at the moment... it's not very flexible. That's because I've historically been concerned with benchmarking and speed. Therefore, I collect as little as possible, even if the domain provides more. That being said, since we already have local file handles into the repository, it's probably very inexpensive to expand that struct to collect multiple remotes, handle technical debt like the parent field, etc.

In fact, that's partially why I split libgfold into its own crate recently. If libgfold can perform inexpensive lookups and we benchmark it independently of the CLI, I'm more comfortable expanding the public structs and making them less opinionated about the environment (e.g. the url always looking something like remote origin <main-branch> by default).

I have more thoughts here, and have thought about it on and off when I've had free time, but those are some immediate thoughts that come to mind. I think you're on the money overall and I'll take them into consideration.

uncenter commented 3 days ago

I'd like to take a stab at implementing this. I dislike --filter purely because I think "filter" as a word is confusing: does it filter out (exclude) or filter in (include)? I'd prefer two mutually exclusive options: --include and --exclude, or --show and --hide, or similar. If I want to list everything but clean repositories, I'd use gfold --hide clean. If I only wanted to see unpushed repositories, gfold --show unpushed. If I wanted to see unpushed and unclean, gfold --show unpushed unclean, etc.
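The show/hide semantics proposed here could be modelled roughly like this in Rust. It is a sketch under assumptions, not a patch against gfold: `Status` is a stand-in enum, `Visibility` and its methods are hypothetical, and real argument parsing (for example via clap) is deliberately left out.

```rust
use std::collections::HashSet;

/// Stand-in for gfold's `Status` enum (assumption: the real enum may differ).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Status {
    Bare,
    Clean,
    Unclean,
    Unpushed,
}

/// Hypothetical model of the two mutually exclusive options.
#[derive(Debug)]
pub enum Visibility {
    /// `--show`: list only repositories with one of these statuses.
    Show(HashSet<Status>),
    /// `--hide`: list everything except repositories with one of these statuses.
    Hide(HashSet<Status>),
}

impl Visibility {
    /// Builds the visibility rule from the already parsed flag values.
    /// Passing both flags is an error; passing neither hides nothing.
    pub fn from_flags(
        show: Option<HashSet<Status>>,
        hide: Option<HashSet<Status>>,
    ) -> Result<Self, String> {
        match (show, hide) {
            (Some(_), Some(_)) => Err("--show and --hide are mutually exclusive".to_string()),
            (Some(set), None) => Ok(Visibility::Show(set)),
            (None, Some(set)) => Ok(Visibility::Hide(set)),
            (None, None) => Ok(Visibility::Hide(HashSet::new())),
        }
    }

    /// Returns true if a repository with this status should be listed.
    pub fn allows(&self, status: Status) -> bool {
        match self {
            Visibility::Show(set) => set.contains(&status),
            Visibility::Hide(set) => !set.contains(&status),
        }
    }
}
```

Under this model, `gfold --hide clean` would correspond to `Visibility::Hide` with a set containing only `Status::Clean`, and `gfold --show unpushed unclean` to `Visibility::Show` with both statuses.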

nickgerace / gfold

Implementing filtering on output #238
