Auto-add of untracked files screws me up every time

durin42 commented 2 years ago

Description

Initially I thought the auto-add of files was a neat idea, but in practice I just leave untracked files in my repo all the time, and tools like patch(1) assume they can drop .orig or similar files in the WC without it being a problem. I think every time I've used jj I've ended up getting grumpy at auto-adds and having to rip something out of a commit, sometimes after doing a push (when it was effectively emulating hg import for example).

Steps to Reproduce the Problem

patch -p1 < some.diff (or similar)
jj describe && jj close && jj git push
Look at web view of diff, notice you committed a .orig file again (or similar)

Expected Behavior

I (still) don't expect auto-adds, and it really surprises me every time, plus it's super frustrating to have to ignore every file spec I might create as a temporary scratch file or what have you.

Actual Behavior

As above.

Specifications

Platform: macOS
Version: 6c83eb6ae3c56f1912a630dca4751c7f723a81a3

martinvonz commented 1 month ago

I wonder if a good design around this whole problem would be to add a jj track which would be the inverse of jj untrack and which would force JJ to start tracking an otherwise-ignored file.

We have such a command planned. It's just that no one has cared enough to implement it yet :)

necauqua commented 1 month ago

While I finally learned to ignore the nagging feeling in my head about amending very temporary unfinished code just to run jj diff or something and can finally apreciate the "everything is just stashed and you can just jump to other commits at any moment" jj way - the auto-add is still pretty annoying, every time I create a new repo I have to carefully get the .gitignore right, every time I have random garbage in the repo folder I have to take care to not run any jj commands - and I did still unknowingly commit random test/log files - at least with jj fixing the commits was trivial, but those files are now forever in the oplog

Imo just the state file-is-tracked/file-is-untracked, without staging, is not something I'd call "the second hell of git", I think it's pretty simple, and if jj commands will list untracked-but-not-ignored files until you track/ignore them it'd be extremely useful.

And again, the difference between what jj does currently and that is as simple as disabling some codepath with a config switch, (very, actually) similar to git.auto-local-branch - if you like auto-add you can have it, and even keep it as the default behavior, or you can disable it if it screws you up every time while keeping literally every jj benefit (incl. auto-track of file modifications/deletions) anyway.

PhilipMetzger commented 1 month ago

While I finally learned to ignore the nagging feeling in my head about amending very temporary unfinished code just to run jj diff or something and can finally apreciate the "everything is just stashed and you can just jump to other commits at any moment" jj way - the auto-add is still pretty annoying, every time I create a new repo I have to carefully get the .gitignore right, every time I have random garbage in the repo folder I have to take care to not run any jj commands - and I did still unknowingly commit random test/log files - at least with jj fixing the commits was trivial, but those files are now forever in the oplog

[...]

And again, the difference between what jj does currently and that is as simple as disabling some codepath with a config switch, (very, actually) similar to git.auto-local-branch - if you like auto-add you can have it, and even keep it as the default behavior, or you can disable it if it screws you up every time while keeping literally every jj benefit (incl. auto-track of file modifications/deletions) anyway.

I think the major disagreement here is that I see auto-add as a core feature of Jujutsu which will allow neater features in the future with a native backend. While many people want to keep the model of the old world partially due to a Nix assumption or because it was "OK" until now. It also has been clear for while now, that you're allowed to implement it (https://github.com/martinvonz/jj/issues/323#issuecomment-2065499670).

I just don't want to make guarantees that the feature will be around for the eventual 1.0, as the current model is a vast simplification of the Git/$CURRENT_VCS model (just like the Git backend).

There are also a bunch of comments in this thread which show appreciation for the simple model of auto-adding files (https://github.com/martinvonz/jj/issues/323#issuecomment-2216283708, https://github.com/martinvonz/jj/issues/323#issuecomment-1537622196, https://github.com/martinvonz/jj/issues/323#issuecomment-2215225828, https://github.com/martinvonz/jj/issues/323#issuecomment-1445553303), you can clearly count me in too. So we can't just change the model to it. And as Emily put it in https://github.com/martinvonz/jj/issues/323#issuecomment-2215024425, way more eloquently than I could, it will need a major rethinking of our current tooling and a adjustment period for users, as what was previously implicitly supported (the worst kind of support) is no more. See also @durin42 answer upthread for someone who accepted the change.

And your pain with the initial setup can surely improved with #2747 along with #3597. It just needs some more inputs, ideas and features.

marc-h38 commented 1 month ago

I could be wrong but I'm afraid both @emilazy , you and maybe a couple others keep confusing "new files" with "temporary files with random names". Probably because you never had to deal with the latter? Also because there is no way for any version control system to tell the difference: they must be told with some track command. Yet these are totally, utterly different types of files.

joyously commented 1 month ago

Wouldn't it be easier to keep the repo folder clean, and do build, test, develop in a separate folder?

marc-h38 commented 1 month ago

... "temporary files with random names". Probably because you never had to deal with the latter?

Wouldn't it be easier to keep the repo folder clean, and do build, test, develop in a separate folder?

Thanks for proving my point. Of course it's easy to add the build folder to .gitignore; it's not about that. Scroll up for examples.

joyously commented 1 month ago

Of course it's easy to add the build folder to .gitignore; it's not about that.

I didn't suggest that. I'm talking about a development process in which you also test your installation process by starting with an empty folder. This ensures that everything that is needed to build is actually in the repo. Think of the repo as what is shipped, and do all the dirty work elsewhere. Your VCS should handle versions of the real thing, not wade through all the mess.

ppwwyyxx commented 1 month ago

I think we can all see there are some benefits of auto-add. But to a large group of people, the benefits are not worth the pains -- let's all try to acknowledge this first, even if you don't belong to this group.

Things like convention, habbits and backward compatibility are all very hard to change, so it seems obvious to me that without addressing the pains, there will always be a significant number of people in this group. As shown by comments above, this is blocking people (me included) from seriously using jj.

There is nothing wrong to keep explaining how good the benefits are. But I don't think it helps much. It almost feels off-topic to me, because the benefits don't make the pains disappear.

martinvonz commented 1 month ago

I agree. I think it's basically just that the people who are happy with auto-add are less likely to implement support for manual tracking. As @PhilipMetzger noted, I have said before that I haven't thought of a reason we couldn't support manual tracking. We just need someone to step up and implement that feature, and that will most likely have to be someone who wants the feature.

If I had unlimited time, I would not mind implementing it for the benefit of those who want it. I have very limited time lately, however.

PhilipMetzger commented 1 month ago

I could be wrong but I'm afraid both @emilazy , you and maybe a couple others keep confusing "new files" with "temporary files with random names". Probably because you never had to deal with the latter? Also because there is no way for any version control system to tell the difference: they must be told with some track command.

As an aside could you please not be so dismissive of people and their experience? As you write and know, there's no technical difference between "new files" and "temporary files with random names" for a VCS, so why should they be treated specially except for user convenience? It makes no sense.

Since I'm of the opinion that littering the working-copy was a bad habit anyway after adopting jj a while ago, I won't continue the meaningless discussion on it, as it is unlikely that either of us is able to see/or accept the others points.

as @ppwwyyxx puts it quite nicely:

I think we can all see there are some benefits of auto-add. But to a large group of people, the benefits are not worth the pains -- let's all try to acknowledge this first, even if you don't belong to this group.

Things like convention, habbits and backward compatibility are all very hard to change, so it seems obvious to me that without addressing the pains, there will always be a significant number of people in this group.

To let the world adopt Jujutsu we will need to make some adjustments to make it like Git, the currently dominant VCS, but that should never mean that we should give up on Jujutsu's identity which I'm arguing auto-track is a part of. That does mean that the world also should be ready to adjust to its model when the time comes.

necauqua commented 1 month ago

Whoah, seems like I re-sparked this a little bit.

I'll repeat the core part of my point, which is that I think that auto-track of new files and auto-track of file modifications are slightly different - different parts of what jj currently does, and I argue that the most benefit (not all of it, ok) we get from the latter.

Having no auto-track of new files is both pretty simple, it's not reintroducing the whole staging mess or whatever, nor do I think it significantly alters the core jj model mentioned - I honestly hardly get why @PhilipMetzger gets so worked up about it - and again when disagreements like that arise I always say (as you've noticed) we just make a config flag for both the cake and eating it.

I don't agree that this has anything to do with "having lingering git concepts" or adoption or Nix or whatever, I like the jj way. It is largely convenient - people do different things and purists yelling at me to not put random test files or logs or what have you in repo root and have a whole different dir to cd in-out of (or at the very least an ignored test subdir in every repo) won't change my habit of doing so - and I suspect a lot of people do that.

I have been thinking about taking a stab of implementing this myself ever since Martin mentioned that - maybe I will at some point, as of course I'm interested and believe this will make jj that much better for myself.

necauqua commented 1 month ago

Also, fixing forgetting to track some file feels better for me conceptually (just amend it, duh, the whole jj way) than fixing forgetting to not track some garbage, especially 20gb of target/, or something that must not be in the jj/git store, like unecrypted secrets.

emilazy commented 1 month ago

What‘s the current consensus for what the implementation would actually consist of? Just adding jj track so that people can put * in .git/info/exclude and be done with it, or something more elaborate?

PhilipMetzger commented 1 month ago

What‘s the current consensus for what the implementation would actually consist of? Just adding jj track so that people can put * in .git/info/exclude and be done with it, or something more elaborate?

Basically jj track and the flag @necauqua mentioned, which should be opt-in by default and advertised as that in the documentation.

necauqua commented 1 month ago

Yeah, off the top of my head the code that does the tracking would consider files from @ instead of the workdir and jj track would manually add new files to @, on paper it looks pretty simple

And the config flag to enable that and the docs, of course, although personally I'd avoid the language of "we have this as a transition measure for people coming from git and in gitless jj world you'd have it this one way only and you'll like it"

martinvonz commented 1 month ago

Yeah, off the top of my head the code that does the tracking would consider files from @ instead of the workdir

I don't follow this part.

and jj track would manually add new files to @, on paper it looks pretty simple

Yes.

We want jj track even for the auto-tracking folks so you can start track a path that's normally ignored. We already have jj untrack for untracking files. It warns if your files will be added back due to auto-tracking. If we added a config, then that would not trigger the warning if you had turned off auto-tracking.

Implementation-wise, I think it means we need a new method on the LockedWorkingCopy trait. Untracking works by removing the specified files from the commit and then uses reset() to update to the new commit without touching files in the working copy. We kind of could do that for jj track too by adding placeholders (presumably empty files) in the commit. That would make the next snapshot include the actual file content. However, that's kind of ugly, and it won't work with patterns either. So I think we do need a LockedWorkingCopy::track(&dyn Matcher) or something like that.

I think it may have come up somewhere above already, but note that having untracked files means we can't just update to a commit where those files exist and overwrite the files on disk. However, we already handle that by not overwriting existing files. We print a message about "skipped updates" when that happens. That means the existing untracked file will replace the commit's files on the next snapshot. I think our handling of that is already pretty good. I'm mostly mentioning it because it would otherwise have been a really bad drawback of not auto-tracking. It would presumably mean that either your untracked files would get lost (overwritten by those from the target commit), or that the working copy would be stale.

dpc commented 1 month ago

Since I'm of the opinion that littering the working-copy was a bad habit anyway after adopting jj a while ago,

I'm just not going to use a tool that doesn't accommodate for ways I want to work. If a tool has a problem with a random temporary file I made, too bad. Bye.

emilazy commented 1 month ago

I understand that people feel strongly about this matter, but I think it would be a lot easier to get Jujutsu to work well for everyone if the temperature of the comments on this issue was lower. It’s been stated multiple times that Jujutsu is open to accepting this feature and that it just needs someone to work on it; I don’t think a hostile tone helps get this functionality any closer to existing.

marc-h38 commented 1 month ago

This ensures that everything that is needed to build is actually in the repo.

Again, this does not always work; not for every project or usage. Scroll up for examples.

Think of the repo as what is shipped, and do all the dirty work elsewhere.

Think of the repo checkout as a workspace / workbench / kitchen counter. That's how many projects and people have been using it since computers were invented. Every version control system ever invented has accommodated that usage model so far - absolutely nothing specific to git or the index.

marc-h38 commented 1 month ago

you and maybe a couple others keep confusing "new files" with "temporary files with random names". Probably because you never had to deal with the latter?

As an aside could you please not be so dismissive of people and their experience?

I'm sorry if sharing my different version control usage and experience made you feel like yours was dismissed and threatened somehow and I really don't understand how that happened. I don't see how highlighting the difference between the two usage models could have felt "dismissive".

There are two ways to "experience" a repo checkout: 1. either you treat it as a "dirty workbench" and allow temporary files with random names, or 2. you don't. The current debate is precisely because jj supports only the "clean" usage 2. (for now) and because some people are being dismissive of the "dirty" usage 1 as in: "Just stop doing what you've always been doing". No one has even remotely suggested to dismiss "clean" usage 2. AFAICT, everyone understands very well that it is jj's main usage model and that it always will be (the risk of "fragmentation" is real but that's a different question).

As you write and know, there's no technical difference between "new files" and "temporary files with random names" for a VCS, so why should they be treated specially except for user convenience? It makes no sense.

Because there is a conceptual difference that no tool can automatically figure out; so they must be told. Always have been. Tools are meant to serve users - or at least a particular subset of them.

martinvonz commented 1 month ago

Because there is a conceptual difference that no tool can automatically figure out; so they must be told.

That's true for jj too, but you currently have to use .gitignore to tell it. Again, I'm all for someone adding manual tracking to jj. I just don't think "must be told" necessarily means an explicit command (which seems to be what you're suggesting) is needed (there are people using jj already, which suggests that it's not, right?).

joyously commented 1 month ago

That means the existing untracked file will replace the commit's files on the next snapshot.

Is that still good with undo?

martinvonz commented 1 month ago

That means the existing untracked file will replace the commit's files on the next snapshot.

Is that still good with undo?

Depends on what you consider good :) Since snapshotting is its own operation, undoing right after such an operation will just undo the snapshotting. And if you restore from an earlier operation, you're still not going to get back the untracked file as an untracked file in your working copy.

joyously commented 1 month ago

you're still not going to get back the untracked file as an untracked file in your working copy.

But is the operation of tracking recorded, and restored when you use the undo? I mean that the track command is undone, so going to a commit prior to that, the file is not tracked? This is like having the ignore file in the repo and a commit that changes it affects everything before and after.

martinvonz commented 1 month ago

Yes, the tracked file will be removed, not untracked when you undo. To clarify, let's take an example of how this might look if we had manual tracking:

$ jj file list
foo
$ jj status
The working copy is clean
$ echo 1 > bar
$ jj status
? bar # let's say we use '?' for untracked as hg does
$ jj track bar
$ jj status
A bar
$ jj undo
Working copy now at: ...
Added 0 files, modified 0 files, removed 1 files
$ jj status
The working copy is clean
$ cat bar
cat: bar: No such file or directory

joyously commented 1 month ago

It seems counterintuitive to remove the file.

martinvonz commented 1 month ago

It seems counterintuitive to remove the file.

I agree, but I don't see what the alternative is. At least the user can undo again and run jj untrack instead if they realize that it wasn't what they wanted

joyously commented 1 month ago

I don't see what the alternative is.

The undo is only local, is that correct? But the track is global, and once one is in effect unless untracked? If that is the case, someone else can track a file I have not tracked, I fetch that which overwrites my file, so I undo, and my file is gone?

Edit: oops, I meant to address the quote by saying to instead do the intuitive thing, but my example maybe means there is no way to do that.

martinvonz commented 1 month ago

If that is the case, someone else can track a file I have not tracked, I fetch that which overwrites my file, so I undo, and my file is gone?

The fetch (and even an update afterwards) does not overwrite the file. Here's another example:

$ jj file list
foo
$ echo 1 > bar
$ jj track bar
$ jj branch -m commit-with-bar
$ jj new @-
$ echo 2 > bar
$ jj status
? bar
$ jj edit commit-with-bar # let's say commit-with-bar has commit id 5ada929e5d2e
Added 1 files, modified 0 files, removed 0 files
Warning: 1 of those updates were skipped because there were conflicting changes in the working copy.
Hint: Inspect the changes compared to the intended target with `jj diff --from 5ada929e5d2e`.
Discard the conflicting changes with `jj restore --from 5ada929e5d2e`.

joyously commented 1 month ago

OK, it's not overwritten, but it's tracked and I didn't track it. If I undo after that warning, is the file deleted?

martinvonz commented 1 month ago

Yes. This is the case I was trying to describe earlier

marc-h38 commented 1 month ago

Again, this does not always work; not for every project or usage. Scroll up for examples.

I applied my own advice to myself, went through the whole issue again and I admit the examples given so far were pretty vague and possibly difficult to picture for tidy people, especially if you think littering the working copy is a bad habit not worth discussing. So I crafted a list of actual filenames, all inspired from real world experience. With hindsight I think someone in the club of "dirty" people should have done this much earlier: it could have saved some time, comments and misunderstandings.

Of course these filenames are much shorter and much more cryptic in "real life" but their purpose and name randomness are the same.

less_optimized_build_is_it_slower/
debug-build-g3dbf325/
weekly_stress_test.logs/
stress_test_20240891T1200.logs/
Robert_3rd_workaround_attempt.patch
openssl_test_key_for_X
short_todo_list_for_bug2345.txt
temporary_script_for_git_bisect_run_for_bug2345.py
hack_test_sequence_that_might_reproduce_the_failure_faster_on_suspect_commits.bash
crashdumps_from_bug2345/
firmware_images_from_customer_X/
some_project_generates_concurrent_test_logs_with_PID_in_their_name/
some_source_file.backup_file_extension_from_very_rarely_used_tool

etc.

"Auto-stashing" of such temporary files is not a feature: it's a bug because they're useful ACROSS commits and branches.

A) I admire organized people who are in a similar situation but who 1) can categorize all the types of temporary files they will use 2) decide in advance some naming convention for each type (e.g. "build", "logs") and add it to some gitignore file 3) carefully follow their own naming convention. I'm not that organized, rigorous and disciplined. I don't think I'm alone and TBH I'm afraid I don't want to be that organized.

B) As already mentioned by others, I really don't want to move these temporary files out of the working copy and into some ../temp_for_projectX/ or /tmp directory because it's further away, less convenient and because they're 100% related to projectX and the activity happening inside that particular workspace. Such temporary files belong to the working copy - but not to any commit or stash.

C) This last and less important preference may be more personal and less common: I actually don't want "temporary files with random filenames" to be gitignored and hidden from "git status". Because "git status" is the perfect tool to list those files, to remind me their bad filename that I just forgot I wrote, to remind me that I forgot to do something with them and to show the scraps on my kitchen counter that I need to clean up when I'm done. Very different from the regular, predictable build/ directory which is better gitignored and hidden. I don't mind if litter stay hidden in "jj status" if that makes things easier for jj as long as I can still reach to "git status" to show it. That's why i suggested some echo '*' >> .git/info/jjexclude above as a better hack than echo '*' >> .git/info/exclude

Once again: I'm suggesting ZERO change to the current, default behavior. Having something "opt-in" would be awesome enough. I don't think anyone remotely suggested to change any default behavior either so please don't answer any non-existent suggestion to change the default behavior: it just makes this already long thread even longer. For instance I believe that any new fear of forgetting to add new files is void: just keep using jj the tidy way as you always have and unsubscribe from this issue!

Hope these examples help tidy people understand us the dirty ones.

marc-h38 commented 1 month ago

... temporary files with random names ... Because there is a conceptual difference that no tool can automatically figure out; so they must be told.

That's true for jj too, but you currently have to use .gitignore to tell it.

I think pretty much every version control system invented so far besides jj supports has been supporting not just two but three types of files:

tracked
(git)ignored ( *.o, build/, *.backup, ...)
Temporary, dirty files with random names, see examples above. Not (git)ignored.

jj drops type 3. For tidy people this is a non-event because they never used 3. But for dirty people this is a showstopper (for now).

martinvonz commented 1 month ago

@marc-h38: Thanks for explaining with some concrete examples. Hopefully that helped explain it to anyone who was still against the feature. I haven't seen any arguments against the feature (I've only heard some people say that they wouldn't personally use it). If anyone is concerned that it complicates the design or the conceptual model too much, please say so. What do you think, @yuja?

Are you interested in implementing this feature, @marc-h38?

joyously commented 1 month ago

Is this one of those options that makes a command work differently for different users? What is the definition of options that can do that and options that are unwanted because they do that?

Given the explanation of a "dirty" workflow, it seems answered in the FAQ.

For example, you could add scratch/ to ~/.git/ignore and then store arbitrary files in <your-git-repo>/scratch/.

martinvonz commented 1 month ago

Is this one of those options that makes a command work differently for different users? What is the definition of options that can do that and options that are unwanted because they do that?

I don't feel strongly myself. Maybe the main problem with aliases is that you can redefine a command to behave in ways that are not defined by any config option (other than the alias definition itself). For example, if we get a bug report saying that some branch disappeared and the user said they did jj status, it would make no sense until they told us that they had redefined status to be jj fetch && jj status or something.

martinvonz commented 1 month ago

It actually looks pretty easy to add at least a basic version of this feature. I'll probably send a PR for that this weekend or next weekend (trying to focus on work priorities during the week).

yuja commented 1 month ago

I haven't seen any arguments against the feature (I've only heard some people say that they wouldn't personally use it). If anyone is concerned that it complicates the design or the conceptual model too much, please say so. What do you think, @yuja?

I'm not against this feature. I personally wouldn't want to see random junks in git status, so I would try to find some patterns to ignore these, but I understand we need a way to notice "mis"-ignored files.

I think people agreed that jj track (or jj file track) is needed. How would we implement the mechanism to not auto-track?

binary config knob snapshot.auto-track = false?
jj-specific version-controlled ignore file .jjignore?
jj-specific core.excludesFile-like config?
positive list to auto-track snapshot.auto-track = "glob:'**/*'"?

martinvonz commented 1 month ago

positive list to auto-track snapshot.auto-track = "glob:'**/*'"?

This is what I'm going for (with a default of all())

martinvonz commented 1 month ago

The draft in #4338 seems to work (though I've only written a single test case so far). I need to add more tests and also make jj status report untracked paths.

necauqua commented 4 weeks ago

Oh my god I've been busy and missed all of this.

I don't follow this part.

What I've been saying, and I might be wrong as I was working purely on assumptions without actually looking at code, is this:

The snapshotting process at some point looks at the working dir to snapshot the files, right. With "not autotracking new files" option enabled, it will check with @ to snapshot only the file modifications/removals for files already present in @, with some form of jj track command being there to actually add new files to @.

I'm pretty sure the #4338 does not do what I wanted, unless I'm misunderstanding, as described it seems like it disables tracking entirely for configured paths - which is a feature, of course, just different to what I meant

My point was that we could have a balance between having to track every single thing and the ease of autotracking, by only requiring to manually track adding new files (might've been better described by my little pseudo-algorithm above).

necauqua commented 4 weeks ago

Looking just a bit more into the #4338 it repeatedly uses wording "track new files that match the pattern", so may be it does exactly what I wanted and I indeed misunderstood :upside_down_face:

And I could even have everything in src/ auto-tracked which is even better

martinvonz / jj