Closed ldrumm closed 3 weeks ago
@llvm/pr-subscribers-llvm-support @llvm/pr-subscribers-backend-spir-v @llvm/pr-subscribers-backend-directx @llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-hlsl @llvm/pr-subscribers-mlir-shape @llvm/pr-subscribers-debuginfo @llvm/pr-subscribers-mlir-llvm @llvm/pr-subscribers-lld-coff @llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-pgo @llvm/pr-subscribers-mlir-gpu @llvm/pr-subscribers-clang-driver @llvm/pr-subscribers-clang-tools-extra @llvm/pr-subscribers-testing-tools @llvm/pr-subscribers-clang-static-analyzer-1 @llvm/pr-subscribers-lld-elf @llvm/pr-subscribers-clangd
@llvm/pr-subscribers-lldb
Author: None (ldrumm)
@llvm/pr-subscribers-backend-x86
Author: None (ldrumm)
I'm sure there are lots of people that will want to look at this, so please bring in reviewers at will
:warning: C/C++ code formatter, clang-format found issues in your code. :warning:
While I'm sure this commit will manage to break something unexpectedly, I think this is reasonable to do! Thanks for taking this on.
I think it makes sense to remove carriage returns on checkin, but I'm not sure it makes sense to add them back on checkout on Windows. Historically, it's been really easy to write LLVM tests that fail when the source is checked out with CRLF. Many Windows developers have noted that LLVM tests fail when using the git config options to accomplish this. So, is there a way to do CRLF -> LF on checkin, but do nothing on checkout?
I think all modern editors on Windows can read LF files, it's just that they tend to prefer to write files out in CRLF, and that's the main source of friction.
As a data point, I've been setting core.autocrlf=true
on Windows for years. I've been trained to make sure new files have LF endings (usually I copy an existing file instead of creating a new file) but both Notepad and the Visual Studio editor are willing to preserve the line-ending mode of the file as they found it. (This was not always true, but is these days.)
So, is there a way to do CRLF -> LF on checkin, but do nothing on checkout?
To be clear, this may do nothing on checkout if a user has set a config option. The point of the present policy is not to control local preferences as far as the filetype isn't required to have a specific format; we only control what ends up in history.
As a data point, I've been setting core.autocrlf=true on Windows for years.
If that's the case, my information is stale, which I accept.
To be clear, this may do nothing on checkout if a user has set a config option.
That addresses my concerns.
LGTM overall, but the CI failure looks real.
LGTM overall, but the CI failure looks real.
Failed Tests (3):
Clang :: CXX/drs/dr23xx.cpp
Clang :: CXX/drs/dr6xx.cpp
Clang :: Lexer/raw-string-dlim-invalid.cpp
None of these files are in the diffstat, and none of them include any headers. I'm a bit stumped. Is it possible this is spurious?
I think .natvis files should be CRLF (those are used with Visual Studio for debug visualizers).
Agreed, Visual Studio should handle this correctly, but who knows 🤷🏼
On Mon Mar 25, 2024 at 1:48 PM GMT, Tobias Hieta wrote:
I think .natvis files should be CRLF (those are used with Visual Studio for debug visualizers).
Agreed, Visual Studio should handle this correctly, but who knows 🤷🏼
Thanks. I'll add a rule and update
Philosophically I agree with this change, but I think some subprojects may need to adopt specific policies about how they handle files that are expected to be specific line endings. Specifically, Clang needs testing that verifies correct behavior on both CRLF and LF files in some cases, as does LLVM's split-file.
I did a pass a few years ago fixing a bunch of tests that had hard-specified file offsets in them that broke if you had CRLF line endings, and fixing how the correct line ending type was detected (some of the changes are 13fa17db3a720d149bcd0783856347a4f09cf634, 9d3437fbf3419502351d41ff9e28f06b0c3f06e8, ea836c880abcc74d3983fc35de407d647f0a002d, 1c1b0027e86fab2b0877488abc1625a457ca70b3).
Should we come up with an expected filename format that we can use with gitattributes and rename the existing files rather than just listing each file that has special needs by itself?
Should we come up with an expected filename format that we can use with gitattributes and rename the existing files rather than just listing each file that has special needs by itself?
Either way, the author has to do or know something. I'm not a fan of "magic" like this (notwithstanding the use of file extensions for common win32 formats here), so I'd personally just prefer that the tests that require knowledge of line endings are manually tracked and documented in the gitattributes file.
For those files in the repository that do need CRLF endings, I've adopted a policy of eol=crlf which means that git will store them in history with LF, but regardless of user config, they'll be checked out in tree with CRLF.
This ^ seems a bit problematic to me, though (where the contents of the repo isn't representative of the bits we want - but perhaps it's schrodinger's line endings & it doesn't matter if it's always checked out as crlf - though if we one day converted to another source control system, would that be a problem? are there some things that examine the underlying/"real" repo contents?) - what would be the cost to using eol=input
as you've done for the mixed line ending case? below \=
There also appears to be a single test, clang/test/Frontend/rewrite-includes-mixed-eol-crlf.[ch] which needs mixed line endings. This one uses eol=input to preserve it as-is.
I don't have a strong opinion, but fundamentally I would prefer if the source control system stored exactly the files I have in my checkout, not mess with them in any way. I understand there are practical concerns, but a linter for unexpected CRLF would maybe be an option?
I don't have a strong opinion, but fundamentally I would prefer if the source control system stored exactly the files I have in my checkout, not mess with them in any way. I understand there are practical concerns, but a linter for unexpected CRLF would maybe be an option?
That wish is fine until you start working with others. Then you get 50000 line diffs that are somebody changing a single line on their system, because they happen to use the opposite system to the last person to check code in. I think the best middle ground here is to say -text
or eol=input
for files that must be encoded a certain way.
As for linters: linters require people to run it, and we still have to encode those rules somewhere - just not gitattributes. As far as I can see this is the lowest overhead and most reliable pragmatic option that has the nice property of being fully integrated into the tooling, transparent to end users, and in a known (semi) central location.
That wish is fine until you start working with others.
Do we actually have that little faith in developers that we think they will check in a 50k line diff?
That wish is fine until you start working with others.
Do we actually have that little faith in developers that we think they will check in a 50k line diff?
depending on their diff settings, or the web interface they use, it may not show until they actually look at a hex editor. The point of this patch is not to lambast developers or interfere with their local setups; it's to get the line-ending issues out of the way for good so they can focus on what they do best.
And, given what I quoted above, it's not about faith - it's about historical evidence that this is a problem.
. The point of this patch is not to lambast developers or interfere with their local setups; it's to get the line-ending issues out of the way for good so they can focus on what they do best.
Fair enough. I don't think it will fully make them go away for good, as you mentioned "[...] except for specific cases like .bat files or tests for parsers that need to accept such sequences." Something somewhere is bound to work before the transformation, and no longer after. It's possible that that will be more rare, though I would say 100 reverts in all of LLVM history isn't really that much either, all things considered.
And, given what I quoted above, it's not about faith - it's about historical evidence that this is a problem.
I am not saying this isn't a problem at all, but how often has anyone done a one line change and caused a 50k diff, and submitted it without noticing?
I am not saying this isn't a problem at all, but how often has anyone done a one line change and caused a 50k diff, and submitted it without noticing?
Way more often than you would hope. I don't have numbers but if you search through the git history for crlf
or dos2unix
those phrases hit on a lot of commits (more than just my abolishcrlf.org email address).
This ^ seems a bit problematic to me, though (where the contents of the repo isn't representative of the bits we want - but perhaps it's schrodinger's line endings & it doesn't matter if it's always checked out as crlf - though if we one day converted to another source control system, would that be a problem? are there some things that examine the underlying/"real" repo contents?) - what would be the cost to using
eol=input
as you've done for the mixed line ending case? below \=
The argument I would have in favor of an explicit value rather than eol=input
is that it at least gives us a source of truth in the .gitattributes
as to what type of line ending the file is supposed to have.
I don't have a strong feeling one way or another. I just want autocrlf=true
to work reliably on all platforms.
This ^ seems a bit problematic to me, though (where the contents of the repo isn't representative of the bits we want - but perhaps it's schrodinger's line endings & it doesn't matter if it's always checked out as crlf - though if we one day converted to another source control system, would that be a problem? are there some things that examine the underlying/"real" repo contents?) - what would be the cost to using
eol=input
as you've done for the mixed line ending case? below =The argument I would have in favor of an explicit value rather than
eol=input
is that it at least gives us a source of truth in the.gitattributes
as to what type of line ending the file is supposed to have.
Missed this first time round. Apologies.
To address the first part: I'm not sure I'm equipped to deal with the philosphy matters here, though it might make more practical sense to use -text
for the mixed line endings - actually eol=input
might be wrong where we're testing mixed file endings; I'll have to check.
As for the second point: I don't think it's productive to worry about other future source control systems at this point - or indeed interoperability. Tools which do translation and don't respect gitattributes are buggy, and authors that use them and somehow manage to get bad bytes in llvm.org's git history will really have to try hard to do so, and we'll be back to another fixup commit. I don't really see this a problem worth worrying about.
I don't have a strong feeling one way or another. I just want
autocrlf=true
to work reliably on all platforms.
I'm a bit confused. Are you saying that as I've expressed it, autocrlf=true
won't work correctly? I think you're saying you're in favour of this patch in principal, but I need to fix the mixed line endings case?
I'm a bit confused. Are you saying that as I've expressed it,
autocrlf=true
won't work correctly? I think you're saying you're in favour of this patch in principal, but I need to fix the mixed line endings case?
Sorry for being unclear. I meant to express that I'm so strongly in favor of what you're accomplishing here that I don't want any of my feedback to stand in the way.
Philosophically, I agree with this change. Enshrining the information about the line endings into the SCM tool makes sense.
I think that the concern that I have is that do we have sufficient testing for supporting line-ending dependent behaviour in the compiler? Additionally, do we have sufficient documentation for others to figure out how to ensure that git does not munge the line endings if they are introducing a test which is dependent on it? In such a case, how do we ensure that they are aware that the SCM will do so without actually checking the post-commit state with a hex editor?
I've added the natvis changes and added a couple of clangd test exceptions.
For the sanity of pending reviewers, and to make rebasing this trivial I've split this into two commits: the gitattributes changes, followed by the --renormalize
fixup.
For downstreams, I'm considering keeping this as two separate commits (rather than a fixup) so that merge conflicts can be resolved the same way. Anyone have opinions on this?
@compnerd I just realised I didn't respond to your concern. Apologies.
I think that the concern that I have is that do we have sufficient testing for supporting line-ending dependent behaviour in the compiler?
For the first part: I don't know that it matters, since this patch changes how the LLVM source code is stored, but not how its parsers operate. In any case, we run parser testing on LF and CRLF systems since the precommit testing runs on Linux, and Win32 at least. In addition, and there are several tests dedicated to line ending, so I think we should be good there. Happy to tend to this and fix anything I've missed if someone lets me know something is broken after we merge.
As for the second part
Additionally, do we have sufficient documentation for others to figure out how to ensure that git does not munge the line endings if they are introducing a test which is dependent on it? In such a case, how do we ensure that they are aware that the SCM will do so without actually checking the post-commit state with a hex editor?
I don't think we do, but also this patch doesn't really change the problem since right now basically anything can happen due to local configuration overrides (~/.gitconfig
). I'll add a comment to the testing infrastructure page to be mindful of how line endings are storedand link to the .gitattributes
documentation. Does that sound enough?
I'll add a comment to the testing infrastructure page to be mindful of how line endings are storedand link to the .gitattributes documentation. Does that sound enough?
Added here
changes to clang/test/CXX/lex/lex.literal/lex.string/p4.cpp
should be reverted (it's a CRLF related test)
@compnerd I just realised I didn't respond to your concern. Apologies.
I think that the concern that I have is that do we have sufficient testing for supporting line-ending dependent behaviour in the compiler?
For the first part: I don't know that it matters, since this patch changes how the LLVM source code is stored, but not how its parsers operate. In any case, we run parser testing on LF and CRLF systems since the precommit testing runs on Linux, and Win32 at least. In addition, and there are several tests dedicated to line ending, so I think we should be good there. Happy to tend to this and fix anything I've missed if someone lets me know something is broken after we merge.
I don't know if the pre-commit testing guarantees that. Configuration settings will permit the files to be checked out in either Unix (\n
) or Windows (\r\n
) line-endings. If the builders are all configured to run in Unix line endings we lose that coverage. While I understand that this change only changes how the LLVM sources are stored, the issue is that the LLVM sources include the tests directory. These need to be tested in various manners to ensure that we test how we handle the different inputs (in clang). One option might be to exclude the tests directory from the line ending conversion if we cannot verify that we have tests in both formats.
As for the second part
Additionally, do we have sufficient documentation for others to figure out how to ensure that git does not munge the line endings if they are introducing a test which is dependent on it? In such a case, how do we ensure that they are aware that the SCM will do so without actually checking the post-commit state with a hex editor?
I don't think we do, but also this patch doesn't really change the problem since right now basically anything can happen due to local configuration overrides (
~/.gitconfig
). I'll add a comment to the testing infrastructure page to be mindful of how line endings are storedand link to the.gitattributes
documentation. Does that sound enough?
I think that the documentation should be good. While, yes, it is possible to get the wrong behaviour currently, if the user configures git explicitly for a line-ending, I would expect them to be aware of that. The use of .gitattributes
means that their defaults are not necessarily honoured and thus this can catch them by surprise.
changes to
clang/test/CXX/lex/lex.literal/lex.string/p4.cpp
should be reverted (it's a CRLF related test)
This is the class of problems that I am concerned about. We certainly have some tests which are line-ending sensitive, and each test should be audited before we make such a change.
Also, it's a bit funny to have .bat files without CRLF endings given that they run on Windows.
I'm not sure about the funny bit - but certainly dangerous. I've had cmd misinterpret batch files with LF vs CRLF.
Also, it's a bit funny to have .bat files without CRLF endings given that they run on Windows
They do have CRLF line endings: https://github.com/llvm/llvm-project/pull/86318/commits/1994c29731fde75f075c0605b79a14667bcfb9ac#diff-618cd5b83d62060ba3d027e314a21ceaf75d36067ff820db126642944145393eR3
Note, the changes to clang-format-vs should likely all be reverted. .sln is a Microsoft Visual Studio solution file, as are many of the others
Right. .sln is a text file: https://learn.microsoft.com/en-us/visualstudio/extensibility/internals/solution-dot-sln-file?view=vs-2022
I have no idea whether Visual Studio will be happy with .natvis files having non-Windows line endings.
They have CRLF line endings: https://github.com/llvm/llvm-project/pull/86318/commits/1994c29731fde75f075c0605b79a14667bcfb9ac#diff-618cd5b83d62060ba3d027e314a21ceaf75d36067ff820db126642944145393eR6
Perhaps things have got lost during the discussion, but this part of my original commit message is perhaps worth re-reading:
In simple terms this means "unless otherwise specified, convert all files considered "text" files to LF in the project history, but checkout them out as expected on the local machine. What is "expected on the local machine" is dependent on configuration and default.
For those files in the repository that do need CRLF endings, I've adopted a policy of eol=crlf which means that git will store them in history with LF, but regardless of user config, they'll be checked out in tree with CRLF.
I don't know if the pre-commit testing guarantees that. Configuration settings will permit the files to be checked out in either Unix (
\n
) or Windows (\r\n
) line-endings.
Today on Windows you basically have to check out LLVM as unix line endings. There are a bunch of tests that fail if you don't. Hopefully this is a step toward fixing that (which is why I'm in favor of this).
If the builders are all configured to run in Unix line endings we lose that coverage.
I suspect the Windows bots are basically all configured that way today. Our instructions tell people they need to disable autocrlf
(see: https://llvm.org/docs/GettingStarted.html#getting-the-source-code-and-building-llvm), and not disabling it causes problems for users still (see this message from today: https://discord.com/channels/636084430946959380/745925509971705877/1235913415902629958).
While I understand that this change only changes how the LLVM sources are stored, the issue is that the LLVM sources include the tests directory. These need to be tested in various manners to ensure that we test how we handle the different inputs (in clang). One option might be to exclude the tests directory from the line ending conversion if we cannot verify that we have tests in both formats.
I see two issues with this:
1) because the repo doesn't work with autocrlf today, we actually can't rely on testing on different platforms to provide this coverage. 2) We shouldn't be relying on git settings to get this coverage anyways. We should have tests that require specific line endings, and we should use gitattributes to ensure that we run those tests regardless of the user's git settings.
While, yes, it is possible to get the wrong behaviour currently, if the user configures git explicitly for a line-ending, I would expect them to be aware of that. The use of
.gitattributes
means that their defaults are not necessarily honoured and thus this can catch them by surprise.
I think today not only is it possible to get the wrong behavior, if you're developing on Windows Git's default behavior causes the wrong behavior for LLVM. Which often means that contributors (especially new contributors) have no idea that they're doing something wrong.
I don't know that this PR is perfect, but IMO it is a huge step in the right direction to allow Windows contributors to build, test and contribute to LLVM using the default configurations of the OS and tools.
There are a bunch of tests that fail if you don't.
Yes. Even better than that: this patch just uncovered a legitimate bug in the C++ AST parser tests for clang
Our instructions tell people they need to disable autocrlf
I didn't see that. Shall I remove that instruction now, or after we've merged?
I don't know that this PR is perfect, but IMO it is a huge step in the right direction to allow Windows contributors to build, test and contribute to LLVM using the default configurations of the OS and tools.
Absolutely. This patch is not a panacea but it enables us to correctly formalize the case where we actually do care about line-endings. Before this patch, it was basically incredibly brittle, but now we actually have control for those tests that matter. For things that don't really matter a whole class of irritating paper-cuts go away forever.
I understand folks' hesitance around downstream rebases, but since this will go in as two commits, resolving a merge conflict becomes an easy step (I've given instructions in the commit that will cause conflicts). I've bumped downstream llvm projects through many major releases in my life (llvm v3 through 13), and change-of-method-name has been a much greater source of merge conflicts than whitespace. LLVM's codebase is healthy and clean largely because we give people the ability to making sweeping improvements. In any case, this will be the last time that a downstream will need to resolve major CRLF changes during a bump, so I consider that a long-term win too.
I've let this patch stew for long enough, and I think it's now time. The spurious test failures in files unchanged by this patch have gone away. After re-reading the discussion above I'm ready to merge these changes.
There's been a good discussion, which helped me polish this a fair bit, and a few contentions which needed clarifying or ironing out. The main contention seems to have been that what's stored in git's database is always LF - even for for windows users or files that need CRLF endings (e.g. .bat
). As I pointed out in a few comments (e.g.1,2,3) it may be unintuitive to store everything in git's internal database with normalized LF, but it's how git checks them out that's important, and I believe I've ironed out any issues here by tracking down files and testcases that depend on a particular line-ending style and adding rules for them.
For those who had such concerns, I believe I've clarified the principal and pointed out the practicalities that will make life easier for windows and unix contributors.
To summarize:
No user should need adjust their local config as a result of this change
I'll keep an eye on this, and will be happy to react to any issues that arise over the next week or so.
Thanks for all the input.
This breaks a number of tests on Windows.
Previously, to have tests working on Windows, one would do git config --global core.autocrlf false
or similar, before checking out llvm-project - a number of test input files need to be in LF form to work. This was brought up earlier already by @llvm-beanz in https://github.com/llvm/llvm-project/pull/86318#issuecomment-2093160376.
Now after this change, due to the added .gitattributes
which overrides the core.autocrlf
setting, these files get checked out with CRLF newlines (as the native form for the platform).
Based on the comment in the .gitattributes file, it seems like this is both known and intentional behaviour:
# Checkout as native, commit as LF except in specific circumstances
* text=auto
While it was already stated that blanket checkouts with CRLF will fail.
Additionally, the old mechanism of getting working newlines in the files is suddenly broken.
To make things worse, you won't notice this thing if you're updating an existing workdir - files that aren't touched aren't rewritten. (To trigger re-checkout of files to get .gitattributes
applied, one can do something like git rm -r subdir && git reset && git checkout subdir
.) So I guess most buildbots will keep chugging along fine, until someone pushes changes that touch those files. If running with a fresh checkout of llvm-project, this has a much bigger impact.
For compiler-rt tests, a handful of the profile tests depend on the right line endings - I can push a local .gitattributes
file to fix that (see https://github.com/mstorsjo/llvm-project/commit/99bec81c87dcd2b7a7970954882bc0e42239d381).
But the Clang, clang-tools-extra and LLVM testsuites have many tests that are broken - there are around 80 tests failing due to this; sorting that out is a much bigger task that I'm not volunteering to take on right now.
CC @AaronBallman as the majority of those failing tests are in Clang.
Can we revert this until we figure out these bits? Do we really want checkouts to default to having the majority of files with CRLF on Windows? I would expect that most people working on Windows don't really want this (this wasn't the case so far anyway)? And other than that, we do need to tag the files that rely on being in LF form so that things work on Windows with either setting.
There are a couple of things to unpack here
a number of test input files need to be in LF form to work
Which ones? Either there's a bug in a parser somewhere, or I missed some test files. In either case I'd like to fix the issue. I watched the buildbots quite closely last night and only noticed failures in ARM frame lowering - which isn't this, I think.
Now after this change, due to the added .gitattributes which overrides the core.autocrlf setting, these files get checked out with CRLF newlines (as the native form for the platform).
It's my understanding that text=auto
does not override core.autocrlf
. As far as I can tell from the documentation it honours the user's configuration for core.eol
in combination with core.autocrlf
- from git config --help
:
core.eol Sets the line ending type to use in the working directory for files that are marked as text (either by having the text attribute set, or by having text=auto and Git auto-detecting the contents as text). Alternatives are lf, crlf and native, which uses the platform’s native line ending. The default value is native. See gitattributes(5) for more information on end-of-line conversion. Note that this value is ignored if core.autocrlf is set to true or input.
and
core.autocrlf Setting this variable to "true" is the same as setting the text attribute to "auto" on all files and core.eol to "crlf". Set to true if you want to have CRLF line endings in your working directory and the repository has LF line endings. This variable can be set to input, in which case no output conversion is performed.
Can we revert this until we figure out these bits
Sure, but like I said, I'm happy to fix these broken cases. The old configuration was broken, but not in a controllable way, so I think it's reasonable to fix up the broken tests and move forward. Perhaps we also need clearer documentation?
This seems to have broken precommit CI on Windows: https://buildkite.com/llvm-project/github-pull-requests/builds/111165#0192a01b-d3ac-44ad-abff-e53ac4a206ab all of the failures look related to line endings, and I noticed that I got a ton of command line messages of the form:
warning: in the working copy of 'clang/include/clang/Basic/Attr.td', LF will be replaced by CRLF the next time Git touches it
Can we revert this until we figure out these bits?
Yes, please.
Do we really want checkouts to default to having the majority of files with CRLF on Windows? I would expect that most people working on Windows don't really want this (this wasn't the case so far anyway)?
It certainly came as a surprise to me; my editors handle LF line endings just fine on Windows; it was very jarring to get hundreds of warnings from git that all seemed unactionable.
a number of test input files need to be in LF form to work
Which ones?
A whole bunch of them. @AaronBallman's link to https://buildkite.com/llvm-project/github-pull-requests/builds/111165#0192a01b-d3ac-44ad-abff-e53ac4a206ab shows mostly what I saw. If including clang-tools-extra
and llvm
in the set of tests to run, I have failures in the following tests as well:
Clang Tools :: clang-move/move-used-helper-decls.cpp
LLVM :: MC/ELF/warn-newline-in-escaped-string.s
LLVM :: TableGen/x86-fold-tables.td
LLVM :: tools/llvm-rc/tag-html.test
lit :: shtest-shell.py
Either there's a bug in a parser somewhere,
In the vast majority of cases, it's not a bug in a parser, but a test that relies on the exact contents of the source files. E.g. for tools/llvm-rc/tag-html.test
I would guess that the issue is that the test takes a text file (which now has variable line endings) and includes it in a binary resource file, and checks the output to match bitwise the expected output.
In my nightly run of compiler-rt tests https://github.com/mstorsjo/llvm-mingw/actions/runs/11395494568/job/31717433112, I had the following failures:
Failed Tests (5):
Profile-x86_64 :: instrprof-gcov-exceptions.test
Profile-x86_64 :: instrprof-gcov-multiple-bbs-single-line.test
Profile-x86_64 :: instrprof-gcov-one-line-function.test
Profile-x86_64 :: instrprof-gcov-switch.test
Profile-x86_64 :: instrprof-gcov-two-objects.test
I think the issue here may be something around testing the exact amount of whitespace somewhere or so.
Mostly "brittle" tests that don't expect the source files to vary.
or I missed some test files. In either case I'd like to fix the issue. I watched the buildbots quite closely last night and only noticed failures in ARM frame lowering - which isn't this, I think.
Did you do a test run on Windows? Even on Linux, I guess it should be possible to check out the code, forcing Git to prefer checking out text files as CRLF, so you could experience the same amount of fallout.
Now after this change, due to the added .gitattributes which overrides the core.autocrlf setting, these files get checked out with CRLF newlines (as the native form for the platform).
It's my understanding that
text=auto
does not overridecore.autocrlf
. As far as I can tell from the documentation it honours the user's configuration forcore.eol
in combination withcore.autocrlf
- fromgit config --help
:
This doesn't match my experience.
See https://github.com/mstorsjo/llvm-project/commit/inspect-newlines for a test github actions job that shows checking out the repo on Windows. First we check out an old branch, with default settings - we get CRLF. Then we set core.autocrlf=false
and retry the above, we get a file with LF newlines. Then we check out the new version with gitattributes, and we notice how we now suddenly are getting CRLF again, despite our setting. See https://github.com/mstorsjo/llvm-project/actions/runs/11407224268 for the actual run log.
Feel free to play around with such an action, to come up with a better way of checking it out while still getting LF newlines - but this is the way I have been doing it (which is scripted in a number of places), and I would expect that many others have the same setup.
This seems to have broken precommit CI on Windows: https://buildkite.com/llvm-project/github-pull-requests/builds/111165#0192a01b-d3ac-44ad-abff-e53ac4a206ab all of the failures look related to line endings, and I noticed that I got a ton of command line messages of the form:
warning: in the working copy of 'clang/include/clang/Basic/Attr.td', LF will be replaced by CRLF the next time Git touches it
Right, that probably relates to the odd situation where files are checked out in one form, but the git attributes indicate that they should be treated differently. Getting git to rewrite the files on disk to match it is kinda messy actually - the best way I've found today is git rm -r . && git reset && git checkout .
.
Do we really want checkouts to default to having the majority of files with CRLF on Windows? I would expect that most people working on Windows don't really want this (this wasn't the case so far anyway)?
It certainly came as a surprise to me; my editors handle LF line endings just fine on Windows;
Exactly - people haven't had a problem with having LF newlines so far. The main problem probably has been to set up Git to get it in that form.
I obviously don't mind fixing as many tests as possible to pass with any form of newlines, but having all checkouts of the repo actually have files in identical form, rather than in any fuzzy form, feels like a feature to me.
So keeping the gitattributes, but in a form where it dictates checking out files in LF form (which has required setting core.autocrlf=false
so far) would IMO be preferrable.
Would we get there by setting the wildcard rule in .gitattrubtes
to * text eof=lf
, or something along those lines?
I believe Chrome is also seeing many test failures due to this (https://crbug.com/374115887), although I haven't yet confirmed it's due to this specific commit.
I just had someone in my office hours also running into problems from this commit. I went to revert the changes myself and I cannot because of merge conflicts... due to line endings.
@ldrumm -- can you revert these changes ASAP? They're causing significant problems in practice, so best to get us back to green rather than fix forward. Thanks!
On Fri Oct 18, 2024 at 7:39 PM BST, Aaron Ballman wrote:
@ldrumm -- can you revert these changes ASAP? They're causing significant problems in practice, so best to get us back to green rather than fix forward. Thanks!
Reverted.
Thanks for reverting!
I must have missed this PR originally. I oppose letting Git change any line endings. It always ends like this.
It's my understanding that text=auto does not override core.autocrlf. As far as I can tell from the documentation it honours the user's configuration for core.eol in combination with core.autocrlf - from git config --help:
This doesn't match my experience.
I think this is due to a subtly of config. Setting core.autocrlf
to false
doesn't actually do anything since it's the default. In that case git is still in "no opinion" mode - which means it stores the input line endings and does no conversion.
However, once eol=auto
is set in a .gitattributes
, it forces git to use the configured eol config:
static int text_eol_is_crlf(void)
{
if (auto_crlf == AUTO_CRLF_TRUE)
return 1;
else if (auto_crlf == AUTO_CRLF_INPUT)
return 0;
if (core_eol == EOL_CRLF)
return 1;
if (core_eol == EOL_UNSET && EOL_NATIVE == EOL_CRLF)
return 1;
return 0;
}
static enum eol output_eol(enum convert_crlf_action crlf_action)
{
switch (crlf_action) {
case CRLF_BINARY:
return EOL_UNSET;
case CRLF_TEXT_CRLF:
return EOL_CRLF;
case CRLF_TEXT_INPUT:
return EOL_LF;
case CRLF_UNDEFINED:
case CRLF_AUTO_CRLF:
return EOL_CRLF;
case CRLF_AUTO_INPUT:
return EOL_LF;
case CRLF_TEXT:
case CRLF_AUTO:
/* fall through */
return text_eol_is_crlf() ? EOL_CRLF : EOL_LF;
}
warning(_("illegal crlf_action %d"), (int)crlf_action);
return core_eol;
}
output_eol
is the git function that decides to write out a file with CRLF or LF endings
Notice that now we hit the CRLF_AUTO
case so it's text_eol_is_crlf() ? EOL_CRLF : EOL_LF;
text_eol_is_crlf()
checks against core.autocrlf
. Since you've stated that it's false
, then it then it checks core.eol
.
So if I've read that correctly, core.autocrlf=false
is a red herring and you should really set core.eol=lf
if you want git to use lf
on windows.
Would we get there by setting the wildcard rule in .gitattrubtes to * text eof=lf, or something along those lines?
This patch is about respecting local config, which is the exact opposite of that suggestion. It would be a way to solve the line-ending issue by fiat, not by co-operation, so I'm against it on principle. To be clear I very much don't like CRLF, but I also very much don't like it when someone forces me to use wrong-handed tools. Windows users would be forced to use wrong handed tools if we force line-endings one way or another
It always ends like this.
Ends Like what? As far as I can see all this has done has exposed latent bugs in our testing and in clang's parser
Historically, we've not automatically enforced how git tracks line endings, but there are many, many commits that "undo" unintended CRLFs getting into history.
git log --pretty=oneline --grep=CRLF
shows nearly 100 commits involving reverts of CRLF making its way into the index and then history. As far as I can tell, there are none the other way round except for specific cases like.bat
files or tests for parsers that need to accept such sequences.Of note, one of the earliest of those listed in that output is:
commit 9795860250734e5c2a879546c534e35d9edd5944 Author: NAKAMURA Takumi geek4civic@gmail.com Date: Thu Feb 3 11:41:27 2011 +0000
...which introduced such a defacto policy for subversion.
With old versions of git, it's been a bit of a crapshoot whether enforcing storing line endings in the history will upset checkouts on machines where such line endings are the norm. Indeed many users have enforced that git checks out the working copy according to a global or per-user config via core crlf, or core autocrlf.
However, for ~8 years now[1], however, git has supported the ability to "do as the Romans do" on checkout, but internally store subsets of text files with line-endings specified via a system of patterns in the gitattributes file. Since we now have this ability, and we've been specifying attributes for various binary files, I think it makes sense to rid us of all that work converting things "back", and just let git handle the local checkout. Thus the new toplevel policy here is
In simple terms this means "unless otherwise specified, convert all files considered "text" files to LF in the project history, but checkout them out as expected on the local machine. What is "expected on the local machine" is dependent on configuration and default.
For those files in the repository that do need CRLF endings, I've adopted a policy of
eol=crlf
which means that git will store them in history with LF, but regardless of user config, they'll be checked out in tree with CRLF.Finally, existing files have been "corrected" in history via
git add --renormalize .
[1]: git 2.10 was released with fixed support for fine-grained line-ending tracking that respects user-config and repo policy. This can be considered the point at which git will respect both the user's local working tree preference and the history as specified by the maintainers. See https://github.com/git/git/blob/master/Documentation/RelNotes/2.10.0.txt#L248 for the release note.