The Hard Tabs Issue - Githubissues

basmith commented 7 years ago

Hi,

(This report is based on the v0.1.1 Win64 binary artifact from the Zig website.)

I noticed that if I create a Zig source file in Windows with a native editor (eg Notepad), the compiler complains about line endings:

$ zig build-exe hello.zig
':\code\zig\first\hello.zig:1:30: error: invalid character: '
const io = @import("std").io;
                             ^

If I manually kill the newlines (resulting in the code being all on one line) it compiles.

I tried using Vim in a Cygwin shell and the file it wrote also compiled without complaint (presumably Unix-style newlines, as Notepad renders that file on one line while Vim looks correct).
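One way to confirm which line endings a file actually contains is to count them at the byte level. The following is a minimal Python sketch, not part of the original report; the function name `count_line_endings` is hypothetical.

```python
def count_line_endings(data: bytes) -> dict:
    """Count each line-ending style in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        # A lone LF is any "\n" that is not part of a CRLF pair.
        "LF": data.count(b"\n") - crlf,
        # A lone CR is any "\r" that is not part of a CRLF pair.
        "CR": data.count(b"\r") - crlf,
    }
```

Reading the file in binary mode (for example `open("hello.zig", "rb").read()`) matters here, because text mode may translate line endings before the count sees them.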

thejoshwolfe commented 7 years ago

You need to configure your editor to use unix line endings to write Zig code. Additionally, you need to configure your editor to use spaces instead of hard tabs for indentation.

notepad.exe has neither of these features, and it can't even comprehend unix line endings. This has been a long standing bug/missing feature in the windows default plain text editor. Notepad is in fact so deficient as a text editor, that literally every single other text editor in popular use today can comprehend unix line endings. Notepad is the worst text editor in popular use, and has been for decades. Zig will not bend to accommodate Microsoft's gross incompetence or nefarious stunts in their inability or unwillingness to provide a decent default text editor to their paying consumer base. Notepad is the problem here, not Zig. (It's not just me. Here's other angry people complaining about Notepad.)

The rationale for only supporting unix line endings and no hard tabs is part of the "only one obvious way to do things" philosophy. From a practical perspective, never having windows line endings makes it easier to write tools that read zig source files. For example, a tool that searches for "\n\n// TODO" and replaces it with something else that includes newlines: it's much easier to do this without worrying about newline style. Furthermore, git and svn have strange features that convert newline styles at odd times, and now all that's irrelevant for Zig.
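The search-and-replace case above might be sketched as follows. This is an illustrative Python sketch only; both function names are hypothetical and nothing here is part of Zig or its tooling. The first function assumes LF-only input, the second shows the extra pattern a tool would need if CRLF and CR were also allowed.

```python
import re

def replace_todo_lf(source: str, replacement: str) -> str:
    # With LF as the only newline, a literal substring match is enough.
    return source.replace("\n\n// TODO", replacement)

# Any of the three historical newline styles.
_NEWLINE = r"(?:\r\n|\r|\n)"
_TODO_ANY = re.compile(_NEWLINE + _NEWLINE + r"// TODO")

def replace_todo_any(source: str, replacement: str) -> str:
    # A callable replacement avoids backslash-escape processing in `replacement`.
    return _TODO_ANY.sub(lambda match: replacement, source)
```

The LF-only version silently misses a CRLF file, which is the kind of failure a tool author would otherwise have to anticipate.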

Variable newline style and variable indentation style are features that Zig does not support.

Is this documented anywhere, or are users just expected to run into cryptic errors like this? Like a CR character doesn't even print properly in a terminal.

hasenj commented 7 years ago

I'm not sure if Zig does this on purpose or it was overlooked, but as far as I'm concerned this is a good feature. Different styles of indentation and line endings cause endless headaches when working on a collaborative project, for example using source control software such as git, etc.

I know Notepad is the default text editor in Windows, but nearly all developers use something else to write code, such as Notepad++ or Visual Studio.

VS Code is also a good option. It's developed by Microsoft, and it's free.

PavelVozenilek commented 7 years ago

Nim does it right:

Any of the standard platform line termination sequences can be used - the Unix form using ASCII LF (linefeed), the Windows form using the ASCII sequence CR LF (return followed by linefeed), or the old Macintosh form using the ASCII CR (return) character. All of these forms can be used equally, regardless of platform.

Multiline string should insert LF newlines, as in C. If someone wants CR he could add it via \r.

Correct newlines are not just a problem of stupid Notepad: if one copy pastes example from webpage he gets CR/LF too. Imagine someone failing with Hello World.

thejoshwolfe commented 7 years ago

if one copy pastes example from webpage he gets CR/LF too.

Pasting into what editor does that?

PavelVozenilek commented 7 years ago

@thejoshwolfe: Notepad, Sublime Text 2 and probably anything else. I do not know of a Windows editor that uses Unix line endings by default and converts to this style automatically.

thejoshwolfe commented 7 years ago

@PavelVozenilek Are you saying that when an editor is configured to use unix line endings or is editing a file that already has consistent unix line endings, then pasting from a webbrowser inserts the wrong kind of line ending? Or are you just saying that windows line endings are typically the default line ending style before you configure it in your editor?

PavelVozenilek commented 7 years ago

@thejoshwolfe: the editors I know (VC++, Sublime Text 2, Notepad ...) do not have a configuration option to force Unix endings everywhere from now on. At best, one switches it manually, file by file.

Programmable editors like vim probably do, but I tend to avoid tools smarter than me.

I do not understand why this is even a problem. Line ending chaos is real, it won't go away, pragmatic solution (accept all) is easy and then the mess disappears from the view of ordinary user.

kyle-github commented 7 years ago

@thejoshwolfe: I think it is the browser that does this, not the editor. HTML is defined as using \r\n not \n. Most browsers let you get away with it on input, but when you copy and paste I think it recreates "correct" HTML from the DOM. Not sure about this, but I have run into the problem consistently.

I think @PavelVozenilek has a point. Every useful editor can manage to translate the line endings just fine but few allow you to do it at a project level and change everything automatically. However, the two main platforms, Windows and Mac, do not use the line ending convention that Zig uses. I happen to use Linux, but that is a minority platform.

I also tend to like the use of tools like go-fmt simply because it completely eliminates an entire class of bike-shedding. I've wasted too much time fighting about formats over the years. It is not a winnable war unless you create something like go-fmt.

thejoshwolfe commented 7 years ago

I just did some experimentation on my Windows machine. Here's what I found:

Eclipse and Notepad++ normalize line endings when you paste text into a file. Each file is determined to be in a particular style, and anything you put into the file through typing or pasting gets normalized to that style.

In Visual Studio, when you press enter, it uses the newline style of the lines around your cursor. When you paste code with CRLF line endings into any file, you get CRLF line endings for the text that you're pasting without affecting the surrounding text. If you save the file, it saves with mixed line endings without warning you. (You can convert line endings while saving through the "Advanced Save Options".) Visual Studio has no option to automatically normalize line endings on paste or on save. If you want normalized line endings, you gotta do it every time you save.

When you open a file in Visual Studio that has mixed line endings, you get a dialog that prompts you to normalize all the line endings to one style or another.

This is not a bug thread for Visual Studio, but that is a Visual Studio bug. Why would they let you create mixed line endings without warning, but then warn you when you open a file that has mixed line endings? This leads to a "best practice" where you should close and reopen all your files before making a commit to make sure you're not committing files that will produce warnings, which is just silly. This is a bug/missing feature in Visual Studio.

I don't know about Sublime Text; it's not free.

Meanwhile in Linux, copying text out of a web browser always seems to result in unix line endings, not windows line endings. I don't know where you're getting the idea that HTML uses windows line endings; I don't see it in the spec. Maybe you mean HTTP headers? There are parts of the HTML spec that talk about normalizing to CRLF, but I can't figure out how to observe that as an end user. I tried copy-paste and drag-drop text from a Google search page and from the textarea editor I'm typing this in right now, but I always got unix line endings (tested in Chrome).

The so-called "Mac"-style line ending actually refers to "classic" Mac OS (version 9 and earlier) line endings, which Mac OS X replaced in 2001. Modern Mac uses Unix line endings, just like Linux.

thejoshwolfe commented 7 years ago

I do not understand why this is even a problem. Line ending chaos is real, it won't go away, pragmatic solution (accept all) is easy and then the mess disappears from the view of ordinary user.

This kind of reasoning leads to JavaScript's automatic semicolon insertion. This kind of reasoning has proven to be very successful at getting widespread adoption. This kind of reasoning is also contrary to the Zen of Zig. In Zig, the code author is required to do more work so that code readers are required to do less work.

I also tend to like the use of tools like go-fmt simply because it completely eliminates an entire class of bike-shedding. I've wasted too much time fighting about formats over the years. It is not a winnable war unless you create something like go-fmt.

Some kind of zig-fmt tool is definitely within the scope of what we want to create. Some plans so far are to make the Zig compiler outright reject any source files that would be modified by zig-fmt. This not only establishes a clear precedent, but forces everyone to use it, or else your code won't compile, not even in debug mode. Whole classes of bikeshedding are gone with this strategy, and all working Zig code has a consistent style. This is already partly the case as we're discussing here, although there's no zig-fmt to fix these problems for you yet. But since the only formatting that's currently rejected is '\r' and '\t' characters, you can pretty trivially clean these up. (You could use this tool, for example.)
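Since only '\r' and '\t' are rejected, the cleanup can be sketched in a few lines. This is a hedged Python illustration, not the tool linked above and not zig-fmt; the function name `normalize_zig_source` and the default width of 4 are assumptions. Note that it also rewrites tabs inside string literals and comments, which may or may not be what you want.

```python
def normalize_zig_source(text: str, indent_width: int = 4) -> str:
    """Convert CRLF and lone CR to LF, then expand tabs to spaces."""
    # Replace CRLF first so a pair does not become two newlines.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # str.expandtabs pads each tab out to the next multiple of indent_width.
    return text.expandtabs(indent_width)
```

When reading and writing the file from Python, passing `newline=""` to `open` keeps the runtime from translating line endings behind your back.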

kyle-github commented 7 years ago

@thejoshwolfe, thanks for running the experiments on cut-and-paste from browsers. Interesting that you did not get the CRLF combos. It has been a while since I cared to check and I tend to set all my editor tools to use LF only on save. As you note, Visual Studio is perhaps not the example of what to do :-)

Not sure how I feel about the idea of having the compiler reject code that is not in the One True Format(tm). While I like the idea that all Zig code would be formatted the same, that might be a little too draconian. For Python this almost works because indentation matters and if someone enters code using both tabs and spaces the meaning is ambiguous.

I think your example of JavaScript's semicolon insertion is taking this a bit far. The semicolon insertion (IMO) is an abomination because it can be wrong and change the intended meaning of the code. I do not see the same thing with handling CRLF, LF or CR as white space.

If format is so important that you would want to make it enforced by the compiler, then perhaps the syntax should be closer to Python? I mean this in all seriousness. I think Guido van Rossum did something really interesting when he decided to make the visual layout elements of Python have meaning at the language level. Python code is not formatted all the same, but even without a python-fmt tool, the code from different projects has more formatting similarities than code in most other languages. I think van Rossum made a mistake in allowing tabs.

thejoshwolfe commented 7 years ago

If format is so important that you would want to make it enforced by the compiler, then perhaps the syntax should be closer to Python? I mean this in all seriousness. I think Guido van Rossum did something really interesting when he decided to make the visual layout elements of Python have meaning at the language level.

Yes. My idea is to have both C-like curly braces and Pythonic indentation, and they must agree. Curly braces are arguably easier for tools to understand, and indentation is absolutely easier for humans to understand, so I want both. Curly braces enable things that you can't do with just indentation. And as for the compiler enforcing indentation rules, come on, you should always get indentation right; no excuse for wrong indentation; it's not that hard, and it makes a huge difference for readability.

A neat advantage of having strict indentation rules and curly brace block scopes is that you can have better compile errors for unbalanced curly braces, which is something that is especially chaotic in C and Java.

fn SomeClassThing(comptime T: type) -> type {
    struct {
        const Self = this;
        field: T,
        fn method(self: &const Self) {
            {var i = u32(0); while (i < self.field.len) {
                self.field.something(i);
            }
        } // ERROR: missing '}', or wrong indentation
        // At this point, the compiler can trust the indentation
        // rather than the curly braces for parsing the rest of the file.
    }
}

In practice, indentation tends to be more correct than curly brace balance. This is especially relevant for IDEs where the tooling is trying to follow along with you as you type. Unbalanced parentheses, quotes, curly braces, etc. are very common while you're in the midst of typing code. By contrast, wrong indentation is much less common. Usually the indentation is wrong if you paste/move a bunch of code at once, and in that case, you can have an IDE hotkey to trust the curly braces and fix the indentation; then everything's back in agreement.

Generally there are two facets to code formatting: readable for tools and readable for humans. C leans toward readable for tools (curly braces, etc.); Python leans toward readable for humans (indentation, etc.); Zig wants to have it both ways, and so has two sets of formatting rules that must be in agreement for your code to compile. (As a reminder, this is an informal plan for a future version of Zig, not status quo.)

Related is #114.

I think van Rossum made a mistake in allowing tabs.

Absolutely agree. It's horrifying how ugly you can make "correct" indentation in Python by mixing spaces and tabs, even in the same line. What a mess.

Not sure how I feel about the idea of having the compiler reject code that is not in the One True Format(tm). While I like the idea that all Zig code would be formatted the same, that might be a little too draconian.

I have high hopes for this strategy. We've already seen some people scared away by Zig's decision to not support hard tabs, which is a shame. But on the plus side, all Zig code will be consistent with this kind of design philosophy.

kyle-github commented 7 years ago

@thejoshwolfe, doesn't the use of both curly braces and indentation violate the DRY principle? If one of them is wrong, which one? I think this will add to the cognitive load of the programmer before he or she even thinks about the logic of the code itself.

One of the things I like about Python is that it showed you can have both human friendly and machine friendly syntax at the same time. Parsing Python is not markedly harder than parsing a brace-heavy language. Tooling has become intelligent enough that pleasing the human far outweighs pleasing the machine.

If Zig is to become a useful replacement for C, and I think it has many parts that are very positive, putting too many barriers in the way of adoption could be a problem. The balance that the Go creators did with go-fmt ended up being a pretty good one. Use of go-fmt is not actually required, but your code is going to be heavily criticized and not reused if it isn't used.

I think use of an enforced indentation scheme and providing a tool like zig-fmt would go a very long way to stopping the bike-shedding and help a lot in making all code heavily reusable.

For instance you could simply mandate that all indentation is three spaces per indent level. Fine, 99% of all editors can handle that right now. Mandating that you must have curly braces and that the indentation of the code must also match is not something existing editors are going to help with.

That said, using indentation as a hint that the programmer missed a curly brace? That would be a good thing. I think some editors may do that now. We catch the misaligned indentation by eye easier than the missing curly braces.

Obviously this is all IMO!

thejoshwolfe commented 7 years ago

doesn't the use of both curly braces and indentation violate the DRY principle?

Yes, and I think this is a good time to violate that principle. DRY taken to the extreme leads to Haskell's complete type inference, which is very hard to read. Information duplication is only a problem because it's more work to do, which Zig is ok with forcing on authors, and because it can create conflicting information:

If one of them is wrong, which one?

When you're trying to compile your code, probably the indentation is right (still a compile error though). When you're trying to autoformat your code, probably the curly braces are right.

I think this will add to the cognitive load of the programmer before he or she even thinks about the logic of the code itself.

It doesn't seem like much to ask of a programmer to get their indentation right before trying to compile their code. I'm always careful to keep my indentation correct, even when it's not a compile error, because it makes the code easier to read. An error for incorrect indentation would add 0 cognitive load for me, but if you're not used to being careful to keep your indentation correct, perhaps have your zig compile command preceded by a zig-fmt command. This would be similar to Eclipse JDT's option to run the autoformatter while saving Java source files.

Parsing Python is not markedly harder than parsing a brace-heavy language.

Maybe I'm just bad at it, but I find writing indentation-scoped parsers to be much harder than start/end token-scoped parsers.

Tooling has become intelligent enough that pleasing the human far outweighs pleasing the machine.

I still want to consider people creating new tools. There are lots of cases where you'll want to make a machine that reads Zig code, e.g. custom linters, syntax highlighters, even a one-off sed command to do some refactoring. The more constrained the syntax is, the easier it is to write these tools.

Mandating that you must have curly braces and that the indentation of the code must also match is not something existing editors are going to help with.

Vim can already do this. The = command indents your code based on curly-brace matching even without any installed zig syntax highlighting. It doesn't behave quite correctly in all cases, but it helps.

I don't think curly-braces-to-indentation is an outrageous feature to expect editors to have. And again, I don't think indentation is very difficult to get right manually in the first place.

thejoshwolfe commented 7 years ago

As an example of how easy Zig is to comprehend with tools, here's a perl one-liner that deletes the content of all the free-form text you can find in Zig code (// comments, "strings", \\ strings, 'characters'). After doing this substitution, every { character is part of the structural syntax.

perl -p -e 's/(["'"'"'])([^\\]|\\.)*?\1/$1$1/g; s/(\/\/|\\\\).*/$1/g'

Even if you don't understand that mess, do notice how short it is. You can't get anything near that simple for C/C++/Java/JavaScript/C# (due to multiline comments), Python (due to multiline string literals), JavaScript/Ruby (due to template strings), PHP/Perl (I don't even want to know), etc. This tokenization simplicity is one reason why Zig does not support /* */-style comments. The tokenizer state is always reset on a newline.
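For readers who find the Perl hard to follow, the same two substitutions might be written out in Python roughly as below. This is a sketch that mirrors the one-liner, including its limitations (for instance, a quote character inside a // comment is still treated as the start of a literal); the function name `strip_free_text` is hypothetical.

```python
import re

# A "string" or 'character' literal: an opening quote, then lazily any
# non-backslash character or backslash escape, then the same quote again.
_LITERAL = re.compile(r"""(["'])(?:[^\\]|\\.)*?\1""")
# Everything after // (comment) or \\ (multiline string line) to end of line.
_LINE_TAIL = re.compile(r"(//|\\\\).*")

def strip_free_text(line: str) -> str:
    """Empty out literals and line tails, keeping only their delimiters."""
    line = _LITERAL.sub(r"\1\1", line)
    return _LINE_TAIL.sub(r"\1", line)
```

Because the tokenizer state resets at every newline, the function can be applied to each line independently without carrying any state between lines.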

And by newline, I mean '\n', not /\r\n?|\n/. Bringing this rant back to the original topic, Zig source code is meant to be easy to understand by tools, because it's all in a consistent format. The more formatting variability that's allowed, the harder it is to write tools to read it. There should only be one way to do newlines in Zig source code, so that tools don't need to worry about that variability.

EDIT: Just for fun, here's some code in Chrome's debugger console that tries to understand JavaScript source code using simple regex. JavaScript is way too complex for that to work, and you can observe lots of misbehavior in that area if you poke at it long enough. This serves as just one counterexample to the "Tooling has become intelligent enough" idea, fwiw.

PavelVozenilek commented 7 years ago

What is the use case for tools massaging source code? Qt does it because C++ is lacking usable metaprogramming, but it is hated and very clumsy to use within IDE.

If one-true-newline-rules-them-all is really that important feature then I suggest to switch to CR/LF everywhere. Number of Windows programmers dwarfs the others, and they are not used to accommodate to other platforms.

hasenj commented 7 years ago

Even if you don't understand that mess, do notice how short it is.

That's not a feature by Zig's standards though :)

andrewrk commented 7 years ago

The biggest reason to enforce a single indentation style and a single line-ending style is that it eliminates energy spent on debating what the standard should be, since the standard is enforced by the compiler. It's unfortunate that a set of people will have to configure their editors beyond the defaults, but that is necessary for one or the other standard to be selected.

It's not my intention to shut down any discussion, but I would posit the thought in everybody's heads who is involved in this thread: is this how you want to spend your time, discussing whitespace? Or do you want to challenge yourself, and switch over to figuring out some of the more fundamental engineering problems that this project is trying to tackle?

thejoshwolfe commented 7 years ago

Pros for CRLF: Notepad support. Visual Studio users can be sloppy. You usually don't need to change your native-Windows editor's configuration from the default.

Pros for LF: Easier to write tools that scan for LF than for tools that scan for CRLF. Easier to write tools that produce '\n' instead of "\r\n". sed -i support (always outputs LF regardless of input style). diff looks cleaner, including git diff (metadata always in LF even if the + and - lines end with CRLF).

This is just a start, but the pattern is that LF is more friendly to programmers, and CRLF is more friendly to windows users who don't know any better. In other words, LF is better for advanced users, and CRLF is better for adoption. As an advanced user, I vote for favoring advanced users.

Number of Windows programmers dwarfs the others, and they are not used to accommodate to other platforms.

The number of bad programmers dwarfs the others too, and I'm not sure I want to cater to bad programmers. Sure it's better for adoption, but compromising to increase adoption is not in line with Zig's goals.

Wulfklaue commented 6 years ago

The issue with errors like this is that when a new user like me downloads Zig, starts coding in Visual Studio Code (Windows), and gets this error, the result is confusing. I spent the first 10 minutes trying out other examples, only to run into the same issue. I still did not figure out that it was a file issue. My first idea? Must be a bug in Zig...

In simple terms, the error message is inadequate and needs to be much more clear.

tiehuis commented 6 years ago

Made some improvements (in the above pull request) to these error messages that should handle the most obvious cases and hopefully help a user diagnose exactly where the problem is a little better.

Open to any wording changes or extra special cases if they are considered noteworthy. Regardless of the stance on line endings, hopefully this helps.

[Three screenshots attached: 2017-10-26-193828_527x46_scrot, 2017-10-26-193929_294x49_scrot, 2017-10-26-194055_302x48_scrot]

zesterer commented 6 years ago

Reasons Zig should support tabs in source code:

People with small screens exist (yes, still).
Partially-sighted users that use large font sizes exist, and they tend to reduce tab widths in order to see source code without scrolling their editor.
People that write obnoxiously long if statement chains exist.
It reduces the filesize of source code, sometimes quite significantly.
Some people have editors that are not set up (or cannot be set up) to interpret four spaces as tabs, making editing code annoying.
If you're a soft-tab user, your editor is clearly configured in a way that allows you to ignore the difference between tabs and spaces. Working on hard-tab code is not a problem, since your editor should automatically detect the indentation style and alter its behaviour accordingly.
People use a variety of soft-tab widths. I've seen 0, 1, 2, 3, 4 and 8 spaces all used in different projects. Hard-tabs remove this problem and place control of the appearance of the code in the hands of who should have control: the person reading it. Forcing people to read code in this manner is akin to forcing people to use a specific typeface or font colour when viewing the code.
Some people just prefer to use tabs.

Tabs are useful in cases where text-aligned whitespace is not required. Spaces are useful in cases where text-aligned whitespace is required. There is a strong distinction between them, and they have independent uses. To smother this by pretending that there exists no distinction between them and forcing developers to use one or the other totally circumvents the whole purpose for them being distinct characters in the first place.

Maintaining this position will lose you a lot of potential users, including me. Good luck with growing your reach while you enforce such arbitrary rules.

zesterer commented 6 years ago

I propose the following actions are taken:

Allow tabs in Zig source code.
Add a point to the style guide explaining the difference between soft and hard indentation, and the circumstances in which each is appropriate to use.

zesterer commented 6 years ago

Here is an example of the correct use of tabs vs spaces.

The tabs are used to indent code that does not require alignment. The spaces are used to indent code that does require alignment (in this case, because someLongFunction and some_long_value should be aligned).

[Screenshot attached: screenshot from 2018-02-02 01 07 55]

thejoshwolfe commented 6 years ago

The decision to ban hard tabs in zig is founded on two main principles:

There should only be one obvious way to do things.
Hard tabs are harder and more complicated to use than just spaces for everything.

If there's only one obvious way to do things, then we need to get everyone to use either soft tabs always or hard tab indentation always. Letting users use whichever they prefer leads to nightmares, such as: what if a single source file has a mix of hard and soft tabs; should that be an error? what if a source directory has a mix of hard and soft tabs; should that be an error? what if an app uses hard tabs and links against a library with soft tabs; should that be an error? And in the case where the user can use whichever they prefer, you will always have the problem that copypasting code from one location will sometimes need to be reformatted to fit the indentation style in another location (such as from stack overflow into your codebase); if you don't check for that, then you get a mix of hard and soft tabs in a single line.

The solution to these nightmares is that everyone has to use the one true official indentation style. The only question that remains is should the official indentation style be hard or soft tabs.

Hard tab indentation, if done properly, can have some nice features. You've outlined a number of positive points in favor of hard tabs, and they're all somewhat compelling.

People with small screens exist (yes, still).

My general response to the points about fitting code into small screens is that hard tabs only solve part of the problem; they're not a proper solution. Hard tabs help a little, but other factors seem like they would matter much more, like whether you wrap your lines at 80 columns or 120 columns or never. I've even seen people criticize using long descriptive variable names because it causes lines to overflow the width of their screen. Not only are hard tabs not a proper solution, but they work against some proper solutions. Like if you did want to wrap all your lines at 80 columns, that concept is undefined if some of those "columns" are occupied by hard tabs; how many "columns" does each hard tab use? The question defeats the purpose of hard tabs in the first place.

It reduces the filesize of source code, sometimes quite significantly.

Yep. I have to agree. My experimentation showed about 88% (plain) and 96% (compressed) size ratio when switching to hard tabs. Hard tabs result in smaller file sizes in almost all cases.

Some people have editors that are not set up (or cannot be set up) to interpret four spaces as tabs, making editing code annoying.

Not quite sure I understand this. Interpreting spaces as tabs? You mean when you press the Tab key, it inserts spaces instead of a hard tab? I can think of one editor that is lacking this feature, and it's Notepad.exe, but that editor is so incompetent, I shudder to even classify it as a text editor. You can't write Zig in Notepad.exe for numerous reasons. Notepad.exe is not supported; it's a terrible piece of software. I'm aware that this compromises Zig's adoption slightly.

It's often true that editors are configured by default to use hard tabs instead of spaces. This is a reality that every programmer, every programming project, and every programming language needs to be ready to deal with. If the programmer and the project turn a blind eye to the issue, as most programming languages do, then the result is a mix of hard and soft tabs in source files creating indentation chaos. It surprises me that so many programmers shrug off this chaos and just carry on with broken indentation. This scenario is unacceptable to me and my projects, and it's currently unacceptable to the zig programming language.

At some point, someone needs to fix the chaos caused by a misconfigured/unconfigured editor, and Zig is enforcing that that must happen before your code will compile. There are lots of solutions to this problem. You could use this tool, for example.

People use a variety of soft-tab widths.

Yeah, this brings up a violation of "one obvious way to do things" on the soft tabs side. If soft tabs are how you indent your code, how many spaces per indentation level should Zig authors use? The answer is 4. This is not currently enforced by the compiler, but if it were, then it would resolve the "one obvious way to do things" violation.

Hard-tabs ... place control of the appearance of the code in the hands of who should have control: the person reading it. Forcing people to read code in this manner is akin to forcing people to use a specific typeface or font colour when viewing the code.

I have to agree that you could decouple the display of the code and the meaning of the code in this way. Hard tabs give you this feature.

If we require that everyone uses hard tabs, we get all the benefits you've outlined above.

The biggest argument against hard tabs is that they're more complicated than spaces:

With hard tabs, there's a strategy for how to indent vs vertical align, as you've pointed out. With spaces, just make it look the way you want, and it's correct. Spaces are simple.
All source files contain spaces. If your source file also contains hard tabs, then there's multiple kinds of horizontal whitespace. You might want to enable the "show whitespace" option in your editor so you can follow what you're looking at. With no hard tabs, there's only one kind of horizontal whitespace, so any spacing you see, you know what it is. (The slight exception is you still can't see spaces at the ends of lines, but that's out of scope here.)
If there's ever a tab after non-tab characters, then there's this weird algorithm for how wide the tab is. It's not just 4 or 8 or whatever you have configured; it's enough space to reach the next multiple of your configured tab width number of columns measured from the start of the line. Now, if you're using tabs properly, you'll never need to think about this, but still. Tabs are complicated.
There are cases where proper use of tabs vs spaces is hard to determine by machine, and so the compiler could let through improper use of tabs vs spaces. This could result in code that appears to be formatted correctly by the author, but will appear to be formatted incorrectly by another viewer with differently configured tab width. The way you avoid this accident is you, as an author, enable "show whitespace" to be careful, or you write a linting tool, or you have code reviews to catch this, or ... Or you just don't have hard tabs, and this never happens.
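The "next multiple of the tab width" rule from the list above can be written down in a few lines. This is a Python sketch for illustration only; `tab_advance` and `display_width` are hypothetical names, and the sketch assumes every non-tab character is exactly one column wide.

```python
def tab_advance(column, tab_width):
    """Columns a tab occupies when it starts at 0-based `column`."""
    return tab_width - (column % tab_width)


def display_width(line, tab_width):
    """Rendered width of `line`, moving each tab to the next tab stop."""
    column = 0
    for ch in line:
        column += tab_advance(column, tab_width) if ch == "\t" else 1
    return column


print(tab_advance(0, 4))           # tab at the start of a line
print(tab_advance(2, 4))           # tab after two ordinary characters
print(display_width("ab\tc", 4))   # same line, tab width 4
print(display_width("ab\tc", 8))   # same line, tab width 8
```

A tab at the start of a line always takes the full configured width, while a tab after other characters takes whatever is left until the next tab stop, which is why the same line can have different widths for different readers.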

Spaces are simple. Tabs are complicated.

The most important thing is that we all do it the same way, and spaces are simpler and easier to use than hard tabs. Tabs have more features, but it's not worth it.

zesterer commented 6 years ago

You raise some interesting issues. To address them:

Letting users use whichever they prefer leads to nightmares, such as: what if a single source file has a mix of hard and soft tabs; should that be an error? what if a source directory has a mix of hard and soft tabs; should that be an error? what if an app uses hard tabs and links against a library with soft tabs; should that be an error?

Why should these things be an error? This seems like pedantry for the sake of pedantry, rather than any hard, technical reasoning behind it. At the very most, the compiler should spit out a warning. It doesn't need to do more than that.

The solution to these nightmares is that everyone has to use the one true official indentation style.

This is not a nightmare. None of these situations are nightmares. Since you insist that people should be using modern editors, you will know that modern editors automatically format pasted code to fit with the existing style of the source file, circumventing this problem.

I find your use of the phrase "one true" troubling. As if things like personal preference can be standardised. As the Unicode committee have discovered, attempting to tame the complex preferences of humanity is impossible. Enforcing an indentation standard will not make everybody use that indentation standard. It will simply turn people away from the language.

Besides, this will never be a significant problem for maintainers. The to-be zigfmt tool would include all the necessary ability to format any amount of code any way the user likes. There is even talk of adding a --fmt flag to the compiler that automatically formats Zig as required by the standard.

With hard tabs, there's a strategy for how to indent vs vertical align, as you've pointed out. With spaces, just make it look the way you want, and it's correct. Spaces are simple.

If people are looking for simplicity that hides the true nature of a problem, they shouldn't be using Zig. As has already been expressed by @andrewrk and others, Zig is a language that makes clear the true nature of any given problem space. Since there is a distinct difference between the use of tabs and spaces in a text file, it only seems sensible to not hide this distinction behind a restrictive, compiler-enforced rule such as this.

With no hard tabs, there's only one kind of horizontal whitespace, so any spacing you see, you know what it is. (The slight exception is you still can't see spaces at the ends of lines, but that's out of scope here.)

This isn't really a problem at all. As I say, there are legitimate and distinct reasons for using both tabs and spaces, so having both is only sensible. If the programmer really cannot stand having both in a text file (I'd question the capabilities of a developer if the distinction between tabs and spaces has them confused), then zigfmt or the --fmt flag will solve the problem for them quickly and easily.

If there's ever a tab after non-tab characters, then there's this weird algorithm for how wide the tab is.

No there isn't. Hard tabs should not appear after non-tab characters in code. End of, no exceptions. It's really very simple.

Spaces are simple. Tabs are complicated.

Python is simple. Zig is complicated. Not because complexity exists for the sake of itself, but because it is required to correctly and fully articulate the problem space. Similarly, the use of hard tabs for non-text-aligned code is the correct solution, even if it is not the simplest hack.

Instead of all this, I suggest a sensible compromise between both stances

Correct use of tabs and spaces should be noted in the style guidelines
Zig should fail to compile code that exhibits tab characters appearing after non-tab characters on any given line
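The second rule is simple to check mechanically. The following Python sketch shows one way a linter could do it; `find_misplaced_tabs` is a hypothetical name, this is not how the Zig compiler works, and it assumes that a space counts as a non-tab character, so a tab after leading spaces is also reported.

```python
def find_misplaced_tabs(source):
    """Return (line_number, column) for each tab that follows a non-tab character."""
    problems = []
    for line_number, line in enumerate(source.split("\n"), start=1):
        seen_non_tab = False
        for column, ch in enumerate(line, start=1):
            if ch == "\t":
                if seen_non_tab:
                    # A tab after any other character breaks the proposed rule.
                    problems.append((line_number, column))
            else:
                seen_non_tab = True
    return problems


source = "\tfoo();\n\tbar\t= 1;\n  \tbaz();"
print(find_misplaced_tabs(source))
```

Line and column numbers here are 1-based and count characters, not display columns.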

As I've said before: not allowing tabs will push away many potential users. The kind of people pedantic enough to use tabs and spaces correctly are also the kind of people this community should be trying to welcome. Providing them with a compilation error when they try to use their preferred - and completely justified - style is obnoxious and sends a bad message.

Without any intention of being rude, I personally cannot stand not making use of tabs. Therefore, I will be continuing to maintain a tab-permitting fork of the project for my own uses.

andrewrk commented 6 years ago

I'm putting this issue back on the table.

andrewrk commented 6 years ago

@zesterer Considering the case for usage of tabs, as you have specified, how does one type the following 2 lines?

    const tmp_path = try allocator.alloc(u8, dest_path.len +
                                             base64.Base64Encoder.calcSize(rand_buf.len));

For the 2nd line, one could press tab to get a \t to the correct indentation level. Next, we want 40 spaces before the word base64.

With spaces, here's how you'd accomplish this, starting from the cursor at the end of line 1, with any text editor which is replacing the tab key with 4 spaces:

hit tab 10 times

With tabs,

press space bar 40 times

Maybe you prefer, as I do, to indent like this instead:

    const tmp_path = try allocator.alloc(u8, dest_path.len +
        base64.Base64Encoder.calcSize(rand_buf.len));

Assuming your text editor does not understand the syntax of your language, with spaces, you have to press tab once, with tabs you have to press space 4 times.

Am I missing something? Or do you have to press a lot of spacebar for lines that wrap?

Also, how do you know how long your lines are? Can teams who use tabs agree on a maximum line width? Or do they have to give that up?

zesterer commented 6 years ago

Considering the case for usage of tabs, as you have specified, how does one type the following 2 lines?

Personally, I'd format that code like the following:

const tmp_path = try allocator.alloc(
    u8,
    dest_path.len +
        base64.Base64Encoder.calcSize(rand_buf.len)
);

Advantages of this include:

It makes clear that alloc has two arguments (as shown by the number of items that are singularly indented) meaning you don't accidentally miss the tiny u8 when refactoring
It makes clear that base64.Base64Enc... is a component of the dest_path.len argument rather than a distinct argument in its own right, since it has an additional indentation
An absurd amount of spacebar/tab pressing is not required to smartly and understandably organise the code
It's clear that this is just one function call, since the trailing ); is immediately visible to anybody reading the code

Personally, I'm not afraid to use additional lines if doing so aids readability, as is the case in the example above. The primary reason we use syntax instead of writing machine code in hex is because it aids readability, so I don't see the use of additional lines as a problem.

Also, how do you know how long your lines are? Can teams who use tabs agree on a maximum line width? Or do they have to give that up?

Line length is still determined by the line character (column) count. This is for 3 reasons:

Indentation isn't really a part of the code: it's just a way of indicating which parts are distinct from others. Tab indentation width can be altered at will anyway, so counting it as if it were fixed screen real estate isn't useful
Indentation is local: a line of code is likely to have a similar indentation to the line before it, particularly if the lines are relevant to one another. For this reason, scrolling the text editor if one has a small screen is less of an issue, since you're looking at large blocks of code rather than single lines
Column counts are simple to measure. Everything/everyone understands it: it's just the number of characters in a line.

zesterer commented 6 years ago

For examples of me using the aforementioned style in C++, see this and this. I'm sure you will agree: it's quite elegant and makes the code much easier to interpret.

andrewrk commented 6 years ago

Personally, I'd format that code like the following:

I agree with all of those statements about how to format code. It looks like your answer to my question is: use formatting conventions that allow you to indent with a full indentation amount instead of aligning on a particular character.

Does your suggested formatting convention never have any spaces in between a tab and the start of the code? E.g. is there ever a case for a tab followed by a space?

Indentation isn't really a part of the code: it's just a way of indicating which parts are distinct from others. Tab indentation width can be altered at will anyway, so counting it as if it were fixed screen real estate isn't useful

I don't think this is fair to say. There objectively exists the problem of code wrapping on one person's screen, and not wrapping on another person's screen. If one person prefers 8 space tabs and another prefers 1 space tabs, no matter the screen widths they each have, there will be a conflict where code looks fine on one screen and wraps on the other.
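To make the wrapping conflict concrete, here is a small Python sketch. The `wraps` helper, the 80-column screen, and the sample line are all made up for illustration, and the sketch assumes one display column per non-tab character.

```python
def wraps(line, tab_width, screen_columns):
    """True if `line` is wider than the screen at the given tab width."""
    return len(line.expandtabs(tab_width)) > screen_columns


# Three levels of tab indentation followed by 60 characters of code.
line = "\t\t\t" + "x" * 60

for tab_width in (1, 4, 8):
    print(tab_width, len(line.expandtabs(tab_width)), wraps(line, tab_width, 80))
```

The same bytes fit an 80-column screen at tab widths 1 and 4 but not at 8, so a line-length limit only means something once a tab width is also agreed on.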

It will then become a project-specific argument about how many spaces tabs should be, which defeats the benefit of tabs allowing each user to choose the width.

Counting a tab as a width of 1 column has one positive effect, which is that if every user's editor is at least (max_indentation_count * preferred_tab_size) characters wide, then code will not wrap for them. This provides a range of minimum screen widths whose upper bound is inversely proportional to their preferred tab size.

But this benefit is soured by the fact that it makes a formatting convention difficult to specify.

Zig has the opportunity to prevent a huge amount of bickering about this issue, and let people focus on their code.

Minimize energy spent on coding style.

The question is, what is the best way to accomplish this?

I'm not convinced that allowing hard tabs does.

zesterer commented 6 years ago

Minimize energy spent on coding style.

I'm not at all convinced that this is a desirable objective. Style is important. It makes code easier to write, easier to read and easier to maintain. It's not an annoyance that should be overlooked.

E.g. is there ever a case for a tab followed by a space?

Additional spaces are useful when aligning parts of function declarations and comments such as this. Other than that, my personal style doesn't tend to align text in a manner that would require spaces after tabs. Of course, this changes depending on who you ask - particularly in commercial settings - meaning that disallowing non-standard style at a compiler level will likely limit the industry uptake of Zig.

Zig has the opportunity to prevent a huge amount of bickering about this issue, and let people focus on their code.

I really don't think the problem is as bad as you say. The people that bicker are also the people that are the most set in their ways and are least likely to change no matter how hard you try by enforcing style through the compiler. It's not a thing I find to be particularly contentious amongst developers unless they're forced to not use their own preferred style.

As previously stated, indentation style is trivial to change with tools that are often built in to editors by default. It doesn't alter the semantic meaning of the code, and besides: style isn't something that should be rigidly enforced. Often, good style doesn't fit a set of rules and the best style is the one that makes a specific piece of code readable under specific circumstances.

Zen guidelines are a far better fit to this problem than compiler errors.

andrewrk commented 6 years ago

Alright. I've heard the arguments, I've considered carefully, and I've reevaluated my position. And it is still that hard tabs are not allowed in valid zig source. Even if the compiler allowed hard tabs, there would still be the question of zig fmt, which is going to be opt-in, but have no configuration options. (Note that zig fmt will in fact allow hard tabs, windows line endings, and a few more mistakes that have unambiguous corrections).

Hard tabs allow users to abstract the concept of indentation at the syntax level. Zig does not consider this abstraction valuable enough to justify the headaches that mixing hard tabs and spaces causes developers trying to work together across different platforms.

It is a premise that Zig, and tools that parse Zig code, know the column index and actual display column of every byte of source code. This premise will not be broken by hard tabs. It is a premise that Zig source code that one looks at on their own screen will match what another person sees on their screen, except for fancy IDE features that are out of scope of this issue. The Zig project recognizes that these premises have the downsides mentioned above in this issue, namely that text editors without syntax awareness cannot modify indentation widths to a particular user's preference, that some text editors make it difficult to avoid hard tabs, and that file sizes of Zig sources may be a few percent larger.

Exactly 1 ascii control code is recognized, and that is the newline character.
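Read literally, that rule is easy to state as a check. The Python sketch below is an illustration of the rule as written here, not the compiler's implementation; `find_invalid_control_chars` is a hypothetical name, and treating DEL (0x7F) as a control code is an assumption.

```python
def find_invalid_control_chars(source):
    """Return (line, column, char) for every ASCII control code other than newline."""
    problems = []
    line_number, column = 1, 1
    for ch in source:
        if ch == "\n":
            # Newline is the one control code that is allowed.
            line_number += 1
            column = 1
            continue
        if ord(ch) < 0x20 or ord(ch) == 0x7F:
            problems.append((line_number, column, ch))
        column += 1
    return problems


print(find_invalid_control_chars('const io = @import("std").io;\r\n'))
print(find_invalid_control_chars("fn main() void {\n\tfoo();\n}\n"))
```

For the first input the carriage return is reported at line 1, column 30, the same position as in the error message quoted at the top of this issue; for the second, the hard tab is reported at line 2, column 1.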

Minimize energy spent on coding style.

This is in the Zig Zen and it is here to stay. Zig wants programmers to focus on the semantics of their code and tolerate differences in style as much as possible.

You will notice that I set an example for this, in that I never comment on style or naming in a pull request. At most I will merge it in a branch, make the edits that I prefer myself, and then merge the pull request into master.

In order to facilitate this Zen, I am hereby redirecting discussion of hard tabs away from the official IRC channel, Reddit, and email list. All discussion about hard tabs is to take place here, on this issue. I will not lock the issue just as I will not lock my mind. If someone convinces me to change things, we will change things. So far I am not convinced. Nobody gets in trouble if they violate this new rule; I will simply politely request that they redirect their comments to this issue, so that users can focus on what's more important: writing robust, optimal, clear code.

milkowski commented 6 years ago

I wish Zig all the best, but this decision will result in constant flame wars in the future because, despite all its merit, it breaks one fundamental rule above all: KISS. If properly copy-pasting "Hello World" depends on the system/editor you use, that is a fundamental KISS flaw. An in-house formatting tool is the way to go (not only for Go): it resolves and balances well all the inconveniences of code formatting diversity.

As a side note: enforcing code style in the compiler is the last thing that will convince any hardcore C programmer considering a switch; they are even proud of the IOCCC https://www.ioccc.org/years.html. For the demanding C++ community there is already the Rust option, which Zig seems to be inspired by, and if you check most of the already well-established community projects, the code formatting convention is fairly consistent between them without any enforcement.

zovt commented 6 years ago

diff --git a/src/tokenizer.cpp b/src/tokenizer.cpp
index badbd695..e829226b 100644
--- a/src/tokenizer.cpp
+++ b/src/tokenizer.cpp
@@ -17,6 +17,7 @@

 #define WHITESPACE \
          ' ': \
+    case '\t': \
     case '\n'

 #define DIGIT_NON_ZERO \

Applying this patch on master (commit b11c5d8f825) seems to enable compiling the hello world example with its indenting spaces replaced by tabs. Not sure if there are any other side effects; YMMV.

Fork is here: https://github.com/zovt/zig

zovt commented 6 years ago

Some thoughts on hard tabs and tooling:

If a tool needs to align horizontally in a monospace environment, replace any non-whitespace character with a single space, and leave other whitespace alone. (See my fork for an example with the compiler)
Alternatively, we can say that tools should replace tabs with N spaces for display or alignment purposes
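The first suggestion can be sketched in a few lines of Python. This illustrates the idea only and is not the code from the fork; `caret_line` is a hypothetical name, and `index` is assumed to be a 0-based character index into the line.

```python
def caret_line(source_line, index):
    """Build a marker line whose caret points at source_line[index]."""
    # Keep whitespace (including tabs) as-is and turn everything else into a
    # space, so the caret lines up whatever tab width the viewer uses.
    padding = "".join(ch if ch.isspace() else " " for ch in source_line[:index])
    return padding + "^"


line = "\tfoo(bar)"
print(line)
print(caret_line(line, 5))
```

Because the marker line repeats the tab from the source line, both lines shift together when the tab width changes. The sketch still assumes every non-whitespace character is one column wide.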

Is there a use case that you can think of where the above would be insufficient?

Thoughts on style and hard tabs in general:

Making the use of certain whitespace characters a compiler error seems weird
Using zig fmt as the canonical zig style seems better, much like gofmt
Each programming project should have some style guide that prevents bickering about style, but that shouldn't be enforced at the language level

@thejoshwolfe @andrewrk thoughts on the above? My fork seems to work properly with tabs in the stage1 compiler, and I'll be updating the self-hosting compiler to use similar logic. Happy to submit a PR if you change your mind about this issue.

thejoshwolfe commented 6 years ago

Is there a use case that you can think of where the above would be insufficient?

multi-byte utf8 sequences make it nontrivial to measure horizontal space in a text file. there are also multi-codepoint graphemes and wide characters in Unicode.

Using zig fmt as the canonical zig style seems better, much like gofmt

zig fmt works today. The std library is formatted with it. It's self hosted (not in stage 1). If it doesn't support fixing hard tabs yet, it will.

zovt commented 6 years ago

multi-byte utf8 sequences make it nontrivial to measure horizontal space in a text file. there are also multi-codepoint graphemes and wide characters in Unicode.

how does using strictly spaces make dealing with this any easier?

thejoshwolfe commented 6 years ago

Oh sorry. I understand what you're saying now. Your proposals would be sufficient.

Making the use of certain whitespace characters a compiler error seems weird

Java classifies form feed as whitespace. In Go, a form feed is a compiler error.

Zig is even more restrictive than those languages on which ascii control codes are allowed.

zovt commented 6 years ago

Sure, but why does it need to be more restrictive? There seems to be very little appeal here other than the personal preference of the language devs

Meai1 commented 6 years ago

it's not a big deal but it's just weird. If you want to go this way then the compiler should have an option to automatically rewrite source files that contain tabs to spaces so that people can still use ziglang without having to switch their editing software. That's weird too (possibly dangerous) but at least it's convenient.

andrewrk commented 6 years ago

If you want to go this way then the compiler should have an option to automatically rewrite source files that contain tabs to spaces so that people can still use ziglang without having to switch their editing software.

That's coming soon. We have zig fmt in the self-hosted compiler, and it's planned to support hard tabs.

daurnimator commented 6 years ago

Column counts are simple to measure. Everything/everyone understands it: it's just the number of characters in a line.

This is actually quite hard to define once you allow non-Latin characters. The width of Unicode characters is quite hard to calculate; the 'standard' solution of wcwidth is inconsistent, and often incorrect (due to characters with variable widths, and the most common implementation hard-coding an outdated version of Unicode)!
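A short Python sketch shows how byte count, code point count, and display width drift apart. `approximate_width` is a hypothetical helper and a deliberately crude approximation, not a wcwidth implementation; it assumes combining marks take no columns and East Asian wide or fullwidth characters take two, and its answers depend on the Unicode data shipped with the Python version in use.

```python
import unicodedata


def approximate_width(text):
    """Rough display width in terminal columns (assumptions in the lead-in)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn on the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


# ASCII, "e" plus a combining acute accent, and two CJK characters.
for text in ("abc", "e\u0301", "\u65e5\u672c"):
    print(len(text.encode("utf-8")), len(text), approximate_width(text))
```

The three samples give (bytes, code points, approximate columns) of (3, 3, 3), (3, 2, 1), and (6, 2, 4), so "the number of characters in a line" has at least three different answers once non-ASCII text is involved.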

I don't think column counts can be consistent in a unicode world, and should not count as a reason against tab-based indentation.

ghost commented 5 years ago

Regarding zig fmt and "only one obvious way to do things":

Anyone who has written 4x4 matrices for 3D graphics (e.g. here) aligns the columns visually. In the case of an identity matrix...

const myMatrix = {1.0, 0.0, 0.0, 0.0,
                  0.0, 1.0, 0.0, 0.0,
                  0.0, 0.0, 1.0, 0.0,
                  0.0, 0.0, 0.0, 1.0};

Or in a transformed matrix...

const transformed = {1.25,   1.0,  1.0,   0.0,
                     0.095,  1.0,  0.0,   1.0,
                     1.05,   3.5,  10.24, 8.5,
                     105.25, 35.2, 92.1,  1.0};

If there were "only one obvious way to do things," I would be forced to live with exactly four spaces at the beginning of each line, and a single space between array elements, thus giving the following result...

const transformed = {1.25, 1.0, 1.0, 0.0,
    0.095, 1.0, 0.0, 1.0,
    1.05, 3.5, 10.24, 8.5,
    105.25, 35.2, 92.1, 1.0};

... or worse, no newlines between elements.

const transformed = {1.25, 1.0, 1.0, 0.0, 0.095, 1.0, 0.0, 1.0, 1.05, 3.5, 10.24, 8.5, 105.25, 35.2, 92.1, 1.0};

"Only one obvious way to do things" is a noble and worthwhile goal from a functional standpoint. The enforcement of indentation tabs/spaces is equivalent to telling a painter how to hold his paintbrush. A system that forces its formatting and indentation rules upon me makes the above scenario frustratingly difficult, and when I find out that an automated system has removed my tedious alignment I will get red in the face and toss the laptop off the table. Yes, tediously aligning those matrix elements in the above example does suck: I did not enjoy it, but it's "the right way" for the tools I have available to me. A better method of text input, a better human interface than a keyboard, and a better method of rendering text that automatically breaks my matrix into a 4x4 grid without requiring my tedious spacing are all steps in the right direction, but all well beyond the scope of "programming language." In a world where I want to nicely space out my matrices and use zig fmt, can someone explain how the two could coexist?

In Andrew's YouTube talk about Zig he asked for feedback about what is and isn't working for game developers: this is a big one for me.

thejoshwolfe commented 5 years ago

These are the output of the current zig fmt:

const myMatrix = []f32{
    1.0, 0.0, 0.0, 0.0,
    0.0, 1.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 0.0,
    0.0, 0.0, 0.0, 1.0,
};

const transformed = []f32{
    1.25, 1.0, 1.0, 0.0,
    0.095, 1.0, 0.0, 1.0,
    1.05, 3.5, 10.24, 8.5,
    105.25, 35.2, 92.1, 1.0,
};

And if you really wanted column alignment, I guess you could pad with zeros instead of spaces, but I don't think this looks very good:

const transformed2 = []f32{
    001.250, 01.0, 01.00, 0.0,
    000.095, 01.0, 00.00, 1.0,
    001.050, 03.5, 10.24, 8.5,
    105.250, 35.2, 92.10, 1.0,
};

One of the advantages of not doing any column alignment, like go fmt does, is that when you add or remove items from a list, the line-based diff (like git diff and most other version control) will show only the lines that changed, instead of showing that the whole table changed due to formatting. This wouldn't really apply to these fixed-size matrixes we're discussing here, but it does apply to lists in general.
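The diff argument can be checked with Python's standard difflib. This is a sketch with made-up data; `changed_lines` is a hypothetical helper that counts the removed and added lines a line-based diff would report.

```python
from difflib import SequenceMatcher


def changed_lines(before, after):
    """Number of removed plus added lines between two lists of lines."""
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    return sum(
        (i2 - i1) + (j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


# Without column alignment, adding an entry touches one line.
plain_before = ["a = 1,", "bb = 2,"]
plain_after = ["a = 1,", "bb = 2,", "cccc = 3,"]

# With alignment, the longer name forces every existing line to be re-padded.
aligned_before = ["a  = 1,", "bb = 2,"]
aligned_after = ["a    = 1,", "bb   = 2,", "cccc = 3,"]

print(changed_lines(plain_before, plain_after))
print(changed_lines(aligned_before, aligned_after))
```

The unaligned list reports 1 changed line and the aligned one reports 5 (two removed, three added), even though both edits add the same single entry.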

ghost commented 5 years ago

The formatted output for the transformed matrix is unacceptable to me, and I agree that the transformed2 does not look good either.

git diff showing more-perfect output is of much lower importance to me than being able to quickly scan and reason about the transform of a particular matrix in my code. The git diff having the nicest possible output is valuable every once in a while when inspecting diffs, but matrices having the nicest possible layout is valuable all the time as I continuously re-read the code I've written.

ghost commented 5 years ago

And if you really wanted column alignment, I guess you could pad with zeros instead of spaces

you can use https://github.com/ziglang/zig/commit/d21a1922eb5d76b9b0d0611eaeb42c91f83234ab


// zig fmt: off
foo_array = {...}
// zig fmt: on

I also remember that zig fmt does something different when you put something at the end of the line. I'm not sure of the details right now, but there are some knobs to turn.

andrewrk commented 5 years ago

@winduptoy I think that

const transformed = {1.25,   1.0,  1.0,   0.0,
                     0.095,  1.0,  0.0,   1.0,
                     1.05,   3.5,  10.24, 8.5,
                     105.25, 35.2, 92.1,  1.0};

is a good use case to argue in favor of alignment in zig fmt. Can you open a separate issue? I don't think it's related to The Hard Tabs Issue.

thejoshwolfe commented 5 years ago

Can you open a separate issue?

I'm doing it.

thejoshwolfe commented 5 years ago

I opened a separate issue for this here: https://github.com/ziglang/zig/issues/1793

ziglang / zig

The Hard Tabs Issue #544