ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

The Hard Tabs Issue #544

Closed: basmith closed this issue 6 years ago

basmith commented 7 years ago

Hi,

(This report is based on the v0.1.1 Win64 binary artifact from the Zig website.)

I noticed that if I create a Zig source file in Windows with a native editor (e.g. Notepad), the compiler complains about line endings:

$ zig build-exe hello.zig
':\code\zig\first\hello.zig:1:30: error: invalid character: '
const io = @import("std").io;
                             ^

If I manually kill the newlines (resulting in the code being all on one line) it compiles.

I tried using Vim in a Cygwin shell and the file it wrote also compiled without complaint (presumably Unix-style newlines, as Notepad renders that file on one line while Vim looks correct).
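To check the "presumably Unix-style newlines" guess rather than infer it from how Notepad renders the file, one can count the line-ending sequences in the raw bytes. This is a minimal sketch; the helper name `count_line_endings` and the sample byte strings are made up for illustration and are not part of any Zig tooling.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR sequences in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LFs that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # CRs that are not part of a CRLF pair
    return {"crlf": crlf, "lf": lf, "cr": cr}


# Hypothetical samples: the same first line saved with Windows and Unix endings.
notepad_style = b'const io = @import("std").io;\r\n\r\n'
vim_style = b'const io = @import("std").io;\n\n'

print(count_line_endings(notepad_style))  # {'crlf': 2, 'lf': 0, 'cr': 0}
print(count_line_endings(vim_style))      # {'crlf': 0, 'lf': 2, 'cr': 0}

# The first carriage return is the 30th byte of the line, matching the 1:30 in the error.
print(notepad_style.index(b"\r") + 1)     # 30
```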

ghost commented 5 years ago

@andrewrk My intention was to argue in favor of "don't dictate my indentation at all." Solving the issue of "the one true way to enter computer code" is a little lofty without fundamentally altering the way that we interact with a computer and redesigning the medium of code from monospace text to something different.

You asked the community "is this how you want to spend your time, discussing whitespace? Or do you want to challenge yourself, and switch over to figuring out some of the more fundamental engineering problems that this project is trying to tackle?" to which I answer, "Yes! Of course! So please take your hand off my paintbrush and let me get to work. Wouldn't you like to do the same rather than forever having half of the programming population find issue and friction with this decision?"

thejoshwolfe commented 5 years ago

That's a reasonable stance. zig fmt removes lots of freedom from the programmer, and that has its downsides. The tradeoff is that you get consistency between programmers. If everyone is forced to conform to one standard, not everyone will be happy with it, but at least we can try to make it pleasant for as many people as possible. If you have proposals for how everyone's zig code should be formatted, please open issues to argue for them. If you want every zig programmer to be able to format their own code differently, then you're arguing against the purpose of zig fmt (and go fmt).

nyovaya commented 5 years ago

@andrewrk Will tabs be allowed at some point?

Calandiel commented 4 years ago

Any updates on this? It's extremely annoying to have the compiler enforce a coding style on you. Even more so than dealing with Rust's borrow checker.

andrewrk commented 4 years ago

zig fmt fixes whitespace now. I suggest configuring your editor to run that on save. https://github.com/ziglang/zig/wiki/FAQ#why-does-zig-force-me-to-use-spaces-instead-of-tabs

Calandiel commented 4 years ago

I'd have my two cents to add regarding this topic but if zig fmt can now handle it correctly, it should be far less invasive. Thank you for the quick response.

codebrainz commented 4 years ago

The biggest reason to enforce an indentation and line endings is that it eliminates energy spent on debating what the standard should be, since the standard is enforced by the compiler.

Just a casual observation, but making this preference a syntax error is a sure fire way to guarantee the bikeshedding about it never ends, at best.

marler8997 commented 4 years ago

The biggest reason to enforce an indentation and line endings is that it eliminates energy spent on debating what the standard should be, since the standard is enforced by the compiler.

Just a casual observation, but making this preference a syntax error is a sure fire way to guarantee the bikeshedding about it never ends, at best.

I don't think you can ever end the bike-shedding. The difference is that since Zig is choosing to only support one format, developers no longer have a decision to make or debate on a project-by-project basis. The bike-shedding is now centralized :)

Calandiel commented 4 years ago

They do have a decision to not use the language tho x)

251 commented 4 years ago

Spaces-only is a real usability issue. Spaces are highly programmer-unfriendly and only work in some way with fancy editor configurations. Let's try to compare in an objective way:

Tabs

advantages

Spaces

advantages

To be honest, I never saw a convincing argument for spaces. It just makes no sense to not use a key that was designed exactly for that and to mimic tabs with soft tabs and alternatives instead. "Looks everywhere the same" is exactly what you don't want and what brought us the indentation mess.

I'd highly appreciate it if that decision would be reconsidered.

codebrainz commented 4 years ago

To be honest, I never saw a convincing argument for spaces.

To play devil's advocate (I think tabs should be supported), there are two IMO somewhat convincing arguments against them, and so for spaces:

  1. Inside the lexer/parser/compiler, it's impossible to accurately identify the exact location of a token/node/error. You have to either arbitrarily pick a tab width to use or assume it's one character, and report errors at the offsets using the inaccurate location. For example if there's a syntax error at the front of a line indented with a tab, is the error at column 2 because it's the second char, or 4 or 8 or other guessed tab width? The only way to fix it is to add another option to the tool like editors have to say how many characters a tab is considered to be.
  2. Even if you follow the better practice of using tabs for indentation and spaces for alignment, there are certain cases where the alignment can still get messed up, like if you have aligned with spaces trailing comments at the end of some lines with different indentation levels with tabs, it will be wrong unless viewed with the same tab-width setting as it was aligned with. For example this code was aligned with an editor configured for 2-char tab width:
void foo() {      // this function is silly
    if (1)          // as is this condition
        printf("hi"); // but at least it's friendly
}
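The column ambiguity from point 1 can be made concrete with a small sketch. The function name `visual_column` is hypothetical, and the tab-stop rule (advance to the next multiple of the tab width) is the usual terminal convention, assumed here rather than taken from any compiler.

```python
def visual_column(line: str, index: int, tab_width: int) -> int:
    """1-based display column of line[index], with tabs advancing to the next tab stop."""
    col = 0
    for ch in line[:index]:
        if ch == "\t":
            col += tab_width - (col % tab_width)  # jump to the next tab stop
        else:
            col += 1
    return col + 1


line = '\tprintf("hi");'
index = 1  # position of the 'p', the first character after the tab

for width in (1, 2, 4, 8):
    print(width, visual_column(line, index, width))
# 1 2
# 2 3
# 4 5
# 8 9
```

Treating the tab as a single character (width 1) is the only answer that does not depend on a setting the compiler cannot see.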

A middle ground would be to add a warning flag like -Wtabs, either enabled or disabled by default, so that each project could choose their own preference/convention. IMO, that's overkill though and a better approach would be to just put the switch-case for tab (back?) in the lexer and add a FAQ answering "don't use tabs" for the question "why are my diagnostic message locations inaccurate?".

251 commented 4 years ago

is the error at column 2 because it's the second char, or 4 or 8 or other guessed tab width

2 of course (as other compilers do and IDEs expect as error location). Even for fancy arrows in the error message it's simple: copy the affected line, cut at error and replace all non-white-space code points with space.
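A minimal sketch of that arrow construction, assuming the error position is given as a character index into the line; the function name `caret_line` is made up for this example. Because tabs in the prefix are copied through unchanged, the caret lines up under the error at whatever tab width the terminal uses. It does not account for double-width characters.

```python
def caret_line(source_line: str, error_index: int) -> str:
    """Build the marker line printed under source_line, pointing at error_index."""
    prefix = source_line[:error_index]
    # Keep whitespace (including tabs) as-is; blank out everything else.
    padding = "".join(ch if ch.isspace() else " " for ch in prefix)
    return padding + "^"


source_line = "\t\tfoo(bar;"
error_index = source_line.index(";")

print(source_line)
print(caret_line(source_line, error_index))
```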

if you have aligned with spaces trailing comments

This only happens when block comments span different indentation levels. This is a code smell and breaks with every refactoring. (You don't want to check if comments are aligned in every location when you rename a variable, right?) If you're creating ASCII art or quines - fine, but don't use it in real-world code. go fmt for instance would break those too.

And again, this happens with spaces too: try to integrate such sections into documents with different indentation requirements...

thejoshwolfe commented 4 years ago

[tabs] work in every (even the simplest) editor

nope. textarea inputs in web browsers don't support tabs by default, including the github comment editor where i'm typing this right now. It's actually spaces that are supported in every text editor.

[spaces make] diffs are harder to read

nope. it's tabs that are rendered strangely when you prefix every line with a + or -.

...key that was designed exactly for that...

nope. the tab key and character were originally designed to align the cells of a table, not indent structured code. The original purpose of the tab character was to appear in the middle of a line, which is today considered bad practice (at least before the rise of elastic tabstops).

251 commented 4 years ago

inputs in web browsers

Because it's a browser and not an editor. A proper in-browser code editor supports tabs (and monospace) - see GitHub's online editor. There are many plugins, key combinations etc. to solve this if one really wants to write code in a browser?! I'd consider an editor that is not able to even work with the ASCII character set (minus weird control characters) as broken.

it's tabs that are rendered strangely when you prefix every line with a + or -

I see what you mean, although it's not rendered "strangely" - it stops exactly at the same level. I thought more about small indentations (1-2 spaces) where you can't make out the indentation level. With tabs you can just pipe it through less -x16 and it gets much clearer.

align the cells of a table

That's exactly aligning at fixed indentation levels... (tabs on typewriters were also used to indent paragraphs or lists, not just tables).

andrewrk commented 4 years ago

I think people are not aware of zig's stance on hard tabs. I updated the wiki page to make it more clear:

Why does Zig force me to use spaces instead of tabs?

see also Why does zig fmt have no configuration options?

pixelherodev commented 4 years ago

less key presses
simple conversion from tabs to spaces possible if needed

Neither of those is strictly true. First off, grammatically speaking the word less does not fit there, and should be replaced with fewer; secondly and more practically, you can use the tab key to insert spaces even in most plain text editors that I've used.

Conversion from tabs to spaces is possible, but the inverse is true as well: the text editor I use literally has the options to go back and forth right next to each other. Also, it's trivial with any good find+replace system to go in either direction.

works in every (even the simplest) editor

Yeah, and? That's not an advantage. To qualify as an advantage, it can't be in the center of a venn diagram, it has to be on a specific side. Both tabs and spaces work in every editor.

larger files

3 bytes per indentation level is not nearly large enough to be a serious concern. You might as well complain that comments require two characters, or that Zig has no multiline comments and therefore a ten-line comment requires twenty characters instead of e.g. four. It's just not a real issue regardless.

usually leads to mixtures of tabs and spaces as tabs are re-configured to soft tabs etc. in broken editors

That's not a disadvantage of spaces; again, that argument could easily be made in either direction. Furthermore, zig uses spaces right now, and you'd be hard pressed to claim that this is an issue with any zig code at the moment.

conversion from spaces to tabs much harder and usually requires tooling (think tabs for indentation spaces for alignment)

Firstly, this is only an issue if you care about supporting both spaces and tabs. Secondly, it's not even true. I've literally done it dozens of times in the past.

needs agreement or style guide

Which exists, that's literally the point of having it be compiler enforced.

There's also many more advantages and disadvantages missing for both sides of the argument in that post.

andrewrk commented 4 years ago

grammatically

No grammar policing please. Plenty of people around here have English as a second language and teaching English is off topic. Just try to understand intent.

251 commented 4 years ago

grammatically speaking the word less does not fit there, and should be replaced with fewer

Thanks, fixed.

secondly and more practically, you can use the tab key to insert spaces even in most plain text editors that I've used.

This misuse led to mixes of tabs/spaces in many code bases.

Conversion from tabs to spaces is possible, but the inverse is true as well: the text editor I use literally has the options to go back and forth right next to each other. Also, it's trivial with any good find+replace system to go in either direction.

That's wrong. You can't go from spaces to tabs without a syntax-aware formatter. Find/replace is simply not capable of distinguishing between indentation and alignment.
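A small sketch of what goes wrong, assuming 4-space indentation; `naive_spaces_to_tabs` is a made-up stand-in for a plain find/replace on leading whitespace, not a real tool. The continuation line mixes one level of indentation with alignment spaces, and the naive conversion turns both into tabs.

```python
def naive_spaces_to_tabs(line: str, width: int = 4) -> str:
    """Replace every full group of `width` leading spaces with one tab."""
    stripped = line.lstrip(" ")
    leading = len(line) - len(stripped)
    return "\t" * (leading // width) + " " * (leading % width) + stripped


first = "    result = compute(first,"
# Align "second" under "first": 4 spaces of indentation plus 17 of alignment.
second = " " * first.index("first") + "second)"

converted = [naive_spaces_to_tabs(first), naive_spaces_to_tabs(second)]
print(repr(converted[1]))  # '\t\t\t\t\t second)'

# Rendered with 4-wide tabs the arguments still line up; with 8-wide tabs they do not.
for width in (4, 8):
    print(width,
          converted[0].expandtabs(width).index("first"),
          converted[1].expandtabs(width).index("second"))
# 4 21 21
# 8 25 41
```

A conversion that keeps the alignment would have to emit one tab followed by 17 spaces for the second line, which requires knowing that the line is a continuation, i.e. knowing the syntax.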

Yeah, and? That's not an advantage. To qualify as an advantage, it can't be in the center of a venn diagram, it has to be on a specific side. Both tabs and spaces work in every editor.

I wouldn't consider it "working" from a usability perspective when I have to press space 24 times to reach the 3rd indentation level (or try to hit it with auto-repeat).

3 bytes per indentation level is not nearly large enough to be a serious concern

I prefer a tab to be 8 spaces wide on most screens...

That's not a disadvantage of spaces; again, that argument could easily be made in either direction.

No. Press a space/tab - get a space/tab. Everything else is just a misconfiguration to cope with usability issues of spaces and leads to mixed up indentations (see above).

Furthermore, zig uses spaces right now, and you'd be hard pressed to claim that this is an issue with any zig code at the moment.

It is a usability and more importantly an accessibility issue (think of the limited space on a braille display or the inflexibility to change indentation width).

needs agreement or style guide

Which exists, that's literally the point of having it be compiler enforced.

The compiler doesn't enforce anything. You can indent your code in any way as long as you use spaces. You have to agree on 1/2/3/4/8 spaces for indentation per project. With tabs that's not a problem at all.

There's also many more advantages and disadvantages missing for both sides of the argument in that post.

Please list the most important ones.

exoticus commented 4 years ago

@pixelherodev the tabs vs spaces thing isn't about aesthetics; it's very important for accessibility. Tabs are better for accessibility because some users need huge font sizes to be able to see, and in that case they need to adjust their tab width, because at larger font sizes it becomes harder to even see the spaces. Just because browsers and some tools don't render tabs correctly doesn't mean we should say FU to people with eyesight problems. I think you might regret making that argument some years down the hill when that computer screen finishes its job :D

Check out this post, which goes into the issue in a bit more detail.

paulstelian97 commented 4 years ago

I wanted to experiment with this language, but it imposes no tabs, no Windows newlines, and other coding-style rules in general. That would add 2 more days to a medium-size project just to fix things that shouldn't need fixing, as they don't affect the reliability and correctness of the compiled code at all (I am fine with Rust's borrow checker). So I'm not going for this anymore. (I would make a "fork" of this language without these issues; "political" I shall call them, even if they're not related to actual politics.)

I probably will work with 4 spaces per soft tab/indent level. It's fine. But I want my editor (usually an IDE) to be able to give me the proper spaces, automatically convert all preexisting tabs to spaces at a 4-wide alignment (I agree that mixed tabs and spaces are not a good idea), and to be able to delete an indent level with a single backspace character instead of 4 (assuming my style). The only context where I use a different style is Linux kernel, which has its own coding style, imposed things AND the statement that you may break the rules where it makes sense.

And this is the most important factor in imposing the rules at compile time (with an error; warnings may be fine as long as you can locally override them) -- you will be unable to break the rules where it makes sense to do so.

Sobeston commented 4 years ago

add 2 more days for a medium size project

I find this hard to believe.

But I want my editor (usually an IDE) to be able to give me the proper spaces, automatically convert all preexisting tabs to spaces at a 4-wide alignment (I agree that mixed tabs and spaces are not a good idea), and to be able to delete an indent level with a single backspace character instead of 4 (assuming my style).

I use vscode, and it does this. (bottom right - Spaces: 4, UTF-8, LF)

you will be unable to break the rules where it makes sense to do so.

Zig fmt is optional, and can be turned off for top-level declarations with // zig fmt: off. The only hard rules are:

and I'm not sure where you'd want to break these rules. Zig fmt is also fairly lenient.

paulstelian97 commented 4 years ago

On the UTF-8 one I fully agree, it's non-controversial. I fully agree with the premise. Skipping the BOM might be a good feature though, which can be done before giving the characters to the tokenizer (and also, the BOM is a valid UTF-8 character with the value 0xFEFF, which can be conditionally skipped if it's the first one). You can even deny overlong forms of characters (ASCII characters should always be 1 byte), that too makes sense. I won't insist on this.

On Windows newlines, I mostly agree, though again simply skipping the character before the tokenizer (and a stray \r that isn't followed by \n would therefore not be considered a newline) -- so it isn't even part of strings unless escaped in the \r form -- might be an easy solution. Most tools can skip \r on their own as well and, if not, you could run dos2unix on said file anyway. Again, you could run dos2unix on .zig files before compiling or as an added build step so I won't insist either.
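A rough sketch of the preprocessing I mean, run on the raw bytes before they reach the tokenizer. The function name `preprocess` is made up and this is not how the Zig compiler is implemented. It relies on Python's strict UTF-8 decoder, which rejects overlong forms.

```python
import codecs


def preprocess(raw: bytes) -> bytes:
    """Skip a leading BOM, validate UTF-8, and turn CRLF pairs into LF."""
    if raw.startswith(codecs.BOM_UTF8):  # U+FEFF encoded as EF BB BF
        raw = raw[len(codecs.BOM_UTF8):]
    # Raises UnicodeDecodeError on invalid or overlong sequences.
    raw.decode("utf-8")
    # Only a \r directly followed by \n is dropped; a stray \r is left in
    # place so the tokenizer can still reject it.
    return raw.replace(b"\r\n", b"\n")


sample = codecs.BOM_UTF8 + b"const a = 1;\r\nconst b = 2;\r\n"
print(preprocess(sample))  # b'const a = 1;\nconst b = 2;\n'
```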

On the no tabs one it's a bit more complicated. I'd argue that you should default to no tabs BUT allow support for them in larger projects (not single-file projects) by having some sort of configuration parameter or command line switch to allow tabs (and their width), though only at the beginning of the line (tabs following non-tab characters on the same line can be forbidden just fine). For example build.zig could get by with no tabs at all, and it could have one configuration option that tabs are x spaces wide (which "zig fmt" would also obey). Also preferring 3 spaces per indent level, that's a bit odd (you're the first that I've seen with such a preference, being used to 4 spaces in most projects and 8-wide tabs on the Linux kernel specifically). I'm not sure there are tools that could do this preprocessing either so that we can still fit within our own coding style specifications.
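The "tabs only at the beginning of the line" rule is simple enough to check with a few lines. This is only a sketch of the idea; the name `misplaced_tabs` is made up, and columns count each tab as one character.

```python
def misplaced_tabs(source: str) -> list:
    """Return 1-based (line, column) pairs for each tab that follows a non-tab character."""
    found = []
    for line_no, line in enumerate(source.split("\n"), start=1):
        indent = len(line) - len(line.lstrip("\t"))  # leading tabs are allowed
        for column, ch in enumerate(line[indent:], start=indent + 1):
            if ch == "\t":
                found.append((line_no, column))
    return found


source = "\tok\n  \tbad\n\t\tx\ty"
print(misplaced_tabs(source))  # [(2, 3), (3, 4)]
```

Line 2 is flagged because its tab comes after spaces, and line 3 because its third tab comes after `x`; leading tabs followed by alignment spaces pass.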

pixelherodev commented 4 years ago

@exoticus Thanks for the link.

Accessibility sways me instantly. Tabs win IMO.

filipencopav commented 4 years ago

I don't think you can ever end the bike-shedding. The difference is that since Zig is choosing to only support one format, developers no longer have a decision to make or debate on a project-by-project basis. The bike-shedding is now centralized :)

The difference is that developers just won't bother to use Zig)))
I understand about returns, they're not a visible change, but tabs and 4 spaces:
I use an editor which has the tab character be a vertical box draw line, showing the "body" of the function. I can't do that with 4 spaces. Only solution for me: accommodate to the language and feel pain using 4 tabs or just not use the language since i didn't really learn it.
And the irony was that Zig was supposed to make C painless, by making me suffer from 4 spaces. Ok, i'm sorry for the rudeness, i'm still wildly interested in Zig, but forcing you to use certain kinds of tabs and a specific type of returns? In my opinion, a language like Zig shouldn't care about whitespace at all!

andrewrk commented 4 years ago

This thread has nothing useful left to offer. Here's the FAQ entry pasted:

Why does zig fmt use spaces instead of tabs?

Because no human and no contemporary code editor is capable of handling tabs correctly. Humans tend to mix tabs and spaces on accident, and editors don't have a way to "indent with tabs, align with spaces" without pressing the space bar many times, leading programmers to use tabs for alignment as well as indentation.

Tabs would be better than spaces for indentation because they take up fewer bytes. But in practice, what ends up happening is incorrectly mixed tabs and spaces. In order to simplify everything, tabs are not allowed. Spaces are necessary; we can't ban spaces. But tabs are not strictly needed, so the null hypothesis is to not have them.

Maybe someday, we'll switch to tabs for indentation, spaces for alignment and make it a compile error if they are incorrectly mixed. But if we did that today, writing Zig code would be too hard. For now your options are to configure your editor to insert spaces when you press the tab key, or configure your editor to run zig fmt on save (recommended).

What will make it into the final language specification? It isn't decided yet and it doesn't really matter. Just run zig fmt on save.