Characters per LOC - Githubissues

mkraft commented 11 years ago

Is anyone finding their Ruby code becomes less readable/concise in order to conform to the 80-character limit?

Does the original intent of the 80-character limit still achieve the goal of best readability or has it become a hindrance to that goal?

GitHub's code views fit 101 characters without horizontal scrolling; should we raise the characters per LOC from 80 to 100?

fuadsaud commented 11 years ago

Some explanation on the subject: http://batsov.com/articles/2013/06/26/the-elements-of-style-in-ruby-number-1-maximum-line-length/

My opinion: I frequently work on a laptop with a 13" screen, so 80 columns feels good. I agree that sometimes you have to be a little creative about your alignment or about breaking one statement into multiple ones, but that's better than horizontal scrolling.

mkraft commented 11 years ago

Of all the arguments for the 80 character limit I like the terseness-of-code the best.

Accommodating the lowest common denominator of editor widths or development screen sizes could be a slippery slope if, say, others use even smaller screens than your 13". Do we make the max size 50 characters then?

What is the common element among Ruby developers? For me it's Github. Wouldn't it be nice to be able to evaluate a pull request for ruby-style-guide conformity without pulling the branch locally?

equivalent commented 11 years ago

I personally think this rule is a complete waste of time. In any IDE, even Vim, you can set line wrap so you never have to scroll the editor horizontally. I agree that lines should not be infinite (more than 250 chars is way too much), but I wouldn't limit people to formatting their lines to less than 120. I completely ignore this rule.

alexandru-calinoiu commented 11 years ago

@equivalent I've tried reading code that ignores this rule with line wrap on; it's very hard on the eyes.

I think this rule is perfectly valid. Indentation is your friend; also, having a long line with nested ternary operators does not count as a one-liner :)

lee-dohm commented 11 years ago

I think the rule is a good one. There should be some limit. I personally set the limit at 132 characters, though I could easily be convinced that something shorter is better. I like the GitHub limit of 100 characters and I think that 79 or 80 characters is too small.

I recently moved a bunch of classes to new namespaces in the RuboCop project. I introduced one additional module with two additional spaces of indentation and it pushed a bunch of lines over the 80 character limit. Splitting the lines of code didn't bug me as much as splitting up the strings that went over the limit:

# Old
foo = 'blah blah blah blah blah blah blah blah'

# New
foo = 'blah blah blah blah blah blah blah ' + 
      'blah'

I just think that lines that legitimately go beyond 80 characters are too common whereas those that legitimately go beyond 100 are much less so.
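For readers hitting the same string-splitting annoyance, here is a minimal sketch of one alternative to `+`: Ruby joins adjacent string literals when the line ends with a backslash, so no concatenation call is involved. The variable names `foo` and `bar` are only illustrative, and whether this reads better than `+` is a matter of taste.

```ruby
# Adjacent string literals separated by a trailing backslash are joined
# by the parser into a single string.
foo = 'blah blah blah blah blah blah blah ' \
      'blah'

# The explicit concatenation from the comment above, for comparison.
bar = 'blah blah blah blah blah blah blah ' +
      'blah'

puts foo == bar # both build the same string
```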

derek-watson commented 11 years ago

:+1: for 100 character limit.

derek-watson commented 11 years ago

This was also mentioned in #200, the submitter was advocating for 120 characters.

mkraft commented 11 years ago

There are those who might say that one of the great things about Ruby is its progressiveness; that it is constantly changing, adapting, and keeping up with the times. Perhaps that is one of the things that differentiates Rubyists from, say, the Java community. Could increasing the LOC character limit to 100 be another good example of the Ruby community's progressiveness?

fuadsaud commented 11 years ago

@fuadsaud The screen size is my reason to keep my lines under 80 columns. I'm not stating that it should be the guideline. If the majority supports increasing the limit, I think that should be done.

mshappe commented 11 years ago

Especially in an era when most people now use wide screens, I think 80 columns may finally be an anachronism. I don't think lines that are much longer than 100 are a good idea most of the time, but 80 (which for most people has usually really meant 72 or so, I think) is a limitation imposed by a technology that is no longer generally in use!

I mean...seriously, does anybody still confine themselves to the use of an 80x24 green screen?


namick commented 11 years ago

I vote for bumping it to 100 especially with the width limit Github has chosen.

80 feels constrained. 120 feels too long.

Longer often means less readable. We have much wider screens these days, but do we want to use that extra real estate to look at extra long lines of code?

Many people use splits to see multiple files side by side, view running tests, IRC, etc.

100 is a nice round number and exemplifies the progressiveness of Ruby by letting go of antiquated constraints.

marianposaceanu commented 11 years ago

There's a good reason behind the 80 character limit and it doesn't come from coding practices but from typography:

Anything from 45 to 75 characters is widely regarded as a satisfactory length of line for a single-column page set in a serifed text face in a text size. The 66-character line (counting both letters and spaces) is widely regarded as ideal.[0]

This affects readability:

Too long – if a line of text is too long the visitor’s eye will have a hard time focusing on the text. Too short – if a line is too short the eye will have to travel back too often. Other sources suggest that up to 75 characters is acceptable[1]

[0] - http://webtypography.net/Rhythm_and_Proportion/Horizontal_Motion/2.1.2/
[1] - http://baymard.com/blog/line-length-readability

mkraft commented 11 years ago

To that point, are people finding the text in this comment thread hard to focus on? It's about twice the character limit that @dakull cites as ideal - and longer than the 100 character limit that has been proposed. Would lines of code shorter than this thread suffer from readability problems?

lee-dohm commented 11 years ago

@dakull the 80 character limit historically had nothing to do with typography. It was a holdover from early column-oriented languages such as Fortran and COBOL that were designed to work with punchcards that were 80 columns wide.

I haven't known anyone to read a page of code in the manner in which they read a page of a book or a website. A column of prose, regardless of medium, is a fixed width where each line has virtually or identically the same width, creating a flow and a rhythm that is different from reading lines of code. Lines of code are of highly variable widths where each line (typically) is a standalone concept, related to those around them like sentences in a paragraph, but I've never seen in poetry or prose a limit placed on the number of characters allowed in a sentence. Whereas in code, statements being spread over multiple lines is typically strongly discouraged except in very specific circumstances. Yet sentences in prose sliding back and forth over line boundaries is a trifle, if conceived of at all, and in poetry only slightly moreso.

derek-watson commented 11 years ago

I wonder if there's a way to go about this empirically. Why don't we choose a large popular ruby project or two and perform some static analysis on the source, summarizing the number of lines of code approaching or pushing the boundaries we're discussing? We can compare the frequency of long(er) lines against those that are well within bounds to see how often this comes up, and possibly select some examples for style discussion; are there too many expressions on a single line? Can the code be improved by refactoring into shorter lines? Is a single statement legitimately longer than it used to be for some reason?

marianposaceanu commented 11 years ago

I might be biased but from empirical evidence text < 90-100 characters is easier to read, of course whilst having a decent line-height and a good typeface.

lee-dohm commented 11 years ago

@derek-watson I was considering seeing if a tool could be made to mine GitHub for Ruby source code and simply produce a histogram of line lengths. We could then see the percentage of lines that fall under arbitrary line lengths and make a decision based on that to begin with.

lee-dohm commented 11 years ago

@dakull You are correct, shorter lines are easier to read. What we are talking about is not "what should the average line length be" but "what should the absolute, no-excuses, thou-shalt-not-pass, under-no-circumstances-go-past-this, maximum line length be". Even the lines approaching the maximum should be the exception, not the rule (as it is in typography).

namick commented 11 years ago

One of the most important things is consistency.

If many people think that 80 is too short, then they will ignore the rule and end up going longer than even 100 or 120. I see this happening everywhere.

If we bump it to 100, it is possible there will be a much wider adoption of the rule and that will lead to more consistent code.

Also, @dakull, your first link is broken without the trailing slash. http://webtypography.net/Rhythm_and_Proportion/Horizontal_Motion/2.1.2/

marianposaceanu commented 11 years ago

@namick thanks, fixed.

I'm searching for a way to analyse a code base in regard to the number of characters per line, so far[0]:

$ gem install cane
$ cane --no-abc --style-measure 80 '{lib,spec}/**/*.rb' > result.txt

It displays the lines that violated the rule --style-measure 80 i.e. no more than 80 chars per line.

As a quick run in /rails/actionpack there are ~3000 violations with 80, ~ 1300 with 100 and ~ 600 with 120 chars.

[0] - https://github.com/square/cane
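The same kind of tally can be sketched in plain Ruby without any gem. This is only an illustration, not how cane works internally: `count_violations` is a hypothetical helper, the default limits mirror the ones discussed here, and a "violation" is assumed to mean a line strictly longer than the limit once the trailing newline is removed.

```ruby
# Hypothetical helper: for each limit, count the lines longer than it.
# Line length is measured without the trailing newline.
def count_violations(lines, limits = [80, 100, 120])
  lengths = lines.map { |line| line.chomp.length }
  limits.to_h { |limit| [limit, lengths.count { |len| len > limit }] }
end

# Made-up sample lines of 79, 81, 101 and 121 characters.
sample = ['a' * 79, 'b' * 81, 'c' * 101, 'd' * 121]
p count_violations(sample)
```

To run it on a real file, the lines could come from `File.readlines(path)`; which files to include is left to the reader.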

fuadsaud commented 11 years ago

@dakull take a look at rubocop also (http://github.com/bbatsov/rubocop).

JeffWaltzer commented 11 years ago

The way I see it, if your line of code exceeds 80-100 characters it is probably violating SRP (the Single Responsibility Principle).


bbatsov commented 11 years ago

@dakull rubocop -f o somedir will also give you a similar summary.

As I've often said in the past - I'm not sure Rails's code is the pinnacle of good Ruby coding style, so other major projects should be analyzed as well.

marianposaceanu commented 11 years ago

@bbatsov I agree yet it's a rather big Ruby code base so I wouldn't dismiss it completely, anyway it was just an example.

mkraft commented 11 years ago

As a quick run in /rails/actionpack there are ~3000 violations with 80, ~ 1300 with 100 and ~ 600 with 120 chars.

How do we interpret these results?

lee-dohm commented 11 years ago

Here are my thoughts ...

- Line lengths should approximately follow a normal distribution.
- We should exclude lines that have only whitespace characters in them.
- We may want to exclude lines that consist only of comments because those can be reflowed trivially.

This is my proposed approach:

1. Select a threshold for the number of lines that we expect to fail to meet the selected standard in the target file set: 1 in 20? 1 in 100? 1 in 1000? (Corresponding to the 95th, 99th and 99.9th percentiles.)
2. Select a target file set.
3. Get the line length for each qualifying line. (See above for thoughts on lines that should or should not qualify for inclusion.)
4. Calculate the mean and standard deviation of the line lengths of qualifying lines.
5. Calculate X = mean + sd * z
   - Where z is equal to the z-score of the threshold value: 1.644854, 2.326348, and 3.090232 for my examples.
   - X is the number of characters that would meet our criteria for probability of being exceeded for the given file set.
6. Optional - Plot a histogram of line lengths from the target file set to see if there are any interesting anomalies. One might be that the number of lines having more than 80 characters is significantly less than the number of lines having 80 characters or less, showing a large amount of self-policing and support for the 80 character rule. If the line is smooth at that point, then that would indicate that perhaps there isn't much support for the 80 character rule in the Ruby community.
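The calculation in the steps above can be sketched roughly as follows. This is an illustration under assumptions, not the script that was eventually used: `normal_limit` and `Z_SCORES` are hypothetical names, the standard deviation is the population form (dividing by n), and the input is assumed to be an array of already-filtered line lengths.

```ruby
# z-scores quoted above for the 95th, 99th and 99.9th percentiles.
Z_SCORES = { 95 => 1.644854, 99 => 2.326348, 99.9 => 3.090232 }.freeze

# Hypothetical helper: X = mean + sd * z for an array of line lengths.
# Uses the population standard deviation (an assumption).
def normal_limit(lengths, z)
  mean = lengths.sum.to_f / lengths.size
  variance = lengths.sum { |len| (len - mean)**2 } / lengths.size
  mean + Math.sqrt(variance) * z
end

# Made-up lengths, purely to show the call.
lengths = [10, 20, 30, 40, 50]
Z_SCORES.each do |pct, z|
  puts format('%sth percentile limit: %.1f characters', pct, normal_limit(lengths, z))
end
```

The result is only as good as the normality assumption, which is why the optional histogram step is worth doing alongside it.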

This is why I was suggesting taking something really huge, like as much of the Ruby code on GitHub as we could download. It would give us a good idea of how long lines actually are across a variety of code bases. My hypothesis is that even if we select 1 in 1000, given a "representative" file set the line length will end up being less than 132 characters.

lee-dohm commented 11 years ago

I used the following Ruby script:

ARGF.each do |line|
  case line
  when /^\s*$/ then next   # Exclude whitespace-only lines
  when /^\s*#/ then next   # Exclude comment-only lines
  else puts line.length
  end
end

With this command:

$ find . -name "*.rb" | xargs line-lengths > ~/line-lengths.txt

I ran it on the RuboCop code base, did some fiddle-faddling with R and came up with the following results:

95th percentile: 67.05879
99th percentile: 80.41816
99.9th percentile: 95.39264

Of course, the RuboCop code base already conforms, in the main, to the 80 character limit so the data is somewhat skewed. But as we can see from the attached histogram, there are lines that exceed 80 and even 100 characters. Also interesting is the huge spike around 15 characters ... I assume this is all the lines that consist only of end and other such things. So the data isn't actually normally distributed as I originally theorized, but perhaps still using the normal distribution is a decent enough starting point. Though it appears there are two classes of lines ... "short" that averages around 15 characters and "long" that averages somewhere around 40 characters.
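Since the data turned out not to be normally distributed, an empirical percentile taken directly from the measured lengths is a useful cross-check. The R commands used for the figures above are not shown in the thread, so the following is only a sketch of one common definition (nearest rank); `percentile` is a hypothetical helper and the sample data is made up.

```ruby
# Hypothetical helper: nearest-rank percentile of an array of line lengths.
# Returns the smallest value such that at least pct percent of the
# values are less than or equal to it.
def percentile(lengths, pct)
  sorted = lengths.sort
  rank = (pct * sorted.size / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Made-up data: one line of each length from 1 to 100.
lengths = (1..100).to_a
[95, 99, 99.9].each do |pct|
  puts "#{pct}th percentile: #{percentile(lengths, pct)} characters"
end
```

With real data, the lengths could be read back from the `line-lengths.txt` file produced by the command above, one integer per line.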

lee-dohm commented 11 years ago

It would appear that lines consisting only of end are the lion's share of that spike. I rewrote the script to exclude lines that only consist of end and whitespace and got this histogram:

We still have a spike around 15 characters, but not nearly as pronounced.
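The rewritten script is not shown in the thread. A minimal sketch of what the extra filter might look like, assuming it simply adds one more pattern to the two from the earlier script (`qualifying?` is a hypothetical name):

```ruby
# Hypothetical predicate: should this line be counted in the statistics?
def qualifying?(line)
  return false if line.match?(/^\s*$/)        # whitespace-only lines
  return false if line.match?(/^\s*#/)        # comment-only lines
  return false if line.match?(/^\s*end\s*$/)  # lines consisting only of `end`
  true
end

# Made-up sample lines to show which ones survive the filter.
sample = ["def foo\n", "  bar # trailing comment\n", "  # comment only\n", "\n", "end\n"]
sample.select { |line| qualifying?(line) }.each { |line| puts line.chomp.length }
```

Note that a line such as `end.map(&:to_s)` still counts, since it contains more than a bare `end`.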

fuadsaud commented 11 years ago

@lee-dohm nice data.

marianposaceanu commented 11 years ago

@lee-dohm very nice indeed. So it seems it abides by the 80 chars rule +/- ~1% or less?

mkraft commented 11 years ago

Now we have some data from the Rails code base, and we have what looks to me like a bell curve. Does anyone want to suggest how this data might be used to argue for or against increasing the style guide's characters per LOC from 80 to 100?

If one analyzed my Ruby projects on Github one would find close to 100% of the Ruby code < 80 characters per LOC, however I'm a huge proponent of increasing the limit.

marianposaceanu commented 11 years ago

I think we should take into account the deviations; being a bell curve is not that relevant, as it covers the range of OK values, i.e. 1..80. I guess we could take a couple of big projects, measure their deviations, and make a decision.

mkraft commented 11 years ago

I'm curious to hear from people: is counting the number of "rule breakers" a good way to determine if the rules should be changed?

gnapse commented 11 years ago

I work on a 13" MacBook too. I use Vim on iTerm2 with the Inconsolata font at 14pt, and lines of 100 characters fit comfortably. The 80 character limit has been around for too long, and the original terminal size limitations that gave rise to it are no longer in place, IMO.

Lines over 80 characters in my code, and in most code I've read out there, are not that uncommon, and when they occur, they're somewhat hard to break up without it feeling awkward. I started using RuboCop about 3-4 months ago, and I ended up configuring it to use the 100 limit by default.

Also, I think the statistics shown above drop so sharply towards zero for lines over 80 precisely because of the tendency of programmers to avoid them just for following style guides and conventions in place today. Place the new conventional size at 100, and the graph will reflect that over time.

Therefore, I vote for moving the standards up to 100. Just my 2 cents.

memoht commented 11 years ago

I asked this simple question once: Why do Rails apps have boilerplate comments longer than 80 characters? I also asked Ryan Bates if he followed the <80 line rule to which Ryan replied that he prefers to code for readability rather than just stick to <80. Another coder I asked who works at Red Hat spends most of his day in VIM, SSH'd into other computers. He follows <80 but agreed that <120 would be acceptable.

I try to follow the rule as much as possible but not religiously. I suppose if I worked on a team, some other people would be torked out. I have a 13-in MBP and use Sublime with Ubuntu Mono Regular at 11pt. I have a ruler guide turned on at 80 so that I am aware of when I cross over. I probably would be closer to 100% conformity if the limit was <100.

So +1 for <100 is my final 2 cents.

lee-dohm commented 11 years ago

The statistics above are just an example of how the analysis can be performed. I chose the RuboCop code base because I already had it on my system. The reason why the numbers drop so sharply at 80 characters is because RuboCop has a spec enforcing that no RuboCop rule is broken ... including 80 characters per line, though you can mark certain files or blocks within files to be ignored by certain rules. I still believe that a wider analysis needs to be performed to determine the "natural" line lengths of code in Ruby. Here is a plot from the other extreme ... the Rails code base:

There are lines in Rails that are up to 2,000 characters long! I doubt that these are lines of actual code (at least ... I hope not). I'll probably re-run the analysis excluding lines longer than 200 characters later today after I get back from work. But, as it stands ... the Rails code base consisted of 126,702 qualifying lines of what we assume to be "real code" and RuboCop consisted of 12,863 qualifying lines of "real code" ... approximately an order of magnitude difference.

If we can come up with a list of projects (or a criteria for selecting projects at relative random from GitHub), I volunteer to do the number crunching. If people think it would be useful or interesting, that is.

bbatsov commented 11 years ago

Great! I'm looking forward to seeing the results.

kalkov commented 11 years ago

100 :+1:

bbatsov commented 11 years ago

@lee-dohm How are those results coming along?

lee-dohm commented 11 years ago

@bbatsov Working on them as I can. You can see the project at https://github.com/lee-dohm/line-length-miner


fuadsaud commented 11 years ago

From Github Data Challenge: http://sideeffect.kr/popularconvention#ruby

bbatsov commented 11 years ago

@fuadsaud Nice data, but the section "Whitespace around operators, colons, { and }, after commas, semicolons" seems way off.

lee-dohm commented 11 years ago

@bbatsov I think the issue is that the data is based on commits. This skews the data towards things that are very active, whereas I suspect that "best practices" are more likely to be exemplified by mature projects that are perhaps not checked into as often. Although, research indicates that open source projects drop in quality as they get larger.

meagar commented 11 years ago

@bbatsov @lee-dohm Because it is based on all commits, instead of just the last (hopefully) best and most "correct" state of each file, the results could potentially be very far off.

Very simplistically: Suppose we had a single commit, with a single line. If somebody commits the line a+b, and then it is later "fixed" to a + b, that's a clear case for using spaces around boolean operators, and we want the stats to say "100% of people use a + b". However, because both commits are counted, it suddenly appears that only 50% of people use spaces around boolean operators.

The numbers essentially assign equal value to good data and all the bad data that has ever been committed, only to be later fixed.
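The effect in that example can be reproduced with a toy sketch. Everything here is made up for illustration: `history` stands in for the successive committed versions of one line, and `spaced?` is a deliberately crude check for spaces around `+`.

```ruby
# Hypothetical check: does the line put spaces around `+`?
def spaced?(line)
  line.match?(/\S \+ \S/)
end

# Share of the given lines that use spaces around `+`.
def share_spaced(lines)
  lines.count { |line| spaced?(line) }.to_f / lines.size
end

# Two commits touching the same line: the original and the later fix.
history = ['a+b', 'a + b']

puts share_spaced(history)        # counting every commit: 0.5
puts share_spaced([history.last]) # counting only the final state: 1.0
```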

lee-dohm commented 11 years ago

So, I've got a working version of my script to pull data from GitHub. It looks at the top n most popular Ruby projects on GitHub, where n == 10 in this iteration. You can see the code and the full explanation of the methodology on the project page. Here are the results:

Count: 257,344 lines examined
μ: 44.73162
σ: 42.37547
95th percentile: ~90 characters
99th percentile: ~123 characters
99.9th percentile: ~176 characters
Percentile of 80 characters: 91.82262% (8.17738% of lines would need to be rewritten)
Percentile of 100 characters: 96.88238% (3.11762% of lines would need to be rewritten)
Percentile of 120 characters: 98.81598%
Percentile of 132 characters: 99.31609%

And here's the distribution:

So, the data isn't exactly normally distributed, but we should still be able to draw some conclusions from it. Here's what I see:

The vast majority of code stays within 132 characters
If we assume that these ten projects are examples of "acceptable code", 80 characters would seem to be too restrictive
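As a rough cross-check of how far the data is from normal, the mean and standard deviation reported above can be plugged into the X = mean + sd * z formula proposed earlier in the thread. This is a sketch, with `normal_estimate` as a hypothetical helper; for the 95th and 99th percentiles the normal estimate comes out noticeably higher than the empirical ~90 and ~123 characters, which fits the remark that the data isn't exactly normally distributed.

```ruby
MEAN = 44.73162 # μ reported above
SD   = 42.37547 # σ reported above

# Hypothetical helper: line length a normal distribution would put at z.
def normal_estimate(mean, sd, z)
  mean + sd * z
end

# z-scores quoted earlier in the thread for the three thresholds.
{ 95 => 1.644854, 99 => 2.326348, 99.9 => 3.090232 }.each do |pct, z|
  puts format('%sth percentile under a normal assumption: %.1f', pct, normal_estimate(MEAN, SD, z))
end
```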

Now, as @mkraft pointed out, I don't really want to take the point of view of coddling "rule breakers". But I do see the code from these ten projects as representative of what good coders naturally write to get their work done. From this, we can then extrapolate a line length (for Ruby code) that has enough room to be expressive in the vast majority of cases but restricted enough to enable the comfort of the majority of users.

Given the numbers, my opinion is that even 100 characters might be too restrictive. But it would be a markedly better alternative than 80.

jfelchner commented 10 years ago

I've always coded to 80 characters, but have recently decided that 100 characters is going to be my new norm regardless of the decision here. This allows for a vertical vim split while still being able to see all code in both panes without horizontal scrolling. It forces me to be concise but allows more flexibility when I have a line With::An::Unusually.long_method_name.

I think that the 80 character limit is a relic of the past. A lot of people can come up with reasons why 80 characters is "good" and "perfect", but the fact is, if we'd had 100 column punch cards, we'd either not be having this discussion, or we'd be talking about how 100 columns is "too restrictive". We certainly wouldn't be thinking "I know, we should use 80 columns!"

mshappe commented 10 years ago

Like a highway speed limit, any number established here is going to be too low for some people, too high for others, and widely ignored by people who believe they have a strong enough reason to do so :smile:

I find myself agreeing with @JeffWaltzer uptopic that most of the time, a line that's very long is probably trying to do too much. I don't think that's always true, but I think it's certainly a question to ask oneself when faced with a really long line. Certainly, in any code I was responsible for helping to maintain, if I encountered a line significantly longer than 100-120 characters, and it wasn't mostly made up of quoted text, I would give serious thought to at least breaking up the line (which is easy enough to do) if not refactoring the statement.

halilim commented 10 years ago

There are other reasons for shorter lines, apart from history and screen sizes:

Viewing code side by side (diffs, merges etc)
Viewing code in other media (Github, email, etc)
Catching refactoring opportunities
Code readability (human eye can't scan long horizontal lines fast)

See here for more ideas.

100 seems like an interesting option, probably needs a little bit more research & insight.

@lee-dohm I wonder what we'd find if the lines were limited to longer than ~70 characters (i.e. kind of taking only the long lines into account).

jameswald commented 10 years ago

A soft limit of 100 characters has worked best for my projects which typically include large chunks of Java, a language not known for its terseness. Time spent rearranging long lines to conform to this limit often feels like a productive investment because the end result is typically more readable than it was before. I can't say the same about my effort to fit code and marked up documents within 80 characters.

With a limit of 80 characters, I often find myself reformatting what I would consider to be perfectly reasonable lines of code. The end result is no more readable than it was before (although this statement is largely subjective). In some cases this effort may even lead to less readable code because the reader will be required to scan vertically more often than necessary.

I would argue that reformatting code has a very real cost that we should strive to minimize. Consider Go's approach: despite having a de facto style enforced by go fmt, line length is not enforced. In fact, a concrete limit is not even recommended. Instead, Effective Go offers the following advice:

Go has no line length limit. Don't worry about overflowing a punched card. If a line feels too long, wrap it and indent with an extra tab.

halo commented 10 years ago

This being the only rule of the style guide I consequently want to break, I agree with the pro-100 arguments provided by @namick https://github.com/bbatsov/ruby-style-guide/issues/207#issuecomment-23506461 https://github.com/bbatsov/ruby-style-guide/issues/207#issuecomment-23569774

I feel cramped in by 80 characters and I have an 11" screen. 100 seems like a good compromise to me.

rubocop / ruby-style-guide

Characters per LOC #207