h-Klok / StatsWithJuliaBook

https://statisticswithjulia.org/
MIT License
1.08k stars 279 forks source link

Trailing whitespaces at EOL, lack of newline at EOF and inconsistent indentation #8

Closed VincentTam closed 5 years ago

VincentTam commented 5 years ago

Thanks for great code samples, and sorry for another issue. (I'll post one more later.)

Your repo is contaminated with carriage return ^M. Please check your core.autocrlf settings so that git diff would produce consistent behavior across OS. GitHub's guide for line endings would be helpful.

Here's a list of files contaminated with ^M:

$ git grep -l "^M"
1_chapter/imageProcessing.jl
3_chapter/basicDistRand.jl
3_chapter/gammafunctionIntegration.jl
3_chapter/inverseCDF.jl
3_chapter/mvParams.jl
3_chapter/normalCalculus.jl
3_chapter/weibullHazard.jl
4_chapter/dataframeReferencing.jl
5_chapter/centralLimitTheorem.jl
5_chapter/randomizationTest.jl
data/IQalc.csv
data/L1L2data.csv
data/examData.csv
data/fertilizer.csv
data/grades.csv
data/jsonCode.json
data/machine3.csv
data/stars.png
data/temperatures.csv
data/weightHeight.csv

To verify my claim, you may pick any file listed above and use curl to display the remote file to STDOUT, then pipe it to od -c for display with printable characters or backslash escapes.

$ curl -L https://raw.githubusercontent.com/h-Klok/StatsWithJuliaBook/master/data/fertilizer.csv | od -c
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   128  100   128    0     0    184      0 --:--:-- --:--:-- --:--:--   184
0000000   C   o   n   t   r   o   l   ,   F   e   r   t   i   l   i   z
0000020   e   r   X  \r  \n   4   .   1   7   ,   6   .   3   1  \r  \n
0000040   5   .   5   8   ,   5   .   1   2  \r  \n   5   .   1   8   ,
0000060   5   .   5   4  \r  \n   6   .   1   1   ,   5   .   5  \r  \n
0000100   4   .   5   ,   5   .   3   7  \r  \n   4   .   6   1   ,   5
0000120   .   2   9  \r  \n   5   .   1   7   ,   4   .   9   2  \r  \n
0000140   4   .   5   3   ,   6   .   1   5  \r  \n   5   .   3   3   ,
0000160   5   .   8  \r  \n   5   .   1   4   ,   5   .   2   6  \r  \n
0000200

The \r \n in od -c's output represents carriage returns ^M that you've committed on Windows. That would creat a trailing ^M in git diff's output on GNU/Linux.

In addition, some text files don't have trailing newline \n at the end of file (EOF). That would cause inconsistent results with some command line utilities such as wc on GNU/LInux. You may refer to Why should text files end with a newline? on Stack Overflow to know more.

VincentTam commented 5 years ago

The list of files without \n at EOF is so long that I don't wish to publish. It can be found using

$ git ls-files| \
while IFS= read -r file; do
  test `tail -c1 $file` && echo $file
done

Except the last one data/stars.png, the rest are all Julia programs.

VincentTam commented 5 years ago

This project's indentation character is not unified. Many lines start with whitespaces, and some others begin with tabs \t. This gives inconsistent results across editors. I've found the suggestion of sticking to one single choice from Jeff Atwood, the co-founder of Stack Exchange, sensible. Personally, I would go for spaces because \t can be expanded differently across different environments.

Explanation of grep's options used:

In addition, the trailing white spaces at end of line (EOL) are often invisible in shells. Some smart editor (like Atom) automatically trim away trailing white spaces at EOL, causing git diff to display invisible pairs of lines. The following example illustrates this

echo Sample line > file1.txt            # create a one-line text file
sed "1s/$/  /" < file1.txt > file2.txt  # read file1.txt, append 2 white spaces, save to file2.txt
diff -u file[12].txt                    # compare these two files

The output of diff is a bit difficult to understand at the first glance.

--- file1.txt   2019-05-11 20:18:21.040257225 +0200
+++ file2.txt   2019-05-11 20:18:21.044257249 +0200
@@ -1 +1 @@
-Sample line
+Sample line  

Here's a list of files with trailing white spaces:

git grep -IEl " +$"          # run this command to see that output
git grep -IEl " +$" | wc -l  # returns the number of files concerned (75)

You may see this SE question about trailing spaces to know more.

h-Klok commented 5 years ago

Thank you very much @VincentTam for all your effort in finding these issues and for your corrections/pull request! I guess these carriage returns and EOF inconsistencies etc arose due to work being done over linux, windows and macOS systems. From now on I'll try and work purely in linux and try to be more consistent regarding these things. Once again thanks for the great contribution!

VincentTam commented 5 years ago

I guess these carriage returns and EOF inconsistencies etc arose due to work being done over linux, windows and macOS systems. From now on I'll try and work purely in linux and try to be more consistent regarding these things.

That's a nice and simple solution :+1. As the issue of line endings might also affect, in general, other Git repo, I suggest that you run git config -l to list your Git settings. If you find the option core.autocrlf missing, then please add follow the linked guide from GitHub and set the option to true.

$ git config --global core.autocrlf true
# Configure Git on Windows to properly handle line endings

Regarding the line ending, when working online on GitHub/GitLab.com, you may leave an empty trailing line to preview the "missing \n at EOF" from happening.

Nonetheless, sometimes, it might be possible that one has to switch to other OS due to technical constraints. It's worth pointing out that a simple GNU utility od -c can spot this out. As you can see from my previous comments, od -c would output \r for instances of carriage return ^R, and it would print \n if the file has newline at EOF. A while loop through the Git index might be helpful.

git ls-files| \
while IFS= read -r file; do
  # your cmd here
done