humdrum-tools / humdrum

The Humdrum Toolkit: Analysis tools for music research
http://www.humdrum.org

Make file produces *.awk files with Windows style line endings #4

Closed mshafer1 closed 7 years ago

mshafer1 commented 7 years ago

This renders a fresh install of humdrum unusable.

Workaround: using a Linux shell (the same one you're planning to use for humdrum), run the attached changeLineEndings.py file in the humdrum/bin directory. It will change all Windows-style line endings to the current system's (Linux) style in all neighboring files and directories.

\<file removed>

craigsapp commented 7 years ago

The correct method to submit changes to this repository is to fork the repository, and then submit a pull request with the updates.

Also, your program recursively changes every file in the directory in which it is run. Doing that to a binary file, such as the three compiled programs in humdrum/bin, could corrupt it. If the program is run in the humdrum directory, then all files in the repository would be altered, including the humdrum/.git directory, which contains binary files.
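The concern about binary files can be illustrated with a minimal sketch of a more conservative converter. This is not the attached changeLineEndings.py (which is not shown in the thread); the function names and the NUL-byte heuristic for spotting binary files are assumptions made for illustration only.

```python
from __future__ import annotations

from pathlib import Path


def looks_binary(data: bytes) -> bool:
    """Heuristic (an assumption): treat any file containing a NUL byte as binary."""
    return b"\x00" in data


def convert_crlf_to_lf(root: Path) -> list[Path]:
    """Rewrite CRLF as LF in text files under root; return the files that changed."""
    changed = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything stored inside a .git directory.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        data = path.read_bytes()
        # Leave binary-looking files and files without CRLF untouched.
        if looks_binary(data) or b"\r\n" not in data:
            continue
        path.write_bytes(data.replace(b"\r\n", b"\n"))
        changed.append(path)
    return changed
```

Calling `convert_crlf_to_lf(Path("humdrum/bin"))` would then touch only text files that actually contain CRLF pairs. A NUL-byte check is a common but imperfect heuristic, so a backup (or a fresh `make bin`) is still advisable.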

However I cannot find any evidence of MS-DOS/Windows style line endings in the AWK programs. How (or even where) did you download the repository, and what steps did you do to install? And what version of linux (was it the new Ubuntu shell for Windows 10, for example)?

I downloaded in both MacOS and Fedora Linux with this method:

git clone https://github.com/humdrum-tools/humdrum
cd humdrum
make bin

And checking the state of the newlines in humdrum/bin/*.awk, I don't find any MS-DOS/Windows newlines:

$ cat bin/*.awk | od -x | grep 0d | wc -l
0

I.e., there are no \r characters in any of the bin/*.awk files.
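Note that the od/grep pipeline counts lines of the hex dump that contain "0d", not individual carriage returns. As a rough cross-check, a per-file count of CRLF pairs could be sketched like this; the function name and the default directory are assumptions for illustration.

```python
from __future__ import annotations

from pathlib import Path


def count_crlf(pattern: str, root: Path = Path(".")) -> dict[str, int]:
    """Return {file name: number of CRLF pairs} for files under root matching pattern."""
    counts = {}
    for path in sorted(root.glob(pattern)):
        if path.is_file():
            # Count the two-byte sequence directly in the raw file contents.
            counts[path.name] = path.read_bytes().count(b"\r\n")
    return counts
```

For example, `count_crlf("bin/*.awk")` run from the humdrum directory would report a count for each AWK script, which makes it easier to see whether only some files are affected.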

Checking another way:

bin/barks.awk: UNIX
bin/cbr.awk: UNIX
bin/census.awk: UNIX
bin/cents.awk: UNIX
bin/cleave.awk: UNIX
bin/cocho.awk: UNIX
bin/context.awk: UNIX
bin/correl.awk: UNIX
bin/deg.awk: UNIX
bin/degree.awk: UNIX
bin/diss.awk: UNIX
bin/ditto.awk: UNIX
bin/dur.awk: UNIX
bin/ekern1.awk: UNIX
bin/ekern2.awk: UNIX
bin/ekern3.awk: UNIX
bin/extract1.awk: UNIX
bin/extract2.awk: UNIX
bin/extract3.awk: UNIX
bin/extract4.awk: UNIX
bin/fields.awk: UNIX
bin/findpair.awk: UNIX
bin/find_reg.awk: UNIX
bin/freq.awk: UNIX
bin/hint.awk: UNIX
bin/humsed.awk: UNIX
bin/infot.awk: UNIX
bin/kern.awk: UNIX
bin/kernordo.awk: UNIX
bin/key.awk: UNIX
bin/melac.awk: UNIX
bin/metpos.awk: UNIX
bin/minors.awk: UNIX
bin/mint.awk: UNIX
bin/ms1a.awk: UNIX
bin/ms1b.awk: UNIX
bin/ms1b-old.awk: UNIX
bin/ms1c.awk: UNIX
bin/ms.awk: UNIX
bin/msjoin.awk: UNIX
bin/num.awk: UNIX
bin/number.awk: UNIX
bin/patt.awk: UNIX
bin/pattern.awk: UNIX
bin/pc.awk: UNIX
bin/pcset.awk: UNIX
bin/pitch.awk: UNIX
bin/proof1.awk: UNIX
bin/proof2.awk: UNIX
bin/proof3.awk: UNIX
bin/recode.awk: UNIX
bin/regexp.awk: UNIX
bin/reihe.awk: UNIX
bin/rend.awk: UNIX
bin/scramble.awk: UNIX
bin/semits.awk: UNIX
bin/solfa.awk: UNIX
bin/solfg.awk: UNIX
bin/specc.awk: UNIX
bin/spectra.awk: UNIX
bin/stats.awk: UNIX
bin/strophe.awk: UNIX
bin/synco.awk: UNIX
bin/thru.awk: UNIX
bin/timebase.awk: UNIX
bin/tonh.awk: UNIX
bin/trans1.awk: UNIX
bin/trans2.awk: UNIX
bin/urrhythm.awk: UNIX
bin/vox.awk: UNIX
bin/xdelta.awk: UNIX
bin/x_option.awk: UNIX
bin/yank1.awk: UNIX
bin/yank2.awk: UNIX
bin/yank3.awk: UNIX
bin/yank4.awk: UNIX
bin/yank5.awk: UNIX
bin/ydelta.awk: UNIX

mshafer1 commented 7 years ago

Because I am not proficient with makefiles, and was uncertain which files would be causing the line ending issue, I thought raising the issue might encourage those who are more familiar with this repo to dig into it. I do not propose that the Python script I posted was a permanent fix.

For the troublesome install, started with the directions from here: https://github.com/shanahdt/mus7921/wiki/Installing-Humdrum-on-Windows

Using Cygwin on a Windows 10 machine.

Eventually used the following commands to do the download and make:

mkdir /humdrum-install
cd /humdrum-install
git clone --recursive https://github.com/humdrum-tools/humdrum-tools/
cd humdrum-tools
make update
make install

The non-default location (different from what is presented in said instructions) was an attempt to see if the space in the computer owner's username (and therefore in their default Cygwin username), was the cause of some of the issues in getting the scripts to run; however, the only issue this resolved is not needing to manually add escapes for spaces in the path after running make install.

humdrum scripts consistently threw errors about unexpected character '\r'. When I realized that it was complaining of a Windows-style line ending in the bash scripts, I wrote the attached Python script to quickly change all the bin files (if it failed I could always delete them and re-run make). This appears to have fixed the issue, and the owner is now able to run several of the bash scripts (ran "extract -f 1 [desiredfile.krn] | context -b '{' -o '[=r]' | grep -v '^.' | semits -tx" to verify this).

I would be curious what you would get if you ran cat bin/* | od -x | grep 0d | wc -l as it was not the .awk files, but the accompanying bash scripts (i.e. extract, or semits) that had this issue on this system.

craigsapp commented 7 years ago

I would be curious what you would get if you ran cat bin/* | od -x | grep 0d | wc -l

I ran cat bin/* | od -x | grep 0d | wc -l and get zero instances of \r (after deleting compiled binaries). But that is to be expected, since MacOS and Fedora Linux would not spontaneously change newlines to MS-DOS/Windows type.

as it was not the .awk files, but the accompanying bash scripts (i.e. extract, or semits) that had this issue on this system.

Notice that the title of your issue is about *.awk...

The non-default location (different from what is presented in said instructions) was an attempt to see if the space in the computer owner's username (and therefore in their default Cygwin username), was the cause of some of the issues in getting the scripts to run; however, the only issue this resolved is not needing to manually add escapes for spaces in the path after running make install.

That is OK. If you have more than one install location (Humdrum is registered in the search path more than once), you should check to see which one you are using:

   echo $PATH | tr : '\n' | grep humdrum

This will return a list of the installation directories. The first one is the active one (make sure it is the one you expect). You can also check where a particular program is being run from with the command:

    which extract

This will return the full pathname to the program.

however, the only issue this resolved is not needing to manually add escapes for spaces in the path after running make install.

Yes, using spaces in directory names in unix is greatly frowned upon :-). I should alter the installation process to escape them before adding to the PATH variable. But there would also be problems related to other characters such as []()*&$#!~'";:/\><,. You installed cygwin in your home directory? I usually install it in c:\cygwin, but that might not help if you have a space in your username. I myself usually install humdrum (or humdrum-tools) in /usr/local.

humdrum scripts consistently threw errors about unexpected character '\r'.

Hmmm. So running your python script solves the problem? I am wondering if the problem is the other way around: that your data file contains \r, and the awk script behind semits or extract is choking on the data rather than the awk interpreter choking on the awk script. But when I process data using MS-DOS/Windows newlines (in MacOS), I am getting correctly processed data files (which keep the MS-DOS/Windows newlines).

I just booted up my old Windows 10 computer, and I installed the latest version of humdrum, and I don't see any MS-DOS/Windows newlines anywhere (checking with the od command)... One thing is that I am probably using an older version of cygwin from before I installed Windows 10. What version of cygwin are you using?

uname -a

for me returns

CYGWIN_NT-6.2-WOW64 computer 1.7.11(0.260/5/3) 2012-02-24 14:05 i686 Cygwin

I am also using Windows 10 Pro (although that should not make a difference).


And what is the exact error message you get when you type the command:

census  input.krn

when doing the make bin without the newline correction? (and input.krn is some file you create, obviously :-)

Another test is to run one of the compiled commands and see if they are working correctly (which they should since there are no newline problems possible). Given these two files:

**kern
1C
*-

and

**kern
1G
*-

Then the command:

assemble file1.krn file2.krn

should return:

**kern  **kern
1C  1G
*-  *-

assemble is a compiled executable, so newline conversion does not apply to it. Note that assemble will probably not work after you run your python script on it...

mshafer1 commented 7 years ago

uname -a returns:

CYGWIN_NT-10.0 LAPTOP-0DI7S3NM 2.6.1(0.305/5/3) 2016-12-16 11:55 x86_64 Cygwin

echo $PATH | tr : '\n' | grep humdrum returns:

/humdrum-install/humdrum-tools/humextra/bin
/humdrum-install/humdrum-tools/humdrum/bin

which extract works as expected

After removing the Path variables (and restarting the shell)

~/humdrum-tools/humdrum/bin
$ ./extract -f 1 test.krn
./extract: line 29: $'\r': command not found
./extract: line 51: syntax error near unexpected token `$'in\r''
'/extract: line 51: `   case "$arg" in

Running assemble in fresh install (without modding files) I get:

 ~/humdrum-tools/humdrum/bin
$ ./assemble test.krn test.krn
!!!COM: Georges Bizet
!!!CNT: Francais
!!!TXO: Francais
!!!OTL: Chanson D'Avril
!!!LYR: Louis Bouilhet
!!!ENC: Jacob Pegg
!!!YOR: Georges Bizet: Twenty Melodies. A Kalmus Classic Edition.
!!!YOR: Miami: Warner Bros. Publications,1990. ISBN 0-7692-4977-9
!!!ODT: 1873/
!!NB: Vocal melody and text only encoded.
**semits**semits
*clefG2 *clefG2
*k[f#c#g*k[f#c#g#]
*M2/4   *M2/4
13 13 1313 13 13 13 13 14 13
13 11 1113 11 11 13 9 9 13 11
13 13 1313 13 13 13 13 14 13
13 11 1313 11 13 9 13 11
11 11 1111 11 11 11 11 11 13 13
6 6 6 6 6 6 6 6 11 11 8
13 13 1313 13 13 13 13 11 13 14 13
13 11 1313 11 13 9 13 11
*-      *-

However, in the modified directory:

/humdrum-install/humdrum-tools/humdrum/bin
$ ./assemble test.krn test.krn
-bash: ./assemble: cannot execute binary file: Exec format error

So the script did erroneously alter the binary files. Attached new version that will skip over exe and hlp files - again not a permanent solution.

I was attempting to verify that this was successful (and modified the Path variable - with escaping the spaces - and restarted the terminal) and I still cannot run extract due to a path error. (for the sake of the owner's privacy, I have replaced their name with [OwnerFirstName] [OwnerLastName])

[OwnerFirstName] [OwnerLastName]@LAPTOP-0DI7S3NM ~
$ where extract
C:\cygwin64\home\[OwnerFirstName] [OwnerLastName]\humdrum-tools\humdrum\bin\extract

[OwnerFirstName] [OwnerLastName]@LAPTOP-0DI7S3NM ~
$ extract -f 1 test.krn
awk: fatal: can't open source file `/home/[OwnerFirstName]' for reading (No such file or directory)

So it appears that spaces in the path do cause additional errors.

I am attaching an alteration to my script that does not touch exe files, so the owner is now able (through running this alteration, but putting the resulting files in the /humdrum-install/... directory) to run extract, assemble, and others. changeLineEndings.zip

craigsapp commented 7 years ago

After removing the Path variables (and restarting the shell):

$ cd ~/humdrum-tools/humdrum/bin
$ ./extract -f 1 test.krn
./extract: line 29: $'\r': command not found
./extract: line 51: syntax error near unexpected token `$'in\r''
'/extract: line 51: `   case "$arg" in

If the humdrum/bin directory is not in the path, the shell scripts would be expected to fail because they need to find the companion AWK script(s). Traditionally this is given in the environmental variable $HUMDRUM, but this version of the Humdrum Toolkit does not set this variable. This version of the Humdrum Toolkit will use that variable if it is not empty, but otherwise it will look at the $PATH variable and use the first directory it finds in the list which has the pattern humdrum/bin. In the case where you remove humdrum/bin from the path, the bash scripts lose the ability to find the AWK scripts. So in this case ./extract will find the bash script, but the script still will not be able to find extract1.awk. Note that for the extract command, you can also use the Humdrum Extras version, which is called extractx.

These are the lines in extract which handle localizing the AWK scripts:

if [ -z $HUMDRUM ]
then
   HUMDRUM=`echo $PATH | tr : '\n' | grep 'humdrum/bin$' | head -n 1 | sed 's/\/bin$//'`
fi

Therefore, errors such as $'\r': command not found are probably related to the inability of the extract bash script to find extract1.awk, since it has an empty location for the AWK program.
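The lookup performed by those shell lines can be mirrored in a short sketch, which may make the fallback order easier to follow. The function name and the dictionary-based environment are assumptions for illustration; only the logic (use $HUMDRUM if set, otherwise the first PATH entry ending in humdrum/bin with /bin stripped) comes from the snippet above.

```python
from __future__ import annotations


def resolve_humdrum(env: dict[str, str]) -> str:
    """Return the Humdrum base directory the way the shell snippet would."""
    # A non-empty HUMDRUM variable takes priority.
    humdrum = env.get("HUMDRUM", "")
    if humdrum:
        return humdrum
    # Otherwise use the first PATH entry that ends in humdrum/bin.
    for entry in env.get("PATH", "").split(":"):
        if entry.endswith("humdrum/bin"):
            return entry[: -len("/bin")]
    # Nothing found: the result is empty, as in the shell version.
    return ""
```

With the PATH reported earlier in the thread, the humextra/bin entry is skipped and the result is the humdrum directory. With neither HUMDRUM nor a matching PATH entry, the result is an empty string.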

Have you run the command on the humdrum/bin files before modifying them:

cat humdrum/bin/*.awk | od -x | grep 0d | wc -l

Note that you cannot use humdrum/bin/* unless you temporarily remove the *.exe files from the directory (since they have what look like DOS newlines, but these are just random bytes).

I would expect it to return 0 as it does on my installation of cygwin, and the problem is related to the bash scripts not being able to find the AWK scripts since the installation path contains a space. If you installed it in a second location without a space in the path name, then you have to make sure that the first installation location is removed from the $PATH variable, since the old location could be obscuring the second location.

You should first try installing in a directory which does not have a space in it, such as /usr/local/humdrum-tools. Then after doing make install in the new location, edit the ~/.profile and make sure that location is the only one in the PATH variable (or don't run make install and just edit the PATH to update to the new location). Remember to log out and in again to reload the PATH variable.

If you don't want to do that, then you can test updating the Makefile to allow spaces in the path name. On lines 245 and 265, change:

        echo "export PATH=`pwd`/bin:\$$PATH" >> ~/.profile

to

        echo "export PATH=`pwd|sed 's/ /\\\\ /g'`/bin:\$$PATH" >> ~/.profile

This should add a backslash character before the space in the PATH variable string, which makes the space a literal space rather than a parsing space.

craigsapp commented 7 years ago

[OwnerFirstName] [OwnerLastName]@LAPTOP-0DI7S3NM ~
$ where extract
C:\cygwin64\home\[OwnerFirstName] [OwnerLastName]\humdrum-tools\humdrum\bin\extract

[OwnerFirstName] [OwnerLastName]@LAPTOP-0DI7S3NM ~
$ extract -f 1 test.krn
awk: fatal: can't open source file `/home/[OwnerFirstName]' for reading (No such file or directory)

This seems to indicate that the spaces were not escaped, or the escaping was unescaped at some step in the process of reading the file. You can try variants of escaping:

PATH="/home/my name/humdrum/bin:$PATH"
PATH="/home/my\ name/humdrum/bin:$PATH"
PATH=/home/my\ name/humdrum/bin:$PATH
PATH="/home/my\\ name/humdrum/bin:$PATH"
PATH="/home/my\\\\ name/humdrum/bin:$PATH"

You can check the PATH by typing echo $PATH after logging in again. If you see a backslash before the space, it should be correctly installed in the path name. But even that might not work, as the path will be used inside of the bash script, and the escaped space could become unescaped at that point rather than when reading from .profile.

The best thing would be to install in a directory location which does not contain a space.
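Why a space in the path produces an error like the awk one above can be sketched with Python's shlex module, which approximates POSIX shell tokenization. This is only an analogy: it assumes the path ends up unquoted on a command line inside the script, and the awk command and paths below are hypothetical.

```python
from __future__ import annotations

import shlex


def split_like_shell(command: str) -> list[str]:
    """Tokenize a command line roughly the way a POSIX shell would."""
    return shlex.split(command)


# Unquoted: the path is cut at the space, so awk would be handed "/home/First".
unquoted = split_like_shell("awk -f /home/First Last/humdrum/bin/extract1.awk")

# Quoted or backslash-escaped: the path survives as a single argument.
quoted = split_like_shell('awk -f "/home/First Last/humdrum/bin/extract1.awk"')
escaped = split_like_shell(r"awk -f /home/First\ Last/humdrum/bin/extract1.awk")
```

Here `unquoted` has four tokens with the path split in two, while `quoted` and `escaped` each have three. Escaping in .profile only helps if every later use of the value is also quoted, which is why installing in a path without spaces is the more robust choice.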

mshafer1 commented 7 years ago

I would think that the space issue should just be resolved by encouraging everyone to not use paths that have them.

However, by running

~/humdrum-tools/humdrum/bin
$ ./extract -f 1 test.krn

from inside the humdrum/bin folder, extract (and associated awk files) are all in the "local" path and do not need to be added to the path variable.

If you open extract, you will see that line 29 is a blank line (unless viewed with all white space shown). I'm sure you're aware that \r is the escape character for the carriage return in Windows style line endings.
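To see which lines of a script carry a carriage return (including ones that look blank in an editor), a small sketch such as the following could be used. The function name is an assumption for illustration.

```python
from __future__ import annotations


def crlf_line_numbers(data: bytes) -> list[int]:
    """Return the 1-based numbers of lines that end in a carriage return."""
    return [
        number
        for number, line in enumerate(data.split(b"\n"), start=1)
        if line.endswith(b"\r")
    ]
```

Passing it the bytes of the script, for example `crlf_line_numbers(open("extract", "rb").read())`, lists the affected lines. A line that contains only a carriage return is what bash reports as `$'\r': command not found`.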

I suspect this may be caused by the difference in the Cygwin versions (as I had no issues with another install at the same time on Cygwin 2.3.1).

craigsapp commented 7 years ago

However, by running

~/humdrum-tools/humdrum/bin
$ ./extract -f 1 test.krn

from inside the humdrum/bin folder, extract (and associated awk files) are all in the "local" path and do not need to be added to the path variable.

No, I explained that ./extract should not work if both the $HUMDRUM variable is empty and there is no humdrum/bin in the path directory. The ./ will allow the shell to find extract, but it does not help extract find extract1.awk. The extract script probably could be set to use "." rather than "" when the bin directory is not found, but one should not be running tools from the bin directory. The error you are getting is probably related to the search path for the AWK script being empty rather than to \r characters in the bash or AWK scripts.

Did you run

cat humdrum/bin/*.awk | od -x | grep 0d | wc -l

(or remove *.exe files to also check bash scripts) before running your script? My version of cygwin does not have any \r characters in the AWK or bash scripts. So if you remove something which is not there, then it still is not there. When you remove the \r characters does the extract script start working?

mshafer1 commented 7 years ago

On another machine (so that I could do this locally at my convenience), I just did a fresh install via

cd
git clone --recursive https://github.com/humdrum-tools/humdrum-tools/
cd humdrum-tools
make update
cd humdrum-tools/humdrum/
make bin
make install
source ~/.profile

Tested path:

$ which extract
/home/Matthew/humdrum-tools/humdrum/bin/extract

However:

$ extract -f 1 test.krn
/home/Matthew/humdrum-tools/humdrum/bin/extract: line 29: $'\r': command not found
/home/Matthew/humdrum-tools/humdrum/bin/extract: line 51: syntax error near unexpected token `$'in\r''
'home/Matthew/humdrum-tools/humdrum/bin/extract: line 51: `     case "$arg" in

And the command to count carriage returns gives:

$ cat humdrum/bin/*.awk | od -x | grep 0d | wc -l
36651

craigsapp commented 7 years ago

That is good. So if you remove the DOS/Windows newlines at this point with your script, the extract program will work?

The main possibility I can think of is that it has to do with "git" automatically translating the text files to Windows newlines when it downloads them (this currently seems to be the most likely problem).

To test this hypothesis: do the od -x command on the AWK files before they are copied to the bin directory. Do those also have DOS/Windows newlines?

       cat humdrum/bin/toolkit-source/awk-programs/*.awk | od -x | grep 0d | wc -l

If that is non-zero, then git is the most likely culprit, since the repository versions of those files definitely do not have DOS/Windows newlines.

If the count is non-zero, then looking on the web, I find that git can be told to stop being naughty by typing in the command:

     git config --global core.autocrlf false

before cloning the repository. Try that and see what happens when cloning again. In theory this should prevent git from automatically adding DOS/Windows newlines to the text files that it clones.

https://help.github.com/articles/dealing-with-line-endings
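Before cloning, the current setting can be inspected. The sketch below wraps `git config --get core.autocrlf` and interprets the value; the function names are assumptions for illustration, and the interpretation assumes Git's usual boolean spellings (true, yes, on, 1), with "input" and "false" not adding carriage returns on checkout.

```python
from __future__ import annotations

import subprocess
from typing import Optional


def read_autocrlf() -> Optional[str]:
    """Return the effective core.autocrlf value, or None if unset or git is missing."""
    try:
        result = subprocess.run(
            ["git", "config", "--get", "core.autocrlf"],
            capture_output=True,
            text=True,
            check=False,  # git exits non-zero when the key is unset
        )
    except FileNotFoundError:
        return None
    return result.stdout.strip() or None


def adds_cr_on_checkout(value: Optional[str]) -> bool:
    """True if this setting makes git write CRLF line endings into the working tree."""
    return value is not None and value.lower() in {"true", "yes", "on", "1"}
```

If `adds_cr_on_checkout(read_autocrlf())` is true, a fresh clone would be expected to contain CRLF line endings in its text files, matching the symptoms described in this thread.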

mshafer1 commented 7 years ago

Yes, after changing the line endings, bash scripts work.

$ cat toolkit-source/awk-programs/*.awk | od -x | grep 0d | wc -l
36651

So it does appear to be git at fault - set autocrlf to false

removed that install from path and re-performed install in another directory - working now with no issues.

Thank you for your time in trying to figure out the issue.

Solution

Run

git config --global core.autocrlf false

before running

git clone --recursive https://github.com/humdrum-tools/humdrum-tools/