infertux / bashcov

Code coverage tool for Bash
MIT License

Invalid byte sequence in UTF-8 (ArgumentError) #27

Closed lark047 closed 1 year ago

lark047 commented 7 years ago

Hello, I've just upgraded to the latest (1.5.1) and I'm getting "invalid byte sequence in UTF-8 (ArgumentError)" when running bashcov against a very large legacy project. I don't think I saw that error when using version 1.2.1. With that version, I had 18 files listed in the coverage report and now I have only 7.

When all tests have been run, I see the following:

/home/aclark/workspace/bashcov/lib/bashcov/lexer.rb:65:in `match': invalid byte sequence in UTF-8 (ArgumentError)
        from /home/aclark/workspace/bashcov/lib/bashcov/lexer.rb:65:in `match'
        from /home/aclark/workspace/bashcov/lib/bashcov/lexer.rb:65:in `mark_multiline'
        from /home/aclark/workspace/bashcov/lib/bashcov/lexer.rb:37:in `block in complete_coverage'
        from /home/aclark/workspace/bashcov/lib/bashcov/lexer.rb:35:in `each'
        from /home/aclark/workspace/bashcov/lib/bashcov/lexer.rb:35:in `each_with_index'
        from /home/aclark/workspace/bashcov/lib/bashcov/lexer.rb:35:in `complete_coverage'
        from /home/aclark/workspace/bashcov/lib/bashcov/runner.rb:149:in `block in mark_relevant_lines!'
        from /home/aclark/workspace/bashcov/lib/bashcov/runner.rb:147:in `each_pair'
        from /home/aclark/workspace/bashcov/lib/bashcov/runner.rb:147:in `mark_relevant_lines!'
        from /home/aclark/workspace/bashcov/lib/bashcov/runner.rb:68:in `result'
        from /home/aclark/workspace/bashcov/bin/bashcov:13:in `<main>'

and, when handling the exception in mark_multiline,

/home/aclark/workspace/bashcov/lib/bashcov/lexer.rb:87:in `sub!': invalid byte sequence in UTF-8 (ArgumentError)
        from /home/aclark/workspace/bashcov/lib/bashcov/lexer.rb:87:in `relevant?'
        from /home/aclark/workspace/bashcov/lib/bashcov/lexer.rb:83:in `mark_line'
        from /home/aclark/workspace/bashcov/lib/bashcov/lexer.rb:57:in `block in complete_coverage'
[...]

This code is likely riddled with invalid characters vis-a-vis UTF-8.

Anyone have any ideas how to get around this and report the true coverage? If it matters, I'm running bashcov 1.5.1 on Cygwin/Windows 8.1. Thanks in advance!

jola5 commented 7 years ago

I don't have any idea either, but the exact same message is bothering me as well:

/x/y/z/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/lexer.rb:65:in `match': invalid byte sequence in UTF-8 (ArgumentError)
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/lexer.rb:65:in `match'
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/lexer.rb:65:in `mark_multiline'
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/lexer.rb:37:in `block in complete_coverage'
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/lexer.rb:35:in `each'
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/lexer.rb:35:in `each_with_index'
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/lexer.rb:35:in `complete_coverage'
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/runner.rb:149:in `block in mark_relevant_lines!'
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/runner.rb:147:in `each_pair'
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/runner.rb:147:in `mark_relevant_lines!'
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/runner.rb:68:in `result'
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/bin/bashcov:13:in `<top (required)>'
        from /home/johannes/.gem/ruby/2.4.0/bin/bashcov:22:in `load'
        from /home/johannes/.gem/ruby/2.4.0/bin/bashcov:22:in `<main>'

Running bashcov 1.5.1 on Arch Linux. It's the same on TravisCI (which should be Ubuntu 14.04): https://travis-ci.org/jola5/gtv/jobs/267326651

Unfortunately this project seems abandoned.

tomeon commented 7 years ago

@lark047 and @jola5 -- can you provide steps to reproduce the issue? @jola5 -- I can't find the error you posted in your comment anywhere in the TravisCI output from https://travis-ci.org/jola5/gtv/jobs/267326651, nor in any of the several other TravisCI runs I checked; I also can't reproduce it by running bash -xv ./make.sh from the coverage-support branch of your gtv repository (like you, I'm running bashcov 1.5.1 on Arch Linux with ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]).

jola5 commented 7 years ago

@BaxterStockman yeah, sorry for that. It's because I suppressed error output on this Travis build. On the current support-coverage branch I am able to reproduce the error like this:

#!/bin/bash

# assuming you're in the repository root
export GTV=./src/git-tag-version
export GIT="$(which git)" # or the path to git in ./build, if you've run "./make.sh git -g'2.14.1'" before
export BASHCOV=/my/path/to/bashcov # or "$(which bashcov)"
export BATS="$(which bats)" # or ./bats/bin/bats, if you've run "./make.sh test" before
"${BASHCOV}" --root ./ --mute "${BATS}" ./test/*.bats

Thanks for looking into this.

Maybe it's because bashcov is not in my path and I'm trying to call it directly instead?

My ruby version is the same ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]. Arch ftw!

jola5 commented 7 years ago

I also tried it with bashcov (or, to be precise, .../.gem/ruby/2.4.0/bin) in my PATH, with no success either.

jola5 commented 7 years ago

@BaxterStockman here you can see the error message in job 83.2: https://travis-ci.org/jola5/gtv/jobs/267750263

Funnily enough, in my other configuration, job 83.1 (https://travis-ci.org/jola5/gtv/jobs/267750260), bashcov seems to be working OK.

The only difference is that on 83.1 I'm using the system's default Git installation, whereas on 83.2 I'm downloading, compiling and testing against a custom Git version (using the GIT environment variable).

jola5 commented 7 years ago

@BaxterStockman or @any1else, any update on this issue?

infertux commented 7 years ago

@lark047 @jola5 It seems like you have dodgy characters in your Bash files which fail to be decoded as UTF-8 characters. Can you try running the invalid-utf8 branch and check the output? It should raise an exception showing the faulty characters. If you use Bundler, you can install this branch by adding github: "infertux/bashcov", branch: "invalid-utf8" to the bashcov entry in your Gemfile.
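
Independently of that branch, the offending files can be located with a few lines of Ruby. This is only a sketch: `invalid_utf8_lines` and `scan_root` are hypothetical helper names, not part of bashcov, and the `*.sh` glob is an assumption about which files matter.

```ruby
# Sketch: find lines in shell scripts whose bytes are not valid UTF-8.
# Both helpers are hypothetical; neither is a bashcov API.

# Return [[line_number, inspected_line], ...] for every line that is not valid UTF-8.
def invalid_utf8_lines(bytes)
  # Split on raw bytes first (binary strings never raise encoding errors),
  # then check each line under a UTF-8 label.
  bytes.b.each_line.with_index(1).map do |line, number|
    utf8 = line.force_encoding(Encoding::UTF_8)
    [number, utf8.inspect] unless utf8.valid_encoding?
  end.compact
end

# Walk `root` and map each offending *.sh path to its invalid lines.
def scan_root(root)
  Dir.glob(File.join(root, "**", "*.sh")).sort.each_with_object({}) do |path, report|
    next unless File.file?(path)

    bad = invalid_utf8_lines(File.binread(path))
    report[path] = bad unless bad.empty?
  end
end
```

Calling `scan_root(".")` from the repository root returns a hash of paths to `[line_number, line]` pairs, which makes it easy to see whether the bad bytes live in your own scripts or in vendored ones.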

jola5 commented 7 years ago

Sorry, it took a while. Here is the output with your invalid-utf8 branch:

/home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/lexer.rb:65:in `rescue in mark_multiline': "#!/bin/sh\n\ntest_description='grep icase on non-English locales'\n\n. ./lib-gettext.sh\n\ntest_expect_success GETTEXT_ISO_LOCALE 'setup' '\n\tprintf \"TILRAUN: Hall\xF3 Heimur!\" >file &&\n\tgit add file &&\n\tLC_ALL=\"$is_IS_iso_locale\" &&\n\texport LC_ALL\n'\n\ntest_expect_success GETTEXT_ISO_LOCALE,PCRE 'grep pcre string' '\n\tgit grep --perl-regexp -i \"TILRAUN: H.ll\xF3 Heimur!\" &&\n\tgit grep --perl-regexp -i \"TILRAUN: H.LL\xD3 HEIMUR!\"\n'\n\ntest_done\n" (RuntimeError)
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/lexer.rb:65:in `mark_multiline'
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/lexer.rb:37:in `block in complete_coverage'
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/lexer.rb:35:in `each'
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/lexer.rb:35:in `each_with_index'
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/lexer.rb:35:in `complete_coverage'
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/runner.rb:149:in `block in mark_relevant_lines!'
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/runner.rb:147:in `each_pair'
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/runner.rb:147:in `mark_relevant_lines!'
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/lib/bashcov/runner.rb:68:in `result'
        from /home/johannes/.gem/ruby/2.4.0/gems/bashcov-1.5.1/bin/bashcov:13:in `<top (required)>'
        from /home/johannes/.gem/ruby/2.4.0/bin/bashcov:23:in `load'
        from /home/johannes/.gem/ruby/2.4.0/bin/bashcov:23:in `<main>'

I am not sure this is related to an improperly formatted/encoded source file of mine. Since locales are mentioned in the message above, here are my locale settings, just in case:

$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC=en_GB.UTF-8
LC_TIME=en_GB.UTF-8
LC_COLLATE=en_GB.UTF-8
LC_MONETARY=en_GB.UTF-8
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER=en_GB.UTF-8
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT=en_GB.UTF-8
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=

infertux commented 7 years ago

It's crashing because of this: Hall\xF3 Heimur!. The byte 0xF3 is ó in ISO-8859-1, but on its own it is not a valid UTF-8 sequence. I'll try to make Bashcov ignore invalid characters, but in the meantime you can fix it by re-encoding the file as UTF-8.
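
To make the two options concrete, here is a minimal Ruby sketch using the string from the trace. The helper names `reencode_latin1` and `scrub_utf8` are hypothetical, and the assumption that the file is ISO-8859-1 comes from the test name (grep-icase-iso), not from anything bashcov reports. The thread does not say which approach a later bashcov release takes.

```ruby
# Sketch: two ways to turn the offending bytes into a valid UTF-8 string.
# Helper names are hypothetical; this is not bashcov's own code.

# Option 1: re-encode, telling Ruby what the bytes really are (assumed ISO-8859-1).
def reencode_latin1(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Option 2: keep the UTF-8 label and replace every undecodable byte.
def scrub_utf8(bytes, replacement = "?")
  bytes.dup.force_encoding(Encoding::UTF_8).scrub(replacement)
end

raw = "TILRAUN: Hall\xF3 Heimur!".b # raw bytes as stored in the file

puts raw.dup.force_encoding(Encoding::UTF_8).valid_encoding? # false: lone 0xF3
puts reencode_latin1(raw)                                    # TILRAUN: Halló Heimur!
puts scrub_utf8(raw)                                         # TILRAUN: Hall? Heimur!
```

Re-encoding preserves the original character but requires knowing the source encoding. Scrubbing needs no such knowledge but loses the character, which is usually acceptable when the goal is only to decide which lines are relevant for coverage.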

infertux commented 7 years ago

@lark047 @jola5 Release 1.6.0 should fix this. Sorry for the wait.

jola5 commented 7 years ago

Thanks a lot. Release 1.6.0 fixes my issue.

However, I want to add one last remark on my particular error. As I said previously, the file did not look familiar to me at all, so I grepped for Heimur, which returns these results:

Binary file ./build/git-2.14.1/po/build/locale/is/LC_MESSAGES/git.mo matches
./build/git-2.14.1/po/is.po:40:msgstr "TILRAUN: Halló Heimur!"
Binary file ./build/git-2.14.1/t/t7813-grep-icase-iso.sh matches
./build/git-2.14.1/t/t0204-gettext-reencode-sanity.sh:16:    printf "TILRAUN: Halló Heimur!" >expect &&
./build/git-2.14.1/t/t0204-gettext-reencode-sanity.sh:28:    printf "TILRAUN: Halló Heimur!" | iconv -f UTF-8 -t ISO8859-1 >expect &&
./build/git-2.14.1/t/t7812-grep-icase-non-ascii.sh:8:   test_write_lines "TILRAUN: Halló Heimur!" >file &&
./build/git-2.14.1/t/t7812-grep-icase-non-ascii.sh:19:  git grep -i "TILRAUN: Halló Heimur!" &&
./build/git-2.14.1/t/t7812-grep-icase-non-ascii.sh:20:  git grep -i "TILRAUN: HALLÓ HEIMUR!"
./build/git-2.14.1/t/t7812-grep-icase-non-ascii.sh:24:  git grep --perl-regexp    "TILRAUN: H.lló Heimur!" &&
./build/git-2.14.1/t/t7812-grep-icase-non-ascii.sh:25:  git grep --perl-regexp -i "TILRAUN: H.lló Heimur!" &&
./build/git-2.14.1/t/t7812-grep-icase-non-ascii.sh:26:  git grep --perl-regexp -i "TILRAUN: H.LLÓ HEIMUR!"
./build/git-2.14.1/t/t7812-grep-icase-non-ascii.sh:30:  test_write_lines "TILRAUN: Hallóó Heimur!" >file2 &&
./build/git-2.14.1/t/t7812-grep-icase-non-ascii.sh:32:  git grep -l --perl-regexp "TILRAUN: H.lló+ Heimur!" >actual &&
./build/git-2.14.1/t/t7812-grep-icase-non-ascii.sh:39:  git grep -i -F "TILRAUN: Halló Heimur!" &&
./build/git-2.14.1/t/t7812-grep-icase-non-ascii.sh:40:  git grep -i -F "TILRAUN: HALLÓ HEIMUR!"
./build/git-2.14.1/t/t7812-grep-icase-non-ascii.sh:44:  test_write_lines "TILRAUN: Halló Heimur [abc]!" >file3 &&
./build/git-2.14.1/t/t7812-grep-icase-non-ascii.sh:46:  git grep -i -F "TILRAUN: Halló Heimur [abc]!" file3
./build/git-2.14.1/t/t7812-grep-icase-non-ascii.sh:51:  git log --format=%f -i -S"TILRAUN: HALLÓ HEIMUR!" >actual &&

So the bad UTF-8 encoding is not in one of my own source files but in one of the Git source files. Strangely enough, I do not check any Git files with bashcov at all. It seems as if bashcov parses all shell scripts starting from a given root directory.

Any idea why this happens?

infertux commented 7 years ago

SimpleCov won't report coverage for files which are not executed. However, Bashcov will report any file ending with .sh. If they are not executed, they will show up as completely missed, i.e. 0 hits for each line. If you want to ignore non-executed files, you can use the --skip-uncovered flag:

bashcov --help 2>&1 | grep 'skip-uncovered'
    bashcov --skip-uncovered ./script.sh
    bashcov --skip-uncovered -- ./script.sh --some --flags
    -s, --skip-uncovered             Do not report uncovered files

This approach to scan for .sh files is simple but a bit naive. Ideally, we'd like to detect file types more reliably with #18.

That being said, it seems you found a bug: Binary file ./build/git-2.14.1/t/t7813-grep-icase-iso.sh matches. It doesn't make sense to include binary files. I'll fix that. Thanks for the feedback @jola5 :)
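
For context, GNU grep generally prints "Binary file ... matches" when a file contains NUL bytes or bytes that do not decode in the current locale, so a comparable check can be sketched in Ruby. `probably_text?` is a hypothetical helper and only an illustration of the heuristic, not the filter bashcov ended up using.

```ruby
# Sketch: a grep-like heuristic for "is this file safe to lex as UTF-8 text?".
# `probably_text?` is a hypothetical name, not a bashcov API.
def probably_text?(bytes)
  # NUL bytes almost never appear in shell scripts.
  return false if bytes.b.include?("\0")

  # Bytes that do not form valid UTF-8 would raise later during regexp matching.
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

puts probably_text?("#!/bin/sh\necho hi\n")          # true
puts probably_text?("printf \"Hall\xF3 Heimur!\"".b) # false: ISO-8859-1 byte
puts probably_text?("ELF\0\0\0".b)                   # false: NUL bytes
```

Under this heuristic t7813-grep-icase-iso.sh would be skipped even though it is a real shell script, because its ISO-8859-1 bytes are not valid UTF-8. Whether skipping or scrubbing such files is preferable depends on whether they are expected to contribute to the coverage report.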

lark047 commented 7 years ago

Thanks, and sorry for the late reply! This does indeed fix the problem I was having.

AlexSkrypnyk commented 1 year ago

I'm getting slightly different output:

`scan': invalid byte sequence in US-ASCII (ArgumentError)

      return false if scanner.scan(/#!\s*/).nil?

Works in 3.0.1, broken in 3.0.3

Could be due to https://github.com/infertux/bashcov/pull/74
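
One way this US-ASCII variant can arise (an assumption, since the thread does not confirm the cause) is that Ruby labels strings read from files with the default external encoding, which follows the locale. Under a C/POSIX locale that label is typically US-ASCII, so any byte at or above 0x80 makes the string invalid. The sketch below shows the effect and a shebang check that sidesteps it by scanning raw bytes. `valid_as?` and `shebang?` are hypothetical helpers, not bashcov's implementation.

```ruby
require "strscan"

# Sketch: the same bytes are valid or invalid depending on the encoding label.
# Both helpers are hypothetical names used only for illustration.
def valid_as?(bytes, encoding)
  bytes.dup.force_encoding(encoding).valid_encoding?
end

# Scan a binary copy of the input; binary strings are always validly encoded,
# so the result does not depend on the process locale.
def shebang?(bytes)
  !StringScanner.new(bytes.b).scan(/#!\s*/).nil?
end

script = "#!/bin/bash\necho 'Hall\u00F3'\n"

puts valid_as?(script, Encoding::UTF_8)    # true
puts valid_as?(script, Encoding::US_ASCII) # false: bytes >= 0x80 are not ASCII
puts shebang?(script)                      # true
```

If the locale is the trigger, running bashcov with a UTF-8 locale (for example `LC_ALL=en_GB.UTF-8`, as in the locale output earlier in this thread) may also avoid the error on affected versions.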

infertux commented 1 year ago

@AlexSkrypnyk Should be fixed in version 3.1.0. Please update bashcov with gem update bashcov and try again.