lasigeBioTM / MER

Minimal Named-Entity Recognizer (MER)
http://labs.fc.ul.pt/mer/

awk error when annotating text with newlines #37

Closed LLCampos closed 7 years ago

LLCampos commented 7 years ago

Running this command:

bash get_entities.sh "oxygen carbon" ChEBI

Returns:

0   6   oxygen
7   13  carbon

The command

bash get_entities.sh "oxygen
carbon" ChEBI

Should return the same but instead it returns the following:

awk: cmd. line:1: BEGIN {IGNORECASE = 1} {sub(/carbon|oxygen/,"@@@@@@
awk: cmd. line:1:                                             ^ unterminated string
awk: cmd. line:1: BEGIN {IGNORECASE = 1} {sub(/carbon|oxygen/,"@@@@@@
awk: cmd. line:1:                                          ^ syntax error

This is relevant because there are cases in which you want to annotate text with multiple paragraphs. @fjmc any ideas why awk is raising this error?
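The "unterminated string" message is what awk reports when a string literal in the program text contains a raw newline, which would happen if the input text is pasted directly into the awk command line. A minimal sketch of one way to avoid that, assuming the text can be handed to awk through the environment instead of being interpolated into the program. The function `mask_first` and the variables `MASK_TEXT` and `MASK_PATTERN` are hypothetical names for illustration, not part of MER, and the match here is case-sensitive:

```shell
# Hypothetical helper, not MER's actual code.
# Replaces the first match of a regex with a same-length run of '@'.
# The text and pattern travel through the environment (ENVIRON), so an
# embedded newline never becomes part of the awk program source.
mask_first() {
  MASK_TEXT="$1" MASK_PATTERN="$2" awk 'BEGIN {
    text = ENVIRON["MASK_TEXT"]
    if (match(text, ENVIRON["MASK_PATTERN"])) {
      # Build a mask with one "@" per matched character
      mask = ""
      for (i = 0; i < RLENGTH; i++) mask = mask "@"
      text = substr(text, 1, RSTART - 1) mask substr(text, RSTART + RLENGTH)
    }
    printf "%s", text
  }'
}
```

Called with the two-line text from this issue and the pattern `carbon|oxygen`, the sketch masks `oxygen` and leaves the newline and `carbon` untouched, without any awk syntax error.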

LLCampos commented 7 years ago

The fix causes some problems to reappear: https://github.com/lasigeBioTM/MER/issues/12 https://github.com/lasigeBioTM/MER/issues/24

And it really does not solve the bug. For example, if you run

bash get_entities.sh "oxygen

carbon" ChEBI

You get the same result as if the extra newline wasn't there.

Wouldn't it be better if we pushed bugfixes to the dev branch first, and only then to the master branch, once we are sure everything is working?

fjmc commented 7 years ago

The ${a,,} lowercase expansion fails:

echo "${a,,}"
-bash: ${a,,}: bad substitution

Replaced by tr '[:upper:]' '[:lower:]'
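For context, the `${a,,}` case-conversion expansion only exists in Bash 4.0 and later, so a "bad substitution" error is consistent with running under an older Bash (that this was the cause here is an assumption). A small sketch of the `tr` replacement, wrapped in a hypothetical helper named `to_lower` that is not part of MER:

```shell
# Hypothetical helper, not MER's actual code.
# Lowercases its argument with tr, which works regardless of Bash version.
to_lower() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}
```

For example, `to_lower "Oxygen CARBON"` prints `oxygen carbon`.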

fjmc commented 7 years ago

Using IFS=$(echo -en ""); seems to have solved the problem.
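`IFS=$(echo -en "")` sets `IFS` to the empty string, which turns off word splitting for unquoted expansions. A short sketch of that effect on the two-line text from this issue; whether MER depends on exactly this mechanism is an assumption, and the variable names are illustrative only:

```shell
text="oxygen
carbon"

# Default IFS: the unquoted expansion is split on whitespace, so the
# newline is lost and echo joins the words with a single space.
joined_default=$(echo $text)

# Empty IFS: no word splitting, so the newline survives the expansion.
old_ifs=$IFS
IFS=
joined_empty=$(echo $text)
IFS=$old_ifs
```

After this runs, `joined_default` holds `oxygen carbon` on one line, while `joined_empty` still holds the two original lines. Quoting the expansion (`"$text"`) preserves the newline as well, without changing `IFS`.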