Open drj11 opened 7 years ago
Culprits are evidently non-ASCII characters. So it's some sort of encoding issue.
By inspecting the zip file, it looks like some of the text is encoded in Windows-1252. 0x96 is used for dash
(should be a Unicode U+002D HYPHEN-MINUS or U+2212 MINUS SIGN), and 0x91 and 0x92 are used for "smart quotes".
In the screenies above, the Unicode U+FFFD REPLACEMENT CHARACTER (question mark in diamond) appears in those cases.
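As a quick check of the byte values mentioned above, the following sketch decodes each of them twice: once as Windows-1252 and once as UTF-8 with replacement. The helper name `decode_both` is made up for illustration. Note that Windows-1252 maps 0x96 to U+2013 EN DASH, which is why it renders as a dash when decoded correctly.

```python
# Bytes mentioned in this issue, decoded two ways.
samples = {"dash": b"\x96", "left quote": b"\x91", "right quote": b"\x92"}


def decode_both(raw):
    """Return (text decoded as Windows-1252, text decoded as UTF-8 with replacement)."""
    return raw.decode("cp1252"), raw.decode("utf-8", errors="replace")


for name, raw in samples.items():
    as_cp1252, as_utf8 = decode_both(raw)
    # Each sample is a single byte, so each result is a single character.
    print(f"{name}: cp1252=U+{ord(as_cp1252):04X} utf-8=U+{ord(as_utf8):04X}")
```

Read as UTF-8, each of these lone bytes is invalid, so a decoder that substitutes rather than fails produces U+FFFD, matching what the screenshots show.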
However, some of the text is irreversibly corrupt in the ISA-tab itself. Instead of -80 we see ?80, and in this case that is an ASCII question mark character (0x3F).
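I don't know which tool produced the ?80, but a lossy conversion to ASCII is one plausible way to get it. The sketch below (the function name `lossy_ascii` is hypothetical) shows why that kind of damage cannot be undone: different original characters collapse to the same 0x3F byte.

```python
def lossy_ascii(text):
    """Encode to ASCII, substituting '?' for anything that does not fit."""
    return text.encode("ascii", errors="replace")


en_dash_value = "\u201380"    # EN DASH followed by "80"
minus_sign_value = "\u221280"  # MINUS SIGN followed by "80"

damaged = lossy_ascii(en_dash_value)
print(damaged)

# Two different inputs give identical bytes, so the original character
# cannot be recovered from the damaged file alone.
print(lossy_ascii(minus_sign_value) == damaged)
```

Once the file contains a real question mark, no choice of encoding at read time will bring the dash back; it has to be fixed at the source.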
From a quick look at the ISA-tab documentation, it seems that ISA-tab does not declare the character encoding.
However, there is a strong recommendation that ISA-tab files be in UTF-8 encoding http://isa-specs.readthedocs.io/en/latest/isatab.html#format:
Files SHOULD be encoded using UTF-8.
ISA-Tab files should be encoded in UTF-8.
Diagnosis:
If you have a command line, you can inspect the file encoding using unzip and file.
Example on ISA-Tab known to be okay (from SCC):
$ unzip -p isa_9905_733878.zip 'i_*.txt' | file -
/dev/stdin: ASCII text, with very long lines
Example on ISA-Tab that I modified to include UTF-8:
$ unzip -p isa-unicode.zip 'i_*.txt' | file -
/dev/stdin: UTF-8 Unicode text, with very long lines
Example in this issue, that displays incorrectly on GTC:
$ unzip -p john_archive_3_CmphI1W.zip 'i_*.txt' | file -
/dev/stdin: Non-ISO extended-ASCII text, with very long lines, with CRLF, LF line terminators
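If unzip and file are not available, roughly the same check can be sketched in Python. This is not what file does internally; it only tests whether the bytes decode as ASCII or UTF-8. The function names `classify_encoding` and `inspect_isatab_zip` are invented for this example.

```python
import fnmatch
import zipfile


def classify_encoding(data):
    """Return 'ASCII', 'UTF-8', or 'not UTF-8' for a byte string."""
    try:
        data.decode("ascii")
        return "ASCII"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return "not UTF-8"


def inspect_isatab_zip(zip_source, pattern="i_*.txt"):
    """Classify every archive member whose name matches the pattern.

    zip_source may be a path or a binary file-like object.
    """
    results = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if fnmatch.fnmatchcase(name, pattern):
                results[name] = classify_encoding(archive.read(name))
    return results
```

For example, `inspect_isatab_zip("john_archive_3_CmphI1W.zip")` would return a dict mapping each matching file name to its classification. Passing `pattern="*.txt"` would also cover the study and assay files.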
It may help to start ISA-Creator with the file.encoding property set to utf-8.
You will need to modify this command line, but the important bit is -Dfile.encoding=utf-8:
java -jar -Dfile.encoding=utf-8 /Applications/ISAcreator-1.7/ISAcreator.app/Contents/Resources/Java/ISAcreator.jar
(that suggestion was from https://groups.google.com/forum/#!topic/isaforum/03P91ZQ1mj0)
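For files that have already been written in the wrong encoding, a one-off conversion might look like the sketch below. It assumes the non-UTF-8 bytes really are Windows-1252, which is only what the bytes above suggest, not something confirmed. The function name `to_utf8` is hypothetical.

```python
def to_utf8(data, fallback="cp1252"):
    """Return data as UTF-8 bytes.

    Bytes that are already valid UTF-8 (including plain ASCII) are
    returned unchanged. Anything else is assumed to be in the fallback
    encoding and is transcoded. Python's cp1252 codec raises
    UnicodeDecodeError for the few byte values Windows-1252 leaves
    undefined, so unexpected input fails loudly instead of silently.
    """
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        return data.decode(fallback).encode("utf-8")
```

This only helps with bytes like 0x96, 0x91 and 0x92. Text that already contains a literal question mark in place of the dash passes through unchanged, because the original character is gone.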
For at least one of our ISA-Tab files, text from it is corrupted when displayed.
E.g. (2017-05-09) https://beta.genometranslationcommons.org/#/preview/e08ed47c-7a9a-4b55-9e06-e5b7e7afd91c
Example in Study Summary display:
Example in Protocols display: