jhpoelen / bat-taxonomic-alignment

prototype focused on bat names
https://jhpoelen.nl/bat-taxonomic-alignment/
Creative Commons Zero v1.0 Universal

carriage returns in BTA tsv export causing misalignment of data when importing into spreadsheets #9

Closed jhpoelen closed 1 year ago

jhpoelen commented 1 year ago

to reproduce:

download a BTA version using

curl "https://linker.bio/hash://sha256/981b8f9ece76eb4418fe82e8dfa077165943fe1d63103fa4a25f21a2d7881e75"\
 > bta.tsv

open bta.tsv in a spreadsheet program like LibreOffice Calc.

Expected: well-aligned data.

Actual: apparently truncated lines causing incomplete rows (see screenshot).

Root cause appears to be carriage returns embedded in the tsv file. In vi these carriage returns are displayed as ^M.

cat bta.tsv\
 | grep -n "bunkeri"\
  | head -n1 > aline.tsv

With the line extracted above, cat aline.tsv | wc -l yields "1", indicating that the file contains only a single newline (\n). However, when opening aline.tsv in LibreOffice Calc, multiple lines are observed, apparently caused by occurrences of carriage returns (\r or ^M).
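The mismatch between the newline count and what the spreadsheet shows can also be checked from the command line by counting carriage return bytes directly. The following is a minimal sketch: sample.tsv and its contents are made up for illustration and are not taken from the BTA export.

```shell
# Hypothetical two-column sample with one carriage return embedded in a field;
# this is illustration data, not content from the BTA export.
printf 'name\tnote\nMyotis\tfirst part\rsecond part\n' > sample.tsv

# Count newline-terminated lines (what wc -l reports).
line_count=$(wc -l < sample.tsv | tr -d ' ')

# Count carriage return bytes by deleting every byte that is not \r.
cr_count=$(tr -cd '\r' < sample.tsv | wc -c | tr -d ' ')

echo "lines: $line_count, carriage returns: $cr_count"
```

A non-zero carriage return count in a file that is expected to use plain \n line endings suggests that some spreadsheet programs may split those rows on import.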

image

image

@n8upham

jhpoelen commented 1 year ago

In their original, the carriage returns are also visible in the Google Sheets editor:

image

jhpoelen commented 1 year ago

See related issue https://techcommunity.microsoft.com/t5/excel/excel-problem-with-importing-csv-file-with-carriage-return-in-a/m-p/1404378 .

jhpoelen commented 1 year ago

After replacing the carriage returns with spaces using:

cat bta.tsv\
 | tr '\r' ' '\
 > bta-no-carriage-return.tsv

the issue no longer exists.
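The effect of the tr replacement can be double-checked without opening a spreadsheet. The sketch below assumes a made-up sample.tsv (not BTA data) and checks two things after the replacement: that no carriage returns remain, and that every row has the same number of tab-separated fields.

```shell
# Hypothetical sample with a carriage return embedded in the second field.
printf 'name\tnote\nMyotis\tfirst part\rsecond part\n' > sample.tsv

# Same replacement as above: turn each carriage return into a space.
tr '\r' ' ' < sample.tsv > sample-no-carriage-return.tsv

# Carriage returns left after the replacement (should be 0).
remaining=$(tr -cd '\r' < sample-no-carriage-return.tsv | wc -c | tr -d ' ')

# Distinct field counts per row; a single value means the rows are aligned.
field_counts=$(awk -F'\t' '{ print NF }' sample-no-carriage-return.tsv | sort -u)

echo "remaining carriage returns: $remaining, field counts: $field_counts"
```

Note that this replacement suits carriage returns embedded inside fields, as in this issue. For a file with \r\n line endings it would leave a trailing space on every line.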

image

jhpoelen commented 1 year ago

@n8upham I just released a new version of BTA@306cd3c9 via https://jhpoelen.nl/bat-taxonomic-alignment/

and using this new version, I was able to load the BTA tsv into a spreadsheet without issues, after downloading it with

curl "https://linker.bio/hash://sha256/306cd3c999895af317044442e86fac33769ce178f6670b050683ca0ea80e6c67.tsv" > bta.tsv

image

jhpoelen commented 1 year ago

Also, the offending line no longer appears to offend :smile:

image

n8upham commented 1 year ago

Awesome, thanks for letting me know man! —n

On Apr 26, 2023, at 11:31 AM, Jorrit Poelen @.***> wrote:

Also, the offending line no longer appears to offend 😄
