lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

questions about sam2tsv output file #224

Closed seven1112233 closed 1 year ago

seven1112233 commented 1 year ago

Dear author, your tool sam2tsv is helpful. I am trying to convert a BAM file to TSV with it, and something strange happens: some lines of the same output TSV file are not visible on Linux but are visible on Windows (or in R), as the figures show. Why? I am confused and look forward to your reply. Thank you very much!

[Figure: the file as displayed on Linux]

[Figure: the same file on Windows (WPS); line 62 is not visible on Linux]

lindenb commented 1 year ago

Some lines of the same output TSV file are not visible on Linux but are visible on Windows (or in R), as the figures show. Why?

This is a delimiter problem when importing into Excel.

Don't use Excel.

seven1112233 commented 1 year ago

Thank you for your reply. But I actually need to work with these sam2tsv output files in R. I use read.csv(sep="\t") to read them, which shows the same problem as in Excel. How can I solve this?

lindenb commented 1 year ago

It might be a bug. What is the output of the following command?

awk -F '\t' '{print NF}' sam2tsv.output.tsv | sort | uniq

seven1112233 commented 1 year ago

I found the reason for the problem. Some records in ref_QUAL contain double quotation marks. A quotation mark (") makes the parser treat the following "\tvalue\tvalue" as a single field, so reading the file in R or Python gives wrong results.
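For readers who hit the same issue, here is a minimal Python sketch of the effect described above, using only the standard csv module. The two sample records and the helper name read_rows are made up for illustration and are not real sam2tsv output; the point is only that turning off quote handling keeps a literal " as ordinary data.

```python
import csv
import io

# Hypothetical sample (not real sam2tsv output): the third field of the
# first record is a quality character that happens to be a double quote.
sample = 'read1\tA\t"\tchr1\nread2\tC\tI\tchr1\n'


def read_rows(text, quoting):
    """Parse tab-separated text with the given csv quoting mode."""
    return list(csv.reader(io.StringIO(text), delimiter="\t", quoting=quoting))


# Default behaviour: the lone quote opens a quoted field, so the tabs and
# newlines that follow are absorbed into it and records get merged.
default_rows = read_rows(sample, csv.QUOTE_MINIMAL)

# Quote handling disabled: the quote is kept as a plain character.
literal_rows = read_rows(sample, csv.QUOTE_NONE)

print("rows with default quoting:", len(default_rows))
print("rows with QUOTE_NONE:", len(literal_rows))
print("first record with QUOTE_NONE:", literal_rows[0])
```

The same idea should carry over to other readers: in R, read.csv and read.table accept quote = "" to disable quote handling, and pandas.read_csv accepts quoting=csv.QUOTE_NONE.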
