Closed jtmieg closed 6 years ago
Hi,
Have you managed to find another workaround? I am also experiencing this issue.
OK guys, let's try to get this done. Does either of you have a minimal example?
Is this referring to htseq-count or to HTSeq as a library? If the latter, which function fails and what's the error message?
Thanks
Hello Fabio,

I used a workaround in my Python code. As you can see, I pipe the output of samtools through gawk, which replaces the * in column 7 with a zero. Then I can `import HTSeq` and analyze the alignments in Python with `for a in HTSeq.SAM_Reader(input_stream):`.

This is clearly suboptimal, but it works. I am not sure why I had a * in column 7 in the first place, but this came out of using several aligners (STAR, TopHat2, HISAT2); at least one of them gave me a *.

It would be great if you could just add this replacement silently inside `HTSeq.SAM_Reader(input_stream)`.

Thanks for maintaining the code,
Jean
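```python
import os
import shlex

# Select the input stream.
# input_file and file_type are set earlier in the script.
# The gawk filter replaces a "*" in column 7 by "0" on alignment lines
# (header lines starting with "@" pass through) and keeps tab separators.
# A raw string keeps "\t" as an awk escape instead of a literal tab/newline.
gg = r"""gawk 'BEGIN{FS=OFS="\t"} !/^@/ && $7=="*" {$7="0"} {print}'"""
sort_cmd = " | sort -T . -k 1,1"
path = shlex.quote(input_file)  # protect the file name from the shell

if input_file == "":
    input_stream = os.popen("sort -k 1,1")  # sorts sys.stdin
elif file_type == "BAM":
    input_stream = os.popen("samtools view -h " + path + " | " + gg + sort_cmd)
elif file_type == "SAMSORTED":
    input_stream = open(input_file, "r")
elif file_type == "SAMGZ":
    input_stream = os.popen("gunzip -c " + path + " | " + gg + sort_cmd)
else:
    input_stream = os.popen("cat " + path + " | " + gg + sort_cmd)
```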
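The gawk filter above can also be expressed directly in Python, which is roughly what "doing the replacement silently" before parsing would look like. This is a minimal sketch, not part of HTSeq: `fix_rnext` is a hypothetical helper name, and it assumes tab-separated SAM text lines with header lines starting with "@".

```python
from typing import Iterable, Iterator


def fix_rnext(lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM lines with a '*' in column 7 (RNEXT) replaced by '0'.

    Hypothetical helper mirroring the gawk workaround; not part of HTSeq.
    Header lines (starting with '@') are passed through unchanged.
    Every alignment line is re-emitted with a trailing newline.
    """
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 7 is index 6; only an exact "*" is rewritten.
        if len(fields) > 6 and fields[6] == "*":
            fields[6] = "0"
        yield "\t".join(fields) + "\n"
```

Usage would then be something like `for a in HTSeq.SAM_Reader(fix_rnext(open(path))):`, assuming the installed HTSeq's `SAM_Reader` accepts an iterable of text lines in the same way the popen stream is passed above; this may differ between HTSeq versions, so check yours. Doing the substitution in Python avoids the external gawk process and the shell quoting it requires.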
From: Fabio Zanini notifications@github.com Reply-To: simon-anders/htseq reply@reply.github.com Date: Sunday, April 22, 2018 at 3:54 PM To: simon-anders/htseq htseq@noreply.github.com Cc: Jean et Danielle Thierry-Mieg mieg@ncbi.nlm.nih.gov, Author author@noreply.github.com Subject: Re: [simon-anders/htseq] * in column 7 of sam is not recognized (#51)
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/simon-anders/htseq/issues/51#issuecomment-383407744, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AHNv2phHR4KMXHN6QSLBIQdWnUXBS4jZks5trN_ZgaJpZM4SrChC.
On April 22, 2018 5:36:18 PM PDT, jtmieg notifications@github.com wrote:
-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/simon-anders/htseq/issues/51#issuecomment-383425227
Hi Fabio,
For me this problem occurred when I was using htseq-count. However, I think that it was simply a conflict between two HTSeq versions on my system, since I don't experience this problem anymore after a clean re-install (with version 0.9.1).
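If you suspect the same kind of clash, one way to check is to list every installed copy of HTSeq that the current interpreter can see. This is a sketch using Python's standard `importlib.metadata`, not something from this thread; `installed_versions` is a hypothetical helper name, and it only finds copies on this interpreter's `sys.path`, not those belonging to other Python installations.

```python
from importlib.metadata import distributions
from typing import List, Tuple


def installed_versions(name: str) -> List[Tuple[str, str]]:
    """Return (version, location) for each installed distribution called `name`.

    Names are compared case-insensitively. More than one entry suggests that
    several copies are visible to this interpreter.
    """
    wanted = name.lower()
    found = []
    for dist in distributions():
        dist_name = dist.metadata.get("Name")
        if dist_name and dist_name.lower() == wanted:
            # locate_file("") resolves to the directory holding the
            # distribution's metadata (typically site-packages).
            found.append((dist.version, str(dist.locate_file(""))))
    return found


if __name__ == "__main__":
    for version, location in installed_versions("HTSeq"):
        print(version, location)
```

Two lines with different versions or locations would point to the kind of conflict described above; after a clean re-install, a single entry is expected.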
OK guys, it seems you solved it on your own; closing until further notice.
In SAM files, one often encounters a * in column 7 (RNEXT), and this conforms to the SAM specification. HTSeq crashes on the *. If you replace the * with a zero (0) in column 7, then HTSeq is happy. I suggest that HTSeq should accept the * in column 7, but I have not studied all the consequences. What do you think? Thank you.
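For reference, the SAM specification defines column 7 (RNEXT) as the reference name of the mate or next read, "=" when it is identical to RNAME (column 3) and "*" when the information is unavailable; column 8 (PNEXT) is the 1-based mate position, 0 when unavailable. A reader can therefore accept "*" directly instead of requiring it to be rewritten. The sketch below illustrates that idea only: `parse_mate` and `MateInfo` are hypothetical names, not HTSeq's parser, and the conversion of PNEXT to a 0-based position is an assumption following common Python conventions.

```python
from typing import NamedTuple, Optional


class MateInfo(NamedTuple):
    chrom: Optional[str]  # None when RNEXT is "*" (unavailable)
    pos: Optional[int]    # 0-based; None when PNEXT is 0 (unavailable)


def parse_mate(line: str) -> MateInfo:
    """Read RNEXT (column 7) and PNEXT (column 8) of a SAM alignment line.

    Hypothetical helper, not HTSeq's API. "*" means no mate reference,
    "=" means the same reference as RNAME (column 3).
    """
    fields = line.rstrip("\n").split("\t")
    rname, rnext, pnext = fields[2], fields[6], int(fields[7])
    if rnext == "*":
        chrom = None
    elif rnext == "=":
        chrom = rname
    else:
        chrom = rnext
    pos = pnext - 1 if pnext > 0 else None
    return MateInfo(chrom, pos)
```

Mapping "*" to `None` keeps "no mate information" distinct from a real reference name, which the "0" substitution cannot do if a reference happens to be named "0".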