samtools / hts-specs

Specifications of SAM/BAM and related high-throughput sequencing file formats
http://samtools.github.io/hts-specs/

Specification of PileUp in hts-specs #98

Closed karel-brinda closed 9 years ago

karel-brinda commented 9 years ago

Hi, is there any reason why the PileUp format is not included in this repo? Is there an up-to-date specification anywhere? I found only http://samtools.sourceforge.net/pileup.shtml, but it is marked as deprecated.

jmarshall commented 9 years ago

This repository has specifications for (more or less) well-defined interchange file formats, which are implemented by lots of tools and libraries.

Samtools's pileup format, on the other hand, is one simple text representation of the idea of a pileup. It's not (I think) output by other tools and it's not really an interchange format; it's just that various scripts happen to more or less parse it.

It's documented under mpileup in the samtools man page. The web page you found provides a bit more of a tutorial, and it would be good if it was updated to cover mpileup.

karel-brinda commented 9 years ago

Thank you for your answer.

I think that PileUp is a widely used interchange file format, just one without a single standard. Many programs work with it. Just a few examples (I could find many more):

karel-brinda commented 9 years ago

@jmarshall This e-mail is another example of why a specification for PileUp would be useful and appreciated: http://sourceforge.net/p/samtools/mailman/message/34409656/.

karel-brinda commented 8 years ago

@jmarshall I am very sorry to ask you again about the pileup format, but I cannot find the following information anywhere (neither in the samtools man page nor on http://samtools.sourceforge.net/pileup.shtml). How should unknown base qualities/BAQs be represented? I see that samtools uses ~ in such situations, but I don't know whether this is a standard representation or an ad hoc solution. Can BAQ be present for a few bases, with ~ used for the rest (at a single position)? Thank you in advance.

jmarshall commented 8 years ago

As previously noted, there is no specification for samtools's mpileup output. There is only what samtools does in practice, and the incomplete documentation in samtools's man page. If you would like to see improvement in that documentation, you would do well to ask these questions as documentation issues against samtools.

Samtools just prints out the base qualities recorded in the bam1_t, converting them to ASCII and clamping at 126 (~). This is of course the highest base quality in SAM, and it is what missing qualities (SAM's * / BAM's 0xFF) will be output as.
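To make the clamping concrete, here is a minimal Python sketch of the behaviour described above. It is not taken from the samtools source; the function names are hypothetical, and the +33 offset is the usual Phred+33 ASCII encoding used for SAM qualities.

```python
def qual_to_pileup_char(q: int) -> str:
    """Map one raw BAM quality byte (0..255) to a pileup quality character.

    Assumption: the byte is offset by 33 (Phred+33) and the result is
    clamped at ASCII 126 ('~'), as described in the comment above.
    """
    return chr(min(q + 33, 126))


def quals_to_pileup(quals: bytes) -> str:
    """Convert a sequence of raw quality bytes to a pileup quality string."""
    return "".join(qual_to_pileup_char(q) for q in quals)


if __name__ == "__main__":
    # Phred 0, 40, 93 (the SAM maximum) and 0xFF (BAM's "quality missing" byte).
    print(quals_to_pileup(bytes([0, 40, 93, 0xFF])))
```

Under these assumptions a missing quality (0xFF) and a genuine Phred 93 both come out as ~, so the two cannot be told apart from the pileup text alone.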