Question about output

Hi Ying,

If you mean the number of input sequences in which a TCR was identified, you can use the following command:

    rtcr Convert -i r.dat | awk -F"\t" 'NR>1{c+=$15}END{print c}'

If you mean the number of input sequences in which a V or J gene was identified, you can use the following command:

    zcat alignments.sam.gz | awk -F"\t" '$3~/TR[AB]V/{v+=1}$3~/TR[AB]J/{j+=1}END{printf "v count = %d\nj count = %d\n", v, j}'

Every record (a line in results.tsv) identifies the rearranged V(D)J sequence coding for one of the chains of the TCR heterodimer. Here, each record is referred to as a 'clonotype', although technically a clonotype refers to the DNA rearrangements of both chains (which, due to experimental limitations, are often not sequenced together). The "sequence" field shows the V(D)J sequence (excluding the constant region). Since T cells do not undergo somatic hypermutation, the DNA rearrangement coding for a TCR chain can be uniquely identified by the combination of the V allele identifier ("v_call" field), the junction nucleotide sequence ("junction" field), and the J allele identifier ("j_call" field).
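
To illustrate the point about unique identification, here is a minimal sketch that counts the distinct combinations of "v_call", "junction" and "j_call" in a results file. It is not part of RTCR: the function name count_unique_chains is hypothetical, and the sketch assumes that results.tsv is tab-separated and that its first line is a header containing those three field names.

```shell
# Hypothetical helper (not an RTCR command): count distinct
# (v_call, junction, j_call) combinations in a tab-separated file.
# Assumes the first line is a header naming those three columns.
count_unique_chains() {
    awk -F'\t' '
        NR == 1 {
            # Locate the columns by header name rather than by position.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("v_call" in col) || !("junction" in col) || !("j_call" in col)) {
                print "missing v_call, junction or j_call column" > "/dev/stderr"
                bad = 1
                exit 1
            }
            next
        }
        {
            # Build one key per record from the three identifying fields.
            key = $col["v_call"] SUBSEP $col["junction"] SUBSEP $col["j_call"]
            if (!(key in seen)) { seen[key] = 1; n++ }
        }
        END {
            if (bad) exit 1
            print n + 0   # prints 0 when there are no data rows
        }
    ' "$1"
}
```

Running count_unique_chains results.tsv would then print a single number. If that number is lower than the number of records, some records share the same V allele, junction and J allele.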

Best wishes, Bram

uubram / RTCR

Question about output #20