uubram / RTCR

A pipeline for complete and accurate recovery of TCR repertoires from high throughput sequencing data.
GNU General Public License v3.0

Question about output #20

Open wuying1984 opened 2 years ago

wuying1984 commented 2 years ago

Hi Bram,

I am using RTCR to identify the TCR repertoire from amplicon sequencing data. I have a few more questions about the output.

1) How can I get the mapping rate for the FASTQ reads (MiSeq, R1: 301 bp, R2: 266 bp) that I use in the analysis?

2) In the results.tsv file, what is the sequence? Is it a consensus for a group of clonotypes? Can I say that one sequence is a clonotype?

Thank you very much! Best, Ying

uubram commented 2 years ago

Hi Ying,

  1. If you mean the number of input sequences in which a TCR was identified, you can use the following command: rtcr Convert -i r.dat | awk -F"\t" 'NR>1{c+=$15}END{print c}'

If you mean the number of input sequences in which a V or J segment was identified, you can use the following command: zcat alignments.sam.gz | awk -F"\t" '$3~/TR[AB]V/{v+=1}$3~/TR[AB]J/{j+=1}END{print "v count = " v, "\nj count = " j}'
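
The commands above give counts; to turn one of them into a rate, it has to be divided by the number of input sequences. The following is a minimal Python sketch of that step, not part of RTCR itself. It assumes the FASTQ files use exactly four lines per record and that one read pair counts as one input sequence; the helper names (`open_text`, `count_fastq_records`, `count_vj_hits`, `mapping_rate`) are made up for illustration. `count_vj_hits` mirrors the awk expression by matching the third SAM column against the same patterns.

```python
import gzip
import re

# Same patterns as the awk command: TRAV/TRBV and TRAJ/TRBJ reference names.
V_PATTERN = re.compile(r"TR[AB]V")
J_PATTERN = re.compile(r"TR[AB]J")


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_fastq_records(lines):
    """Count FASTQ records, assuming exactly four lines per record."""
    return sum(1 for _ in lines) // 4


def count_vj_hits(sam_lines):
    """Count SAM lines whose third column (RNAME) names a V or a J segment."""
    v_count = j_count = 0
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # too few columns to have a reference name
        if V_PATTERN.search(fields[2]):
            v_count += 1
        if J_PATTERN.search(fields[2]):
            j_count += 1
    return v_count, j_count


def mapping_rate(n_identified, n_input):
    """Return the fraction of input sequences that were identified."""
    if n_input <= 0:
        raise ValueError("n_input must be positive")
    return n_identified / n_input
```

To use it, pass an open file handle, for instance `open_text("alignments.sam.gz")`, to `count_vj_hits`, and a handle on the R1 FASTQ file to `count_fastq_records`; then call `mapping_rate` with the count of interest and the number of input sequences. Which count is the right numerator depends on what "mapping rate" should mean for the analysis.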

  2. Every record (a line in results.tsv) identifies the rearranged V(D)J sequence coding for one chain of the TCR heterodimer. Here, each record is referred to as a 'clonotype', although strictly speaking a clonotype refers to the DNA rearrangements of both chains (which, due to experimental limitations, are often not sequenced together). The "sequence" field shows the V(D)J sequence (excluding the constant region). Since T cells do not undergo somatic hypermutation, the DNA rearrangement coding for a TCR chain can be uniquely identified by the combination of the V allele identifier ("v_call" field), the junction nucleotide sequence ("junction" field), and the J allele identifier ("j_call" field).
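
As an illustration of that last point, the sketch below builds the identifying key from the "v_call", "junction" and "j_call" fields and counts how many records share each key. It is a rough example rather than RTCR code: it assumes results.tsv is tab-separated with a header row containing those field names, and the function names are hypothetical.

```python
import csv
from collections import Counter


def clonotype_key(record):
    """Build the (V allele, junction, J allele) key for one record."""
    return (record["v_call"], record["junction"], record["j_call"])


def count_records_per_key(lines):
    """Count records per key in tab-separated lines that start with a header."""
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(clonotype_key(row) for row in reader)
```

Calling `count_records_per_key` on an open handle to results.tsv returns a `Counter`; if every key has a count of one, each line of the file corresponds to a distinct V/junction/J combination.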

Best wishes, Bram