pogorely / ALICE

Detecting TCR involved in immune responses from single RepSeq datasets
GNU General Public License v3.0

Alice (advanced) pipeline #8

Open nicholasschwab opened 5 years ago

nicholasschwab commented 5 years ago

dear mikhail,

great program! we would love to try this out on our data sets. however, it seems that the advanced pipeline from the preprint has not made it into the released version yet? could you maybe advise how to incorporate the advanced version, since we are struggling with differences in sequencing depth?

best regards, nicholas.

pogorely commented 5 years ago

Hi, Nicholas! Indeed, the advanced pipeline will be released soon. But in my experience, the advanced pipeline does not help with the issue of varying sequencing depth. Greater sequencing depth leads to a larger number of observed clonotypes, and thus to greater power to detect clonotypes enriched with neighbors, in both pipelines (e.g. in a low-depth sample you will miss some significant results simply because the neighbors were not sampled).
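The depth effect described above can be illustrated with a toy calculation. This is only a sketch and not ALICE's statistical model: it assumes reads are drawn independently, so a clonotype with frequency `f` is missed in `depth` reads with probability `(1 - f) ** depth`. The function name `expected_observed` and the neighbor frequencies are hypothetical, chosen only for illustration.

```python
def expected_observed(freqs, depth):
    """Expected number of distinct clonotypes seen at least once in `depth` reads.

    Assumes independent draws, so a clonotype with frequency f is missed
    with probability (1 - f) ** depth.
    """
    return sum(1.0 - (1.0 - f) ** depth for f in freqs)


# Hypothetical cluster: 20 neighbor clonotypes, each at frequency 1e-5.
neighbor_freqs = [1e-5] * 20

# The expected number of observed neighbors grows with sequencing depth.
for depth in (10_000, 100_000, 1_000_000):
    print(depth, round(expected_observed(neighbor_freqs, depth), 2))
```

Under these assumptions, the same underlying cluster contributes far fewer observed neighbors in a shallow sample, which is why fewer clonotypes can reach significance there.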

I advise subsampling the data to the same depth (i.e. to a comparable total number of unique nucleotide sequences) and then running the algorithm. Another option is to normalize the algorithm output so that it is less dependent on sequencing depth (in the paper we normalize the number of hits by the total number of unique clonotypes in each repertoire). The cumulative fraction of the repertoire occupied by hits should also be less dependent on sequencing depth than the absolute number of hits. Finally, you could look at the TCR amino acid sequences of the hits and their features (e.g. sequence sharing between samples of the same condition); this should also be less dependent on depth.
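A minimal sketch of the first three suggestions, written in plain Python rather than with ALICE itself. It assumes each repertoire is available as a mapping from nucleotide sequence to read count, and that the hits are given as a set of those sequences. Both formats, the function names `downsample_reads` and `depth_normalized_metrics`, and the toy data are hypothetical. Note that this equalizes the number of reads, which makes the number of unique sequences comparable but not necessarily identical.

```python
import random
from collections import Counter


def downsample_reads(counts, n_reads, seed=0):
    """Draw n_reads reads without replacement from a clonotype count table.

    counts maps a nucleotide sequence to its read count (assumed format).
    """
    # Expand the table into one entry per read; fine for a sketch,
    # but memory-hungry for very deep samples.
    reads = [seq for seq, c in counts.items() for _ in range(c)]
    if n_reads > len(reads):
        raise ValueError("n_reads exceeds the number of reads in the sample")
    rng = random.Random(seed)
    return Counter(rng.sample(reads, n_reads))


def depth_normalized_metrics(counts, hits):
    """Return (hits per unique clonotype, fraction of reads occupied by hits)."""
    n_unique = len(counts)
    total_reads = sum(counts.values())
    present = [h for h in hits if h in counts]
    hit_reads = sum(counts[h] for h in present)
    return len(present) / n_unique, hit_reads / total_reads


# Toy repertoires (made-up sequences and counts).
sample_a = {"TGTGCCAGC": 50, "TGTGCCAGT": 30, "TGTGCTAGC": 15, "TGTGCCTGG": 5}
sample_b = {"TGTGCCAGC": 8, "TGTGCCAGT": 2}

# Subsample both repertoires to the read depth of the shallowest one.
target = min(sum(sample_a.values()), sum(sample_b.values()))
sub_a = downsample_reads(sample_a, target, seed=1)
sub_b = downsample_reads(sample_b, target, seed=1)
print(sum(sub_a.values()), sum(sub_b.values()))

# Normalized output for a hypothetical set of hits in sample_a.
hits_a = {"TGTGCCAGC", "TGTGCCAGT"}
print(depth_normalized_metrics(sample_a, hits_a))
```

The subsampled tables would then be passed to the algorithm in place of the full repertoires, and the two normalized values compared across samples instead of the raw hit counts.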

Hope that helps.

Best, Misha

nicholasschwab commented 5 years ago

Dear Misha, thanks for the response. We tried to normalize using a fixed number of sequences (e.g. the top 10,000 or top 50,000 sequences by rank), but to no avail: files with greater original sequencing depth still yield more ALICE hits. The same holds when we normalize by dividing by the number of unique sequences. We are now trying the third option (looking at the features of the hits) to see if we can find something of interest.
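For reference, truncation to the top N sequences by rank can be sketched as below. This only illustrates what such a truncation does and is not meant to explain the result reported above. The function name `top_n_by_rank`, the count-table format, and the toy data are assumptions.

```python
def top_n_by_rank(counts, n):
    """Keep the n most abundant clonotypes.

    Ties are broken alphabetically by sequence so the result is deterministic.
    """
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:n])


# Toy repertoires with the same clonotypes at very different depths.
deep = {"AAA": 500, "AAC": 300, "AAG": 150, "AAT": 50}
shallow = {"AAA": 5, "AAC": 3, "AAG": 1, "AAT": 1}

top_deep = top_n_by_rank(deep, 2)
top_shallow = top_n_by_rank(shallow, 2)

# Same number of clonotypes after truncation, but very different read totals.
print(len(top_deep), sum(top_deep.values()))
print(len(top_shallow), sum(top_shallow.values()))
```

In this toy case the truncation equalizes the number of clonotypes, while the read counts behind them still reflect the original depth of each file.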

Looking forward to the updates, best regards! Nicholas.

starenki commented 4 years ago

Hi Mikhail, I'm also very interested in trying the advanced version of the ALICE pipeline. Are you still planning to release it? Thank you.

Best regards, Dima