finding non-productive sequences(original/no hypermutation/no insertion/no deletions)

decenwang commented 5 years ago

Hi Quentin,

Following your published article, how can one identify and extract the non-productive sequences (for new model construction) from the raw data? Do you have any suggestions?

On page 3 of your article, you wrote: "By contrast, V and J usage varied moderately but significantly across individuals,......., suggesting possible primer-dependent biases." How do you interpret this? After selection, the surviving T cells are MHC-dependent, and MHCs differ substantially between individuals. Thanks!

Cheers,

Decen

qmarcou commented 5 years ago

Hi @decenwang

The non coding sequences are defined as sequences that are known not to code for a viable receptor because they contain either a frame shift or a stop codon within the CDR3 region. This differs slightly from the non productive ones, a term designating any sequence that does not code for a viable receptor (e.g. a sequence producing an in-frame receptor that does not fold). We hypothesize that non coding sequences are non productive; the reverse is a priori not true.

One can find the non coding sequences by aligning the reads to the genomic templates, which locates the CDR3 position. There is currently no built-in way of doing this in IGoR, so for now you may want to use different software for this preprocessing step.
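
To make the definition concrete, here is a minimal Python sketch of the frame shift / stop codon check. It is not part of IGoR. It assumes the alignment step has already been done with another tool and that each CDR3 nucleotide sequence is delimited from the conserved V anchor codon to the conserved J anchor codon inclusive, so that the reading frame starts at the first base. The function names `classify_cdr3` and `is_non_coding` are hypothetical.

```python
# Hypothetical helper, not an IGoR feature.
# Assumption: cdr3_nt runs from the conserved V anchor codon to the
# conserved J anchor codon inclusive, so the frame starts at index 0.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_cdr3(cdr3_nt: str) -> str:
    """Return 'out_of_frame', 'stop_codon', or 'in_frame_no_stop'."""
    seq = cdr3_nt.upper()
    # A length that is not a multiple of 3 means the J anchor is shifted
    # out of frame relative to the V anchor.
    if len(seq) % 3 != 0:
        return "out_of_frame"
    # Walk the sequence codon by codon in the V reading frame.
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    if any(codon in STOP_CODONS for codon in codons):
        return "stop_codon"
    return "in_frame_no_stop"


def is_non_coding(cdr3_nt: str) -> bool:
    """True if the CDR3 carries a frame shift or a stop codon."""
    return classify_cdr3(cdr3_nt) != "in_frame_no_stop"


# Usage: keep only the non coding sequences from a list of CDR3s.
cdr3_list = ["TGTGCCAGCAGTTTC", "TGTGCCAGCAGTTT", "TGTGCCTGAAGTTTC"]
non_coding = [s for s in cdr3_list if is_non_coding(s)]
```

Note that an in-frame sequence without a stop codon is not necessarily productive, which is why the check only selects the non coding subset.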

As for your third point, I'm not sure I understand the question, since you abbreviated the original sentence of the paper:

By contrast, V and J gene usage varied moderately but significantly across individuals, and even more across sequencing technologies, suggesting possible primer-dependent biases.

Just to make sure we're on the same page: this sentence means that primer-dependent biases are likely because differences are much larger between sequencing technologies than between individuals sequenced with the same technology. Although I agree with your MHC point for productive sequences, please bear in mind that a model learned on non coding sequences a priori only reflects the statistics of V(D)J recombination, not central/peripheral selection. Differences in gene usage among individuals in models learned on non coding sequences only reflect the properties of each individual's V(D)J genes (positions, number of copies, etc.).

Best, Quentin

decenwang commented 5 years ago

Hi Quentin, @qmarcou Thanks for your reply. I think you are right. I do have another idea, though. Primer-dependent bias is inherent to the primer pairs (bias 1). Different sequencing technologies amplify this bias (bias 2). In addition, templates differ between individuals, so a given primer pair may match some individuals' templates better than others (bias 3). On another note, I have spent more than a month learning IGoR but am still not making progress. I will post as few questions as possible. I hope you can respond when you are available.

Thanks a lot!

Best regards,

Decen

qmarcou / IGoR

finding non-productive sequences(original/no hypermutation/no insertion/no deletions) #43