Help and idea - Githubissues

beginner984 commented 1 year ago

Hi

I need your personal point of view please

I have two bulk RNA-seq patients (PBMC) on which I run mixcr

For the same two patients I have 5' single cell on which I run TRUST4

I am seeing more clones derived by sc RNA-seq and more clonality derived by Bulk RNA-seq

I named my bulk RNA-seq samples as TCR and my sc RNA-seq as TRUST

[Two plots attached: Rplot02, Rplot07]

Please, personally do you have any idea what could be an interpretation for this?

Thank you for any help

PoslavskySV commented 1 year ago

Hi,

We recommend using MiXCR v4.1 to process both the 10x and the bulk data in order to obtain reliable results. For 10x you can use a specifically optimized preset:

https://docs.milaboratories.com/mixcr/reference/overview-built-in-presets/#10xgenomics

This way you'll get reliable results and will then be able to draw conclusions.

mehdiborji commented 1 year ago

Did you sort the single cells? That's what it looks like. Is the bulk data from total RNA? It's not surprising if bulk RNA has lots less TCRs in it because of not being sorted

beginner984 commented 1 year ago

> Did you sort the single cells? That's what it looks like. Is the bulk data from total RNA? It's not surprising if bulk RNA has lots less TCRs in it because of not being sorted

You mean that for any clonotype reconstruction I should sort the cells? Even when we have a lot of T cells in the single-cell RNA-seq? If bulk RNA-seq clearly gives fewer clonotypes, why do people still use it when they can use scRNA-seq? And should we sort T cells too, or is sorting just B cells sufficient? I ask because whole PBMC gives us only a few B cells.

Please could you help me in answering these questions?

By the way, that is a great pleasure, as a scientist from my country (Iran) has commented on my post🙂

mehdiborji commented 1 year ago

Haha good to know we're from the same country! small world! You can msg. me on linkedin!

My question was: is your analysis based on the same type of RNA, just one single cell and one bulk? Did you do TCR enrichment on the bulk data? People don't use bulk RNA for TCR. Total bulk RNA from PBMC will have very few CDR3s without targeting, or at least without sorting the cells down to just T cells. The way Illumina sequencing of a single-cell 5' library works gives a reasonably good chance of covering the CDR3 quite often, whereas random fragmentation of bulk RNA has less chance of hitting that exact region. You still got quite a lot of CDR3s for unsorted PBMC, which is why I suspected it was sorted. Do you have some CDR3s in bulk with many reads which are not present in single cell?

There are so many unknown factors in the way you explain your data. You ask why people use bulk, but you don't clarify what kind of bulk. If I just want to know the CDR3s for a large number of cells, bulk with TCR targeting is a lot more cost effective. Single cell will cost 100 times more money for the same number of CDR3s. On the other hand, single cell will give you phenotype and pairing of TCRs.

It might help more if you post the outputs of mixcr and trust4 here so we can understand it better. Also the cellranger outputs, to know how many reads and how many cells you have. And also clarify your scientific questions.

dbolotin commented 1 year ago

Yes, I agree with @mehdiborji. To give you some advice, it would be helpful to know more about the datasets and how exactly they were generated (kit / protocol) and sequenced (read length at least).

If you have non-enriched (I mean non-V(D)J-enriched, after emulsion) 5' single-cell 10x data, prepared according to the 10x manual, then, very approximately, you should catch 30~50% of the T-cells, with 20~30% of the cells having both TRA+TRB chains (the main factor here seems to be the quality of the size selection performed in the wet lab; depending on it you can get significantly more or less), and somewhat more B-cells (as the level of expression is higher). This is not to mention that PBMC contains ~50% T-cells and ~10% B-cells (again, actual numbers in the samples may be significantly different; these are average numbers). For the V(D)J-enriched library, virtually 100% of the T-/B-cells and TCRs/BCRs must be reconstructed by MiXCR.
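To make those percentages concrete, here is a rough back-of-the-envelope sketch in Python. The function name and the 10,000-cell input are made up for illustration, and it assumes that both the 30~50% and the 20~30% figures are fractions of the T-cells in the sample, which is only one reading of the numbers above.

```python
def expected_tcr_cells(n_cells, t_cell_fraction=0.5,
                       any_chain_rate=(0.3, 0.5),
                       paired_rate=(0.2, 0.3)):
    """Rough (low, high) estimates for a non-enriched 5' 10x PBMC library.

    Assumption: all rates are fractions of the T-cells in the sample.
    These are the approximate averages quoted above, not measured values.
    """
    t_cells = n_cells * t_cell_fraction
    return {
        "t_cells": round(t_cells),
        # T-cells for which at least one TCR chain is recovered
        "any_chain": tuple(round(t_cells * r) for r in any_chain_rate),
        # T-cells for which both TRA and TRB are recovered
        "paired": tuple(round(t_cells * r) for r in paired_rate),
    }


# Hypothetical sample of 10,000 PBMCs
estimate = expected_tcr_cells(10_000)
print(estimate)
```

Actual yields can be significantly higher or lower depending on the size selection, so treat the output as an order-of-magnitude check only.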

As for the bulk RNA-seq, it depends on many factors and can be anywhere between 1 CDR3 per 10^5 and 1 CDR3 per 10^7 reads in the sample.
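As a quick illustration of what that range means for a given sequencing depth, the sketch below converts a read count into an expected number of CDR3-covering reads. The function name and the 50 million read depth are hypothetical; only the two rates come from the comment above.

```python
def expected_cdr3_reads(total_reads, reads_per_cdr3=(10**7, 10**5)):
    """Return (low, high) expected CDR3-covering reads for a bulk RNA-seq sample.

    reads_per_cdr3 holds the pessimistic and optimistic ends of the
    quoted range: one CDR3 per 10^7 reads and one CDR3 per 10^5 reads.
    """
    worst, best = reads_per_cdr3
    return total_reads // worst, total_reads // best


# Hypothetical bulk sample with 50 million reads
low, high = expected_cdr3_reads(50_000_000)
print(f"between {low} and {high} CDR3-covering reads")
```

These are reads, not distinct clonotypes: several reads can cover the same CDR3, so the number of clones recovered will be lower still.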

And the last important point in this respect is that the single-cell and RNA-seq datasets are obviously prepared from different sets of cells, so it might be even harder to find the intersection between them because of the cell sampling. This will depend heavily on the repertoire structure, that is, on how many expanded clones there are in the mix.
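The sampling effect can be sketched with a simple probability model. This assumes each sampled cell is an independent draw from a large pool, so a clone at frequency f is missed by a sample of n cells with probability (1 - f)^n. The clone frequencies and the sample size of 2,000 cells are invented for illustration.

```python
def prob_detected(freq, n_cells):
    """Probability that a clone at frequency `freq` appears at least once
    among `n_cells` independently sampled cells."""
    return 1.0 - (1.0 - freq) ** n_cells


def prob_shared(freq, n_cells_a, n_cells_b):
    """Probability that the clone is seen in both of two independent samples."""
    return prob_detected(freq, n_cells_a) * prob_detected(freq, n_cells_b)


# Hypothetical clones: one expanded (5% of T-cells), one rare (0.01%)
expanded = prob_shared(0.05, 2000, 2000)
rare = prob_shared(0.0001, 2000, 2000)
print(f"expanded clone seen in both samples: {expanded:.3f}")
print(f"rare clone seen in both samples:     {rare:.3f}")
```

Under this model an expanded clone is almost certain to show up in both aliquots, while a rare clone usually appears in at most one, so a repertoire dominated by rare clones will show little overlap even with perfect software.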

As for the comparison with TRUST and other software packages, there are several very important types of problems associated with the analysis of this kind of low-yield library that, if not properly accounted for, will lead to incorrect conclusions about the datasets in question:

- There are many non-TCR / non-IG sequences which look like one. Such sequences may yield false CDR3s and, what is more dangerous, reproducible false CDR3s, which will look like a false overlap between samples. MiXCR was thoroughly tuned (on real and in-silico generated data) to prevent this from happening. So, for RNA-seq, it gives zero false CDR3s of this sort.
- To increase the total yield, it is beneficial to find partial sequences covering only parts of CDR3s and assemble the whole CDR3 from such halves. This procedure should, again, be very strictly controlled, because all CDR3s consist of similar parts (V, D and J genes) and a false intersection can easily be found. The resulting sequence will be a chimeric sequence which is not actually present in the sample (a false positive). This type of false positive will just falsely increase the diversity, and is not that easy to spot without control datasets.
- The most obvious source of false diversity is sequencing and amplification errors, which create similar CDR3s with one or two substitutions or, less often, indels.

All those sources of false positives are very strictly controlled in MiXCR (by tuned aligners, NDN-aware partial-assembly algorithms and multi-layer error correction, respectively). MiXCR results have shown a high level of reliability in many studies.
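To show why uncorrected sequencing errors inflate diversity, here is a deliberately naive sketch of error collapsing. It is not MiXCR's algorithm: the function names, the single-mismatch threshold and the 10% abundance ratio are all assumptions chosen for illustration, and it handles substitutions only, not indels.

```python
def hamming(a, b):
    """Number of mismatches between equal-length strings, or None if lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def collapse_errors(counts, max_mismatches=1, max_ratio=0.1):
    """Merge low-count CDR3s into a much more abundant near-identical CDR3.

    A sequence is treated as an error variant if it differs from an already
    kept sequence by at most `max_mismatches` substitutions and its count is
    at most `max_ratio` times that sequence's original count.
    """
    merged = {}
    # Visit clonotypes from most to least abundant (ties broken alphabetically)
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        parent = None
        for kept in merged:
            d = hamming(seq, kept)
            if d is not None and d <= max_mismatches and n <= max_ratio * counts[kept]:
                parent = kept
                break
        if parent is None:
            merged[seq] = n
        else:
            merged[parent] += n
    return merged


# Hypothetical read counts per CDR3 amino-acid sequence
raw = {
    "CASSLGQGAYEQYF": 1000,
    "CASSLGQGAYEQYV": 12,   # one substitution away from the clone above
    "CASSPGTGGNEQFF": 40,   # a genuinely different clone
}
corrected = collapse_errors(raw)
print(len(raw), "clonotypes before,", len(corrected), "after")
```

Without the collapsing step the example reports three clonotypes instead of two, which is the kind of false diversity described in the list above.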

PoslavskySV commented 1 year ago

To add to what Dmitry wrote about incorrect conclusions made on the basis of false positives reported by tools such as TRUST, you can check the discussion in this article: https://www.nature.com/articles/nbt.3979, and find some really bad examples here: nbt.4296.pdf

milaboratory / mixcr

Help and idea #823