Closed mcieslik-mctp closed 5 years ago
Sorry for the delayed response.
Thanks for reporting! The sample seems to be very heavily enriched. We have never observed this level of enrichment, but then we have never had samples with overall BCR expression as extreme as in multiple myeloma. The MiXCR RNA-Seq pipeline was optimised for conventional RNA-Seq (with ≲0.1% of target molecules). We do know about the poor performance on heavily enriched data (we observed it on in silico generated samples), and we plan to improve performance for this case with better algorithms.
What is your overall goal? If the malignancy is not polyclonal and you are interested only in the top clonotypes, you can analyse just a small portion of the sample, which should be much faster. With this many reads, execution time degrades to ~O(N^2), so decreasing the number of reads 10-fold should give roughly a 100-fold decrease in execution time.
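As a rough illustration of that quadratic estimate (a back-of-the-envelope sketch, not a measurement; the function name `estimated_speedup` and the read counts are made up for the example):

```shell
# Estimated speedup under an assumed ~O(N^2) cost model:
# cutting the read count by a factor r cuts the run time by about r^2.
# Uses integer division, so pass read counts that divide evenly.
estimated_speedup() {
  full_reads=$1
  sub_reads=$2
  ratio=$(( full_reads / sub_reads ))
  echo $(( ratio * ratio ))
}

# Hypothetical numbers: 5,000,000 reads subsampled to 500,000.
estimated_speedup 5000000 500000
```

Real run times will depend on the sample, so treat the result as an order-of-magnitude guide only.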
If you are still interested in the TCRs present in the sample, please see the help page for filterAlignments (mixcr filterAlignments -h).
To summarize, you can do something like this:
mixcr align -s hsa -p rna-seq -OallowPartialAlignments=true input_R1.fastq.gz input_R2.fastq.gz all_initial_alignments.vdjca
# Extracting all TCR alignments
mixcr filterAlignments -c TCR all_initial_alignments.vdjca tcr_initial_alignments.vdjca
# Extracting IG alignments from the first 500 000 reads
mixcr filterAlignments -n 500000 -c IG all_initial_alignments.vdjca ig_initial_alignments.vdjca
...
Then run assemblePartial and assemble separately for each of these files.
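A minimal sketch of that per-file step, assuming default options for both commands. The output file names (`*_rescued.vdjca`, `*_clones.clns`) and the `run` / `process_subset` helpers are hypothetical, introduced only for this example. By default the script just prints the commands; set `DRY_RUN=0` to actually execute them.

```shell
# Print each command by default; run it for real only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run assemblePartial, then assemble, on one filtered alignment file.
# $1 is the file prefix used above: "tcr" or "ig".
process_subset() {
  prefix=$1
  run mixcr assemblePartial "${prefix}_initial_alignments.vdjca" "${prefix}_rescued.vdjca"
  run mixcr assemble "${prefix}_rescued.vdjca" "${prefix}_clones.clns"
}

for subset in tcr ig; do
  process_subset "$subset"
done
```

Keeping the two subsets separate means the large IG file cannot slow down the TCR analysis.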
Thank you for the info, I will follow your suggestion. We are doing CD138 selection, and sometimes we end up with an almost pure population of B-cells. Can I use the mergeAlignments function to combine the TCR and IG subsets back together?
Yes! That was exactly my thought: right after I closed my laptop after writing the previous message, I realised I should have suggested merging those two files. This should simplify further analysis.
Thanks. I did some more tests, and it appears that the -n / --limit switch does not do anything, i.e. it does not limit the number of alignments returned (tested on the latest version).
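The merged variant could look roughly like this. It is a sketch that assumes mergeAlignments takes the input files followed by the output file (please check `mixcr mergeAlignments -h`); the `merged_*` file names and the `run` / `merge_and_assemble` helpers are hypothetical. Commands are only printed unless `DRY_RUN=0` is set.

```shell
# Print each command by default; run it for real only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Merge the filtered TCR and IG alignments, then do a single
# assemblePartial + assemble pass on the merged file.
merge_and_assemble() {
  run mixcr mergeAlignments tcr_initial_alignments.vdjca ig_initial_alignments.vdjca merged_alignments.vdjca
  run mixcr assemblePartial merged_alignments.vdjca merged_rescued.vdjca
  run mixcr assemble merged_rescued.vdjca merged_clones.clns
}

merge_and_assemble
```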
e.g.:
-> mixcr filterAlignments -f -n 100000 -c IG fileuwBa90 aaaa
Filtering: 0%
Filtering: 10.2% ETA: 00:03:05
Filtering: 20.3% ETA: 00:03:32
Filtering: 30.5% ETA: 00:02:56
Filtering: 40.5% ETA: 00:02:22
Filtering: 50.6% ETA: 00:02:02
Filtering: 60.8% ETA: 00:01:47
Filtering: 71.1% ETA: 00:00:50
Filtering: 81.5% ETA: 00:00:59
Filtering: 91.8% ETA: 00:00:24
Written 8974304 alignments (8974814 alignments considered in total)
Am I doing something wrong?
Confirmed, it is a bug. We will fix it soon.
Please try this one: http://files.milaboratory.com/mixcr/mixcr-2.1.6-SNAPSHOT.zip
Thanks for the super quick fix! Tests are running. I tried fixing the source code myself by introducing a simple break in the writer loop; it worked for small samples, but on my problematic ones it hung with what looks like a race condition.
Tested the 2.1.6-SNAPSHOT version and it works great: after limiting to 250k reads, assemblePartial completes within 20 min.
Good! I will leave this issue open until we review the base assemblePartial procedure. There must be a better algorithm for this, one that can handle the whole dataset.
I am running MiXCR on RNA-seq data from multiple myeloma patients, and noticed very unpredictable run times for some samples. Specifically, the assemblePartial rounds can take tens of hours (sometimes days, sometimes I just give up) for select samples. I am not sure whether this can be considered a bug, but I would appreciate suggestions on how to work around it.
Alignment:
assemblePartial: