Closed: omegahh closed this issue 1 month ago
Is it a regular built-in human library?
I always use the built-in library, as I described in #1790
@mizraelson I have about 200 BCR samples from a large cohort. I first run align, then refineTagsAndSort, then assemble. After these steps, I run findAlleles on all of the '*.clna' files to call all potential alleles in this cohort, but I get a "duplicates of found alleles" error. The relevant commands are shown below:
Commands for each sample:
mixcr align -f -t 24 -Xmx160g -p generic-amplicon-with-umi -b default --species hsa --rna --tag-pattern "^cagtggtatcaacgcagagt(UMI:NNNNtNNNNtNNNN)tN{7:8}(R1:*)\^N{17}(R2:*)" --tag-max-budget 20 --rigid-left-alignment-boundary --floating-right-alignment-boundary C --assemble-clonotypes-by "[{FR1Begin:CDR1End},{CDR2Begin:FR4End}]" --json-report logs/Library.00_S4A08-UQ03-UT05.mixcr_align.json trim_demux/S4A08-UQ03-UT05_R1.fastq.gz trim_demux/S4A08-UQ03-UT05_R2.fastq.gz tmp/S4A08-UQ03-UT05.vdjca &>> logs/Library.00_S4A08.log
mixcr refineTagsAndSort -f -Xmx160g --json-report logs/Library.00_S4A08-UQ03-UT05.mixcr_refineTags.json tmp/S4A08-UQ03-UT05.vdjca trim_mixcr/S4A08-UQ03-UT05.vdjca &>> logs/Library.00_S4A08.log
mixcr assemble -f -Xmx160g --write-alignments --split-clones-by C --json-report logs/Library.00_S4A08-UQ03-UT05.mixcr_assemble.json trim_mixcr/S4A08-UQ03-UT05.vdjca trim_mixcr/S4A08-UQ03-UT05.clna &>> logs/Library.00_S4A08.log
Command for all ".clna" files:
mixcr findAlleles -Xmx512G -t 96 --force-overwrite --export-library MJBIO_HBCR_Alleles.json --export-alleles-mutations MJBIO_HBCR_Alleles.tsv --json-report MJBIO_HBCR_Alleles_log.json --output-template {file_dir_path}/{file_name}.clns *MJBIO_HBCR*/trim_mixcr/*.clna
output log:
I actually have two questions:
I think findAlleles should be run on the data of the whole cohort, because (1) allele calling should benefit statistically from having more data, and (2) all output ".clns" files are realigned against a unified allele library, so the exported clonotypes are comparable. Am I right?
Should I run 'findAlleles' on BCR repertoires, considering that BCRs undergo somatic hypermutation? Is the mixcr 'findAlleles' command capable of distinguishing between SNPs and SHMs?
I haven't been able to replicate it yet. To answer your questions:
findAlleles should only be run on samples where you expect identical alleles; typically, this would be samples from the same donor. Mixing donors might result in incorrectly assigned alleles.
I see, so I was using the 'findAlleles' command incorrectly by running it on all samples from different donors. But do you think it is important (maybe as a new command) to mine all potential alleles, especially de novo alleles, from big data? I mean, if I have a large cohort in which the total number of clonotypes is extremely large, I would like to pool them together to mine de novo alleles. What's your opinion?
If each sample is from a separate organism, it is essential to process them separately using findAlleles. You can then aggregate all the information from the output tables into a single dataset, depending on what you intend to do with it later.
Do you still see the issue if samples are processed separately?
No, processing them separately works fine. Thank you for your explanation!
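For readers following the thread, the per-donor workflow suggested above could be sketched roughly as follows. This is only an illustration, not the maintainers' recommended script: it assumes one '.clna' file per donor (if several samples share a donor, they should be passed to a single findAlleles call instead), it uses only the findAlleles options that already appear in the original command, and the `alleles` output directory, the `<sample>.alleles.tsv` naming, the `DRY_RUN` switch, and the helper function names are all hypothetical. The memory and thread options from the original command are omitted for brevity. The merge step further assumes that every exported table has a single header line and the same columns.

```shell
#!/usr/bin/env bash
set -euo pipefail
shopt -s nullglob   # an unmatched glob expands to nothing

# Print the command when DRY_RUN=1 (the default in this sketch); run it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Run findAlleles on ONE sample. Assumption: one .clna file per donor.
find_alleles_for_sample() {
  local clna="$1"
  local out_dir="${2:-alleles}"   # hypothetical output directory
  local sample
  sample="$(basename "$clna" .clna)"
  run mkdir -p "$out_dir"
  run mixcr findAlleles --force-overwrite \
    --export-library "$out_dir/$sample.alleles.json" \
    --export-alleles-mutations "$out_dir/$sample.alleles.tsv" \
    --json-report "$out_dir/$sample.findAlleles.json" \
    --output-template '{file_dir_path}/{file_name}.clns' \
    "$clna"
}

# Concatenate per-sample tables into one, keeping a single header
# and prepending a "sample" column taken from the file name.
# Assumption: each table has one header line and identical columns.
merge_allele_tables() {
  local first=1 tsv sample
  for tsv in "$@"; do
    sample="$(basename "$tsv" .alleles.tsv)"
    awk -v s="$sample" -v first="$first" 'BEGIN { FS = OFS = "\t" }
      NR == 1 { if (first == 1) print "sample", $0; next }
      { print s, $0 }' "$tsv"
    first=0
  done
}

# One findAlleles call per sample instead of one call for the whole cohort.
for clna in *MJBIO_HBCR*/trim_mixcr/*.clna; do
  find_alleles_for_sample "$clna"
done

# Afterwards, for example:
#   merge_allele_tables alleles/*.alleles.tsv > all_alleles.tsv
```

Running it as written only prints the commands; setting `DRY_RUN=0` executes them. Keeping the sample name as a column in the merged table makes it possible to trace every allele back to the donor it was called in.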
As we discussed before, I have a lot of samples processed with the generic-amplicon-with-umi preset. I then want to run findAlleles on all of the assembled '.clna' files, but I get a 'There are duplicates of found alleles' error, which you can see in the log file. How can I resolve this problem? SlurmJob_POSTIMMU.383868.log