nf-core / airrflow

B-cell and T-cell Adaptive Immune Receptor Repertoire (AIRR) sequencing analysis pipeline using the Immcantation framework
https://nf-co.re/airrflow
MIT License

Automatic clone_threshold is 'NA'. Consider setting --clonal_threshold manually. #322

petemeng closed this issue 4 months ago

petemeng commented 7 months ago

Description of the bug

When running the command below, I encountered this error: "Automatic clone_threshold is 'NA'. Consider setting --clonal_threshold manually." After I set the --clonal_threshold parameter to 0.01, the pipeline ran successfully.

Note: I am using the test data, which I downloaded locally.

Command used and terminal output

nextflow run airrflow/main.nf \
    -profile docker \
    --mode fastq \
    --input data/10x_sc_raw.tsv \
    --library_generation_method sc_10x_genomics \
    --reference_10x /Project/Nextflow/airrflow/data/refdata-cellranger-vdj-GRCh38-alts-ensembl-5.0.0.tar.gz \
    --outdir ./results \
    --imgtdb_base /Project/Nextflow/airrflow/data/imgtdb_base.zip \
    --igblast_base /Project/Nextflow/airrflow/data/igblast_base.zip \
    --clonal_threshold auto

Relevant files

.nextflow.log

System information

Nextflow version: 23.04.3
Container: Docker
OS: Ubuntu 18.04
nf-core/airrflow version: 3.3.0

ggabernet commented 7 months ago

Hi @petemeng, yes, this is intended behaviour: airrflow notifies the user when the clonal threshold cannot be set automatically. The report at results/clonal_analysis/find_threshold/index.html shows a plot of the Hamming distance distribution, which you can use to choose an appropriate threshold for your dataset manually.

For example, for this analysis:

[Screenshot 2024-04-16: Hamming distance distribution plot from the find_threshold report]

A threshold of around 0.11 appears to separate sequences that form clones from singletons. A more detailed explanation of how this method works can be found in the Immcantation Shazam vignette: https://shazam.readthedocs.io/en/stable/vignettes/DistToNearest-Vignette/
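To make the plotted quantity concrete, here is a minimal bash sketch of a length-normalized Hamming distance to the nearest neighbouring sequence. The functions `norm_hamming` and `dist_nearest` are hypothetical names, not part of airrflow or Shazam, and the sketch is a simplification rather than Shazam's implementation: it only compares sequences of equal length (Shazam also groups sequences by V and J gene) and skips exact copies of the query. Under these assumptions, a sequence with no remaining partner has no nearest neighbour and is reported as NA.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of airrflow or Shazam): a simplified sketch of a
# length-normalized Hamming distance to the nearest neighbouring sequence.

# Normalized Hamming distance between two equal-length sequences:
# number of mismatched positions divided by sequence length.
norm_hamming() {
  local a=$1 b=$2 i mism=0
  for (( i = 0; i < ${#a}; i++ )); do
    if [[ ${a:i:1} != "${b:i:1}" ]]; then
      mism=$(( mism + 1 ))
    fi
  done
  awk -v m="$mism" -v n="${#a}" 'BEGIN { printf "%.4f\n", m / n }'
}

# Smallest normalized distance from a query to the candidate sequences.
# Assumption for this sketch: candidates of a different length or identical to
# the query are skipped; if none remain, there is no nearest neighbour and NA is printed.
dist_nearest() {
  local query=$1 best="" s d
  shift
  for s in "$@"; do
    if [[ ${#s} -ne ${#query} || $s == "$query" ]]; then
      continue
    fi
    d=$(norm_hamming "$query" "$s")
    # awk exits 0 (success) only when the new distance is smaller than the best so far
    if [[ -z $best ]] || awk -v d="$d" -v b="$best" 'BEGIN { exit !(d < b) }'; then
      best=$d
    fi
  done
  echo "${best:-NA}"
}

dist_nearest ACGTAC ACGTAC ACGTAA TCGTAA   # prints 0.1667 (1 mismatch out of 6)
dist_nearest ACGTAC ACGTAC ACGTACG         # prints NA (no usable partner)
```

With real data, these per-sequence distances make up the histogram in the find_threshold report; the Shazam vignette describes placing the threshold in the gap between the mode of sequences with clonal relatives and the mode of singletons.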

petemeng commented 7 months ago

Thank you for your answer.

zhanyinx commented 2 months ago

Hi there,

Thanks for setting up this nice workflow. I am running into a similar problem. When I open the HTML report, it says:

## All `dist_nearest` values are NA. Skipping threshold analysis.

How can I set the threshold if all distances are NA?

Thanks for your help.
Best,
Zhan