statgen / demuxlet

Genetic multiplexing of barcoded single cell RNA-seq
Apache License 2.0
118 stars 25 forks source link

The total number of pass-filtered reads is too less #90

Open Moxsf opened 2 years ago

Moxsf commented 2 years ago

my command: demuxlet --sam ~/Project_kunlun/02_CellRanger/HUVEC_Omics/outs/gex_possorted_bam.bam --group-list ~/Project_kunlun/02_CellRanger/HUVEC_Omics/outs/filtered_feature_bc_matrix/barcodes.tsv.gz --field GT --vcf ~/Project_kunlun/00_preData/GenoType/VCF_filter/05_WeGene_HUVEC_change_ALDI.lo_hg38.vcf.gz --alpha 0 --alpha 0.5 --out Multi_HUVEC_gex_alpha

and the report:

NOTICE [2022/02/22 17:51:34] - WARNING: Suppressed a total of 4564 UMI warnings... NOTICE [2022/02/22 17:51:34] - WARNING: Suppressed a total of 10839629 droplet/cell barcode warnings... NOTICE [2022/02/22 17:51:34] - Finished reading 1 markers from the VCF file NOTICE [2022/02/22 17:51:34] - Total number input reads : 310066990 NOTICE [2022/02/22 17:51:34] - Total number valid droplets observed : 15480 NOTICE [2022/02/22 17:51:34] - Total number valid SNPs observed : 1 NOTICE [2022/02/22 17:51:34] - Total number of read-QC-passed reads : 198500170 NOTICE [2022/02/22 17:51:34] - Total number of skipped reads with ignored barcodes : 36629150 NOTICE [2022/02/22 17:51:34] - Total number of non-skipped reads with considered barcodes : 161816433 NOTICE [2022/02/22 17:51:34] - Total number of gapped/noninformative reads : 161816426 NOTICE [2022/02/22 17:51:34] - Total number of base-QC-failed reads : 0 NOTICE [2022/02/22 17:51:34] - Total number of redundant reads : 0 NOTICE [2022/02/22 17:51:34] - Total number of pass-filtered reads : 7 NOTICE [2022/02/22 17:51:34] - Total number of pass-filtered reads overlapping with multiple SNPs : 0 NOTICE [2022/02/22 17:51:34] - Starting to prune out cells with too few reads... NOTICE [2022/02/22 17:51:34] - Finishing pruning out 0 cells with too few reads... NOTICE [2022/02/22 17:51:34] - Starting to identify best matching individual IDs NOTICE [2022/02/22 17:51:34] - Identifying best-matching individual..

When I checked the operation report, I found that the number of reads I screened was only 7. Is there any recommendation for screening criteria? Thanks for your reply.

wangweikai233 commented 10 months ago

my command: demuxlet --sam ~/Project_kunlun/02_CellRanger/HUVEC_Omics/outs/gex_possorted_bam.bam --group-list ~/Project_kunlun/02_CellRanger/HUVEC_Omics/outs/filtered_feature_bc_matrix/barcodes.tsv.gz --field GT --vcf ~/Project_kunlun/00_preData/GenoType/VCF_filter/05_WeGene_HUVEC_change_ALDI.lo_hg38.vcf.gz --alpha 0 --alpha 0.5 --out Multi_HUVEC_gex_alpha

and the report:

NOTICE [2022/02/22 17:51:34] - WARNING: Suppressed a total of 4564 UMI warnings... NOTICE [2022/02/22 17:51:34] - WARNING: Suppressed a total of 10839629 droplet/cell barcode warnings... NOTICE [2022/02/22 17:51:34] - Finished reading 1 markers from the VCF file NOTICE [2022/02/22 17:51:34] - Total number input reads : 310066990 NOTICE [2022/02/22 17:51:34] - Total number valid droplets observed : 15480 NOTICE [2022/02/22 17:51:34] - Total number valid SNPs observed : 1 NOTICE [2022/02/22 17:51:34] - Total number of read-QC-passed reads : 198500170 NOTICE [2022/02/22 17:51:34] - Total number of skipped reads with ignored barcodes : 36629150 NOTICE [2022/02/22 17:51:34] - Total number of non-skipped reads with considered barcodes : 161816433 NOTICE [2022/02/22 17:51:34] - Total number of gapped/noninformative reads : 161816426 NOTICE [2022/02/22 17:51:34] - Total number of base-QC-failed reads : 0 NOTICE [2022/02/22 17:51:34] - Total number of redundant reads : 0 NOTICE [2022/02/22 17:51:34] - Total number of pass-filtered reads : 7 NOTICE [2022/02/22 17:51:34] - Total number of pass-filtered reads overlapping with multiple SNPs : 0 NOTICE [2022/02/22 17:51:34] - Starting to prune out cells with too few reads... NOTICE [2022/02/22 17:51:34] - Finishing pruning out 0 cells with too few reads... NOTICE [2022/02/22 17:51:34] - Starting to identify best matching individual IDs NOTICE [2022/02/22 17:51:34] - Identifying best-matching individual..

When I checked the operation report, I found that the number of reads I screened was only 7. Is there any recommendation for screening criteria? Thanks for your reply.

Hello, I also encountered this problem. Have you solved it now?

hyunminkang commented 10 months ago

What does samtools flagstat say? Filtering is mostly done by --excl-flag option. The default is 0xF04, which removes duplicates, supplementary, and unmapped reads. It also uses reads with minimum mapping quality of 20. I would check both flags and mapping quality of the reads.

wangweikai233 commented 10 months ago

What does samtools flagstat say? Filtering is mostly done by --excl-flag option. The default is 0xF04, which removes duplicates, supplementary, and unmapped reads. It also uses reads with minimum mapping quality of 20. I would check both flags and mapping quality of the reads.

Thank you for your reply. I found no information in the qual column in our bam file, which I think caused all reads to be filtered.

wangweikai233 commented 10 months ago

Our bam file like this : FP300000661L1C002R03901629535 16 chr1 89316 1 100M * 0 0 TTACTGTTTTAGTTTGCATTTCTCTGCTTACAAATGGATTACACGCATTTTCATGTGCTGTTGGCTACTTATTCATTCAGAAAACATACTAAGTGCTGGC * NH:i:4 HI:i:1 AS:i:98 nM:i:0 UR:Z:CAAGTAAAAG CB:Z:E7AE9216DB GN:Z:AL627309.1 UB:Z:CAAGTAAAAG RE:A:E DB:Z:CELL4917_N1

The DB is cell barcode, UB is UMI. Our command line is as follows: demuxlet --sam ./merge_sort.bam --tag-group DB --tag-UMI UB --vcf ./muxseq.5.sample_reheader_sort.vcf --field GT --alpha 0 --alpha 0.5 --out ./result/merge_sort

Do you think there are other issues that might cause demuxlet to fail?

wangweikai233 commented 10 months ago

Our log :

NOTICE [2023/11/21 21:44:13] - Reading 350000000 reads at chrX:154400707 and skipping 14368055
NOTICE [2023/11/21 21:44:23] - Reading 351000000 reads at chrY:20592382 and skipping 14461389
NOTICE [2023/11/21 21:44:23] - Finished reading 4863645 markers from the VCF file
NOTICE [2023/11/21 21:44:23] - Total number input reads : 351034528
NOTICE [2023/11/21 21:44:25] - Total number valid droplets observed : 17164
NOTICE [2023/11/21 21:44:25] - Total number valid SNPs observed     : 4863645
NOTICE [2023/11/21 21:44:25] - Total number of read-QC-passed reads : 336559368
NOTICE [2023/11/21 21:44:25] - Total number of skipped reads with ignored barcodes : 0
NOTICE [2023/11/21 21:44:25] - Total number of non-skipped reads with considered barcodes : 336236415
NOTICE [2023/11/21 21:44:25] - Total number of gapped/noninformative reads : 312184218
NOTICE [2023/11/21 21:44:25] - Total number of base-QC-failed reads : 24052197
NOTICE [2023/11/21 21:44:25] - Total number of redundant reads : 0
NOTICE [2023/11/21 21:44:25] - Total number of pass-filtered reads : 0
NOTICE [2023/11/21 21:44:25] - Total number of pass-filtered reads overlapping with multiple SNPs : 0
NOTICE [2023/11/21 21:44:25] - Starting to prune out cells with too few reads...
NOTICE [2023/11/21 21:44:25] - Finishing pruning out 0 cells with too few reads...
NOTICE [2023/11/21 21:44:26] - Starting to identify best matching individual IDs
NOTICE [2023/11/21 21:44:26] - Processing 10000 markers...
NOTICE [2023/11/21 21:44:26] - Processing 20000 markers...
hyunminkang commented 10 months ago

It looks that, in both examples, the number of QC-pass reads are fine, but the number of reads overlapping SNPs do not. Perhaps the contigs do not match between BAM and VCF, or there might be problems in VCF?

wangweikai233 commented 10 months ago

It looks that, in both examples, the number of QC-pass reads are fine, but the number of reads overlapping SNPs do not. Perhaps the contigs do not match between BAM and VCF, or there might be problems in VCF?

Thank you for your suggestion. We will carefully check the VCF file later.

Moxsf commented 10 months ago

I have no idea about this question, so I get to use other tools: cellSNP + Vireo . (no offense) And I did something to deal with this problem, just for your information:

  1. The chromosome order of the Header and content (such as chr1 or 1) in the VCF file must be the same as in the bam file.
  2. Adhere to Broad Institute’s GATK best practices whenever possible. Apply all or some of the additional filters: a. Classified as common in the 1000 genomes project. b. Called with high variant confidence (QD > 10.0) c. Called in every one of your samples. Minor Allele Frequency (MAF 0.01) Minor Allele Counts, if applicable (MAC 1)
  3. Install demuxlet manually instead of using conda.
  4. Using another version of demuxlet: popscle
wangweikai233 commented 10 months ago

I have no idea about this question, so I get to use other tools: cellSNP + Vireo . (no offense) And I did something to deal with this problem, just for your information:

  1. The chromosome order of the Header and content (such as chr1 or 1) in the VCF file must be the same as in the bam file.
  2. Adhere to Broad Institute’s GATK best practices whenever possible. Apply all or some of the additional filters: a. Classified as common in the 1000 genomes project. b. Called with high variant confidence (QD > 10.0) c. Called in every one of your samples. Minor Allele Frequency (MAF 0.01) Minor Allele Counts, if applicable (MAC 1)
  3. Install demuxlet manually instead of using conda.
  4. Using another version of demuxlet: popscle

Thank you very much for your reply. We will try to reinstall or run demuxlet according to your suggestion. I will show the run log later.

Best wishes.

yuantiaotiao commented 7 months ago

I have no idea about this question, so I get to use other tools: cellSNP + Vireo . (no offense) And I did something to deal with this problem, just for your information:

  1. The chromosome order of the Header and content (such as chr1 or 1) in the VCF file must be the same as in the bam file.
  2. Adhere to Broad Institute’s GATK best practices whenever possible. Apply all or some of the additional filters: a. Classified as common in the 1000 genomes project. b. Called with high variant confidence (QD > 10.0) c. Called in every one of your samples. Minor Allele Frequency (MAF 0.01) Minor Allele Counts, if applicable (MAC 1)
  3. Install demuxlet manually instead of using conda.
  4. Using another version of demuxlet: popscle

Thank you very much for your reply. We will try to reinstall or run demuxlet according to your suggestion. I will show the run log later.

Best wishes.

Have you solved this problem? My bam file is also a C4 platform. Look forward to receiving your reply

wangweikai233 commented 7 months ago

I have no idea about this question, so I get to use other tools: cellSNP + Vireo . (no offense) And I did something to deal with this problem, just for your information:

  1. The chromosome order of the Header and content (such as chr1 or 1) in the VCF file must be the same as in the bam file.
  2. Adhere to Broad Institute’s GATK best practices whenever possible. Apply all or some of the additional filters: a. Classified as common in the 1000 genomes project. b. Called with high variant confidence (QD > 10.0) c. Called in every one of your samples. Minor Allele Frequency (MAF 0.01) Minor Allele Counts, if applicable (MAC 1)
  3. Install demuxlet manually instead of using conda.
  4. Using another version of demuxlet: popscle

Thank you very much for your reply. We will try to reinstall or run demuxlet according to your suggestion. I will show the run log later. Best wishes.

Have you solved this problem? My bam file is also a C4 platform. Look forward to receiving your reply

I tried to re-run demuxlet after checking the above problems, but the result was still not satisfactory. I think it's a format issue with C4. In the end we used cellSNP + Vireo, but I still think demuxlet is a very nice software tool.