hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0
179 stars 56 forks source link

LILAC: [WARN ] gene depth coverage(A=1098 B=1089 C=1101) too low, exiting #436

Closed Y-u-q-i closed 11 months ago

Y-u-q-i commented 11 months ago

Dear developer: I was using my WES data , here is my input code: java -jar /home/user/lilac_v1.4.2.jar -sample T104 -ref_genome /home/user/genome_ref/Homo_sapiens_assembly38.fasta -ref_genome_version V38 -resource_dir /home/user/HMFtools_resources/ -reference_bam /home/user/SC-N104.bam -tumor_bam /home/ysq/SC-T104.bam -somatic_vcf /home/user/P104.vcf -output_dir /home/user/lilac_output/

And it showed this below during the running process: 12:11:16.884 [INFO ] loaded 230 allele frequencies from file(/home/ysq/HMFtools_resources/lilac_allele_frequencies.csv) 12:11:16.906 [INFO ] Lilac version(1.4.2), key parameters: 12:11:16.906 [INFO ] sample(T104) inputs: tumorBam(true) somaticVCF(true) geneCopyNumber(false) rnaBam(false) 12:11:16.907 [INFO ] minBaseQual(30), minEvidence(2) minFragmentsPerAllele(7) minFragmentsToRemoveSingle(40) maxDistanceFromTopScore(0.005) 12:11:16.907 [INFO ] reading nucleotide file: /home/ysq/HMFtools_resources/hla_ref_nucleotide_sequences.csv 12:11:18.447 [INFO ] loaded 17542 sequences from file /home/ysq/HMFtools_resources/hla_ref_nucleotide_sequences.csv 12:11:18.448 [INFO ] reading protein file: /home/ysq/HMFtools_resources/hla_ref_aminoacid_sequences.csv 12:11:18.965 [INFO ] loaded 13198 sequences from file /home/ysq/HMFtools_resources/hla_ref_aminoacid_sequences.csv 12:11:18.969 [INFO ] loaded 128 common alleles 12:11:19.221 [INFO ] querying records from reference bam /home/ysq/SC-N104_FDHE19H000005-1a-A14-A30_bqsr.bam 12:11:19.864 [INFO ] lowering min base quality(30) to median(0) 12:11:19.867 [WARN ] gene depth coverage(A=1098 B=1089 C=1101) too low, exiting 12:11:19.867 [INFO ] writing failed-sample output to /home/ysq/lilac_output/ 12:11:19.901 [INFO ] Lilac complete, run time(3.00s)

T104.lilac.csv was created, but only contains the header line. T104.lilac.qc.csv only contains one line besides the header, as follows: Status,ScoreMargin,NextSolutionAlleles,MedianBaseQuality,HlaYAllele,DiscardedIndels,DiscardedIndelMaxFrags,DiscardedAlignmentFragments,A_LowCoverageBases,B_LowCoverageBases,C_LowCoverageBases,ATypes,BTypes,CTypes,TotalFragments,FittedFragments,UnmatchedFragments,UninformativeFragments,HlaYFragments,PercentUnique,PercentShared,PercentWildcard,UnusedAminoAcids,UnusedAminoAcidMaxFrags,UnusedHaplotypes,UnusedHaplotypeMaxFrags,SomaticVariantsMatched,SomaticVariantsUnmatched FAIL,0.000,,0,NONE,0,0,0,1098,1089,1101,0,0,0,0,0,0,0,0,NaN,NaN,NaN,0,0,0,0,0,0

PS: The depth of my wes data is 160X, so I don't know what's wrong with it saying "gene depth coverage(A=1098 B=1089 C=1101) too low"

Looking forward to your answers, thanks !

p-priestley commented 11 months ago

This line appears to indicate that no reads were found by lilac:

12:11:19.864 [INFO ] lowering min base quality(30) to median(0)

Otherwise you would expect to see something like:

[INFO ] lowering min base quality(30) to median(28) [INFO ] totalFrags(911) minEvidence(2.0) minHighQualEvidence(0.3)

Could you check your bam: /home/user/SC-N104.bam and ensure that it definitely includes reads overlapping exons in HLA-A, HLA-B and HLA-C?

Y-u-q-i commented 11 months ago

Thanks for your answer! I have checked my bam data( /home/user/SC-N104.bam), and here is part of my bam at chr6: 29,941,260-29,949,572(which is the HLA-A location in GRCh38) :

ST-E00310:745:HYFGCCCXY:4:1106:21866:72772 99 chr6 29945490 0 150M = 29 ST-E00310:745:HYFGCCCXY:4:1113:27154:20295 99 chr6 29945511 0 150M = 29 ST-E00310:745:HYFGCCCXY:4:2117:27864:62101 99 chr6 29945518 0 150M = 29 ST-E00310:745:HYFGCCCXY:4:1117:26788:62136 99 chr6 29945528 0 150M = 29 ST-E00310:745:HYFGCCCXY:4:1113:27154:20295 147 chr6 29945551 0 150M = 29 ST-E00310:745:HYFGCCCXY:4:2117:27864:62101 147 chr6 29945576 0 150M = 29 ST-E00310:745:HYFGCCCXY:4:1117:26788:62136 147 chr6 29945608 0 7S143M = 29 ST-E00310:745:HYFGCCCXY:4:1106:21866:72772 147 chr6 29945615 0 150M = 29

here is part of my bam at chr6: 31,259,328-31,347,305(which is the HLA-B location in GRCh38) ST-E00310:745:HYFGCCCXY:4:2116:32664:55790 403 chr6 31274087 0 22H44M84H ch ST-E00310:745:HYFGCCCXY:4:2115:14732:2997 99 chr6 31275608 25 150M = 31 ST-E00310:745:HYFGCCCXY:4:2109:10155:46015 99 chr6 31275620 25 150M = 31 ST-E00310:745:HYFGCCCXY:4:2105:2514:71242 163 chr6 31275627 25 150M = 31 ST-E00310:745:HYFGCCCXY:4:1203:11332:17571 163 chr6 31275633 25 150M = 31 ST-E00310:745:HYFGCCCXY:4:2113:24292:68676 99 chr6 31275690 22 150M = 31 ST-E00310:745:HYFGCCCXY:4:2113:24606:68939 1123 chr6 31275690 22 150M = 31 ST-E00310:745:HYFGCCCXY:4:2214:18883:54313 1187 chr6 31275702 22 150M = 31 ST-E00310:745:HYFGCCCXY:4:2214:20801:57284 163 chr6 31275702 22 150M = 31 ST-E00310:745:HYFGCCCXY:4:2113:24292:68676 147 chr6 31275731 22 150M = 31 ST-E00310:745:HYFGCCCXY:4:2113:24606:68939 1171 chr6 31275731 22 150M = 31 ST-E00310:745:HYFGCCCXY:4:2109:10155:46015 147 chr6 31275743 25 150M = 31 ST-E00310:745:HYFGCCCXY:4:2115:14732:2997 147 chr6 31275744 25 150M = 31 ST-E00310:745:HYFGCCCXY:4:1105:30340:63331 99 chr6 31275753 0 150M = 31 ST-E00310:745:HYFGCCCXY:4:2105:2514:71242 83 chr6 31275762 25 150M = 31 ST-E00310:745:HYFGCCCXY:4:2112:27326:22634 163 chr6 31275773 0 150M = 31 ST-E00310:745:HYFGCCCXY:4:1203:11332:17571 83 chr6 31275778 25 150M = 31 ST-E00310:745:HYFGCCCXY:4:1106:20273:5704 1123 chr6 31275813 0 150M = 31 ST-E00310:745:HYFGCCCXY:4:2119:32248:23636 99 chr6 31275813 0 150M = 31 ST-E00310:745:HYFGCCCXY:4:2214:18883:54313 1107 chr6 31275817 22 150M = 31 ST-E00310:745:HYFGCCCXY:4:2214:20801:57284 83 chr6 31275817 22 150M = 31 ST-E00310:745:HYFGCCCXY:4:1105:30340:63331 147 chr6 31275846 0 150M = 31 ST-E00310:745:HYFGCCCXY:4:2112:27326:22634 83 chr6 31275846 0 150M = 31 :

And I ran again but I still got the notification: lowering min base quality(30) to median(0)

I looked through the question other people had asked, it said that if I use the GRCh38 for mapping, I should use the non-altered. It's this what's the problem from and I have no idea where I can get the non-altered reference.

Thanks for your answer!

p-priestley commented 11 months ago

None of those reads overlap with a coding region of HLA-A,B or C so they would not be counted. If you are using a genome with HLA ALT contig, then that is the problem. As is noted on the readme:

LILAC supports GRCH37, hg19 and hg38 (no alt) ref genomes. Note that if the genome has been aligned to an hg38 genome with HLA alt contigs, that fragments from these contigs will need to be extracted and realigned to the no alt genome prior to running LILAC.

Here is an example implementation for realignment (note I have not tried to run this): https://github.com/scwatts/hla_realignment

charlesshale commented 11 months ago

HMF lists a v38 ref genome with no alts on the public resources page.