hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0
181 stars 56 forks source link

[ERROR] cannot find sequence index for chromosome 6 in bam header #392

Closed rod1Gene closed 1 year ago

rod1Gene commented 1 year ago

I'm getting an error when trying the simplest form of the command

java -jar ~/bin/jars/lilac_v1.4.2.jar -sample ID -ref_genome ~/data/references/GRCh38_full_analysis_set_plus_decoy_hla_chr6.fa -resource_dir /home/rod1/data/references/lilac_definition_files -reference_bam /scratch/u/rod1/ID_chr6.bam -output_dir /scratch/u/rod1

Using this reference: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa

And i'm getting these messages (which are a subset of the logging messages)

[INFO ] querying records from reference bam /scratch/u/rod1/ID_chr6.bam
[ERROR] cannot find sequence index for chromosome 6 in bam header
[INFO ] lowering min base quality(30) to median(0)
[WARN ] gene depth coverage(A=1098 B=1089 C=1101) too low, exiting
[INFO ] writing failed-sample output to /scratch/u/rod1/

EP102018D03.lilac.csv is created, but only contains the header line. EP102018D03.lilac.qc.csv just contains one line besides the header FAIL,0.000,,0,NONE,0,0,0,1098,1089,1101,0,0,0,0,0,0,0,0,NaN,NaN,NaN,0,0,0,0,0,0

I'm using openjdk 17.0.4 and samtools 1.17 on CentOS 7.9. Here are the headers from the input bam

@HD VN:1.0  SO:unsorted
@SQ SN:chr6 LN:170805979
@PG ID:hisat2   PN:hisat2   VN:2.2.1    CL:"/home/rod1/bin/hisat2/hisat2-align-s --wrapper basic-0 -p 14 --very-sensitive -x /gstore/home/rod1/data/references/hisat2_indexes/GRCh38_full_analysis_set_plus_decoy_hla_chr6 -S ID_chr6.sam --read-lengths 151 -1 ID_1.fastq -2 ID_2.fastq"
@PG ID:samtools PN:samtools PP:hisat2   VN:1.17 CL:samtools view -b -S ID_chr6.sam
@PG ID:samtools.1   PN:samtools PP:samtools VN:1.17 CL:samtools view -H ID_chr6.bam

I'd appreciate help finding what I did wrong.

charlesshale commented 1 year ago

The command requires the reference genome version argument: -ref_genome_version V38

thanks.

rod1Gene commented 1 year ago

Thank you.

Although documented in the 'Mandatory Arguments' section of the README, I recommend adding it to the 'Example Usage' section too, as fewer and fewer people will be using the default of V37 and it will be harder for people (like me) to miss it.