hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

Orange - problem with flagstat file #422

Closed kstawiski closed 1 year ago

kstawiski commented 1 year ago

Hi! Thank you for making these great tools!

I'm having a problem with the Orange tool. I receive:

Exception in thread "main" java.io.IOException: Unable to parse flagstat file correctly
        at com.hartwig.hmftools.common.flagstat.FlagstatFile.read(FlagstatFile.java:41)
        at com.hartwig.hmftools.orange.algo.OrangeAlgo.loadSampleData(OrangeAlgo.java:314)
        at com.hartwig.hmftools.orange.algo.OrangeAlgo.run(OrangeAlgo.java:173)
        at com.hartwig.hmftools.orange.OrangeApplication.run(OrangeApplication.java:52)
        at com.hartwig.hmftools.orange.OrangeApplication.main(OrangeApplication.java:39)

I created the flagstat file using:

samtools flagstat -@ "$cpu_count" "${home_dir}/results/${sample}/${sample}.bam" > "${home_dir}/results/${sample}/${sample}.flagstat"

and my files look like this:

(base) xxx@xxx-worker:/home/kgs24/xxxx/Pipeline_v3_HMF# cat "${home_dir}/results/${sample}/${sample}.flagstat"
317332306 + 0 in total (QC-passed reads + QC-failed reads)
316502156 + 0 primary
830150 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
315752381 + 0 mapped (99.50% : N/A)
314922231 + 0 primary mapped (99.50% : N/A)
316502156 + 0 paired in sequencing
158251078 + 0 read1
158251078 + 0 read2
312049984 + 0 properly paired (98.59% : N/A)
314044616 + 0 with itself and mate mapped
877615 + 0 singletons (0.28% : N/A)
1141182 + 0 with mate mapped to a different chr
826583 + 0 with mate mapped to a different chr (mapQ>=5)

I reviewed the code and can't figure out what the problem is. I would deeply appreciate your help.

My full command line is:

java -jar ${home_dir}/tools_dir/orange-2.5.0-jar-with-dependencies.jar -primary_tumor_doids "4007" -tumor_sample_flagstat_file "${home_dir}/results/${sample}/${sample}.flagstat" -annotated_virus_tsv "${home_dir}/results/${sample}/virus-interpreter/${sample}.virus.annotated.tsv" -chord_prediction_txt "${home_dir}/results/${sample}/chord_pred.txt" -cohort_mapping_tsv "${home_dir}/ref_data_dir/cohort_mapping.tsv" -cohort_percentiles_tsv "${home_dir}/ref_data_dir/cohort_percentiles.tsv" -cuppa_result_csv "${home_dir}/results/${sample}/cuppa/${sample}.cup.data.csv" -doid_json "${home_dir}/ref_data_dir/doid.json" -driver_gene_panel_tsv "${home_dir}/ref_data_dir/common/DriverGenePanel.38.tsv" -known_fusion_file "${home_dir}/ref_data_dir/sv/known_fusion_data.38.csv" -lilac_qc_csv "${home_dir}/results/${sample}/lilac/${sample}.lilac.qc.csv" -lilac_result_csv "${home_dir}/results/${sample}/lilac/${sample}.lilac.csv" -linx_somatic_data_directory "${home_dir}/results/${sample}/linx" -output_dir "${home_dir}/results/${sample}/orange/" -purple_data_directory "${home_dir}/results/${sample}/purple" -ref_genome_version 38 -sage_germline_gene_coverage_tsv "${home_dir}/results/${sample}/sage/${sample}.sage.gene.coverage.tsv" -tumor_sample_id "${sample}" -tumor_sample_wgs_metrics_file "${home_dir}/results/${sample}/${sample}.bam_metrics.csv" -purple_plot_directory "${home_dir}/results/${sample}/purple/plot" -linx_plot_directory "${home_dir}/results/${sample}/linx/plot_data" -ensembl_data_directory "${home_dir}/ref_data_dir/common/ensembl_data" -sage_somatic_tumor_sample_bqr_plot "" 

Thanks! Konrad

kduyvesteyn commented 1 year ago

The ingestion fails on the lines containing "primary". If you remove the following lines, ORANGE should run correctly:

316502156 + 0 primary
0 + 0 primary duplicates
314922231 + 0 primary mapped (99.50% : N/A)

Could you let me know which version of samtools you used in the above command? We will then make sure to add support for this flagstat format. The flagstat files we currently expect are generated by sambamba (v0.6.8).
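For anyone who needs to keep using samtools output in the meantime, the workaround above can be sketched as a small shell filter. This is only a sketch: `strip_primary_lines` is a hypothetical helper name (not part of hmftools), and it assumes that dropping every line containing "primary" is enough for ORANGE to accept the file, as suggested above.

```shell
# Hypothetical helper (not part of hmftools): print a flagstat file
# without the lines that contain "primary".
# Assumption: removing these lines is all that is needed for parsing.
strip_primary_lines() {
    grep -v 'primary' "$1"
}

# Example usage (paths are placeholders):
# strip_primary_lines "${sample}.flagstat" > "${sample}.filtered.flagstat"
```

The filtered file, rather than the original, would then be passed to `-tumor_sample_flagstat_file`.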

kstawiski commented 1 year ago

Thanks! I generated the flagstat file using sambamba, and it works now.
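For reference, a sambamba counterpart of the samtools command from the original post might look like the sketch below. `make_flagstat` is a hypothetical wrapper name used only for illustration; `-t` sets the number of threads for `sambamba flagstat`, and the paths in the usage comment simply mirror the ones used earlier in this thread.

```shell
# Hypothetical wrapper (illustrative only): write sambamba flagstat output to a file.
# $1 = thread count, $2 = input BAM, $3 = output flagstat path
make_flagstat() {
    sambamba flagstat -t "$1" "$2" > "$3"
}

# Example usage (same variables as the original samtools command):
# make_flagstat "$cpu_count" "${home_dir}/results/${sample}/${sample}.bam" \
#     "${home_dir}/results/${sample}/${sample}.flagstat"
```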

My samtools version:

root@xxx# samtools --version
samtools 1.17
Using htslib 1.17
Copyright (C) 2023 Genome Research Ltd.

Samtools compilation details:
    Features:       build=Makefile curses=yes 
    CC:             gcc
    CPPFLAGS:       
    CFLAGS:         -g -Wall -O2
    LDFLAGS:        
    HTSDIR:         htslib-1.17
    LIBS:           
    CURSES_LIB:     -lcurses

HTSlib compilation details:
    Features:       build=Makefile libcurl=yes S3=no GCS=no libdeflate=no lzma=yes bzip2=yes plugins=no htscodecs=1.4.0
    CC:             gcc
    CPPFLAGS:       
    CFLAGS:         -g -Wall -O2 -fvisibility=hidden
    LDFLAGS:        -fvisibility=hidden

HTSlib URL scheme handlers present:
    built-in:    preload, data, file
    libcurl:     imaps, pop3, http, smb, gopher, sftp, ftps, imap, smtp, smtps, rtsp, scp, ftp, telnet, rtmp, ldap, https, ldaps, smbs, tftp, pop3s, dict
    crypt4gh-needed:     crypt4gh
    mem:         mem

Thanks once again for the magnificent tools! Konrad