google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
BSD 3-Clause "New" or "Revised" License

ValueError: Unknown: BED record has invalid number of fields #542

Closed amyhouseman closed 2 years ago

amyhouseman commented 2 years ago

Hello,

Operating system: Linux HPC
Version: 1.3.0
Installation: Singularity
Data: WES, with the Agilent SureSelect DNA Human All Exon V5 (hg38) BED file

Steps to reproduce:

Command:

#!/bin/bash --login
#SBATCH -J AmyHouseman_deepvariant
#SBATCH -o %x.stdout.%J.%N
#SBATCH -e %x.stderr.%J.%N
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH -p compute
#SBATCH --account=scw1581
#SBATCH --mail-type=ALL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=HousemanA@cardiff.ac.uk     # Where to send mail
#SBATCH --array=1-23
#SBATCH --time=12:00:00
#SBATCH --mem-per-cpu=128GB

module purge
module load parallel
module load singularity
EXOME_IDs_FILE=Polyposis_Exome_Analysis_JOB27/fastp/All_fastp_input/IDswithoutR1R2_JOB27
HG38_REFERENCE=Polyposis_Exome_Analysis_JOB27/bwa/index/indexhumanrefseq_output/samtools_faidx/GRCh38_latest_genomic.fna
PICARDMARKDUPLICATES_SORTEDBAM=Polyposis_Exome_Analysis_JOB27/picard/markduplicate/markedduplicates/{}PE_markedduplicates.bam
BED_REGIONS=Polyposis_Exome_Analysis_JOB27/deepvariant/bed/AgilentSureSelectDNASureSelectXTHumanAllExonV5_hg38_recoded_nocol4.bed
OUTPUT_VCF=Polyposis_Exome_Analysis_JOB27/deepvariant/vcf/{}PE_output.vcf.gz
OUTPUT_GVCF=Polyposis_Exome_Analysis_JOB27/deepvariant/gvcf/{}PE_output.vcf.gz
INTERMEDIATE_RESULTS=Polyposis_Exome_Analysis_JOB27/deepvariant/intermediateresults/{}PE_output_intermediate

# Set bash error trapping to exit on first error.
set -eu

cd /scratch/c.c21087028/

sed -n "${SLURM_ARRAY_TASK_ID}p" $EXOME_IDs_FILE | parallel -j 1 "singularity run -B /usr/lib/locale/:/usr/lib/locale/ containers/deepvariant_1.3.0.sif /opt/deepvariant/bin/run_deepvariant --model_type=WES \
--ref=$HG38_REFERENCE \
--reads=$PICARDMARKDUPLICATES_SORTEDBAM \
--regions=$BED_REGIONS \
--output_vcf=$OUTPUT_VCF \
--output_gvcf=$OUTPUT_GVCF \
--intermediate_results_dir=$INTERMEDIATE_RESULTS"

Error trace:

* Intermediate results will be written to Polyposis_Exome_Analysis_JOB27/deepvariant/intermediateresults/15M11163_L7_PE_output_intermediate in docker.

Running the command:

time seq 0 0 | parallel -q --halt 2 --line-buffer /opt/deepvariant/bin/make_examples --mode calling --ref "Polyposis_Exome_Analysis_JOB27/bwa/index/indexhumanrefseq_output/samtools_faidx/GRCh38_latest_genomic.fna" --reads "Polyposis_Exome_Analysis_JOB27/picard/markduplicate/markedduplicates/15M11163_L7_PE_markedduplicates.bam" --examples "Polyposis_Exome_Analysis_JOB27/deepvariant/intermediateresults/15M11163_L7_PE_output_intermediate/make_examples.tfrecord@1.gz" --gvcf "Polyposis_Exome_Analysis_JOB27/deepvariant/intermediateresults/15M11163_L7_PE_output_intermediate/gvcf.tfrecord@1.gz" --regions "Polyposis_Exome_Analysis_JOB27/deepvariant/bed/AgilentSureSelectDNASureSelectXTHumanAllExonV5_hg38_recoded_nocol4.bed" --task {}

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LANG = "en_GB.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LANG = "en_GB.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
I0614 20:20:45.812468 47288495204160 genomics_reader.py:222] Reading Polyposis_Exome_Analysis_JOB27/picard/markduplicate/markedduplicates/15M11163_L7_PE_markedduplicates.bam with NativeSamReader
W0614 20:20:45.812704 47288495204160 make_examples_core.py:276] No non-empty sample name found in the input reads. DeepVariant will use default as the sample name. You can also provide a sample name with the --sample_name argument.
I0614 20:20:45.822298 47288495204160 make_examples_core.py:239] Preparing inputs
I0614 20:20:45.836667 47288495204160 genomics_reader.py:222] Reading Polyposis_Exome_Analysis_JOB27/picard/markduplicate/markedduplicates/15M11163_L7_PE_markedduplicates.bam with NativeSamReader
I0614 20:20:46.057183 47288495204160 make_examples_core.py:239] Common contigs are ['NC_000001.11', 'NT_187361.1', 'NT_187362.1', 'NT_187363.1', 'NT_187364.1', 'NT_187365.1', 'NT_187366.1', 'NT_187367.1', 'NT_187368.1', 'NT_187369.1', 'NC_000002.12', 'NT_187370.1', 'NT_187371.1', 'NC_000003.12', 'NT_167215.1', 'NC_000004.12', 'NT_113793.3', 'NC_000005.10', 'NT_113948.1', 'NC_000006.12', 'NC_000007.14', 'NC_000008.11', 'NC_000009.12', 'NT_187372.1', 'NT_187373.1', 'NT_187374.1', 'NT_187375.1', 'NC_000010.11', 'NC_000011.10', 'NT_187376.1', 'NC_000012.12', 'NC_000013.11', 'NC_000014.9', 'NT_113796.3', 'NT_167219.1', 'NT_187377.1', 'NT_113888.1', 'NT_187378.1', 'NT_187379.1', 'NT_187380.1', 'NT_187381.1', 'NC_000015.10', 'NT_187382.1', 'NC_000016.10', 'NT_187383.1', 'NC_000017.11', 'NT_113930.2', 'NT_187384.1', 'NT_187385.1', 'NC_000018.10', 'NC_000019.10', 'NC_000020.11', 'NC_000021.9', 'NC_000022.11', 'NT_187386.1', 'NT_187387.1', 'NT_187388.1', 'NT_187389.1', 'NT_187390.1', 'NT_187391.1', 'NT_187392.1', 'NT_187393.1', 'NT_187394.1', 'NC_000023.11', 'NC_000024.10', 'NT_187395.1', 'NT_187396.1', 'NT_187397.1', 'NT_187398.1', 'NT_187399.1', 'NT_187400.1', 'NT_187401.1', 'NT_187402.1', 'NT_187403.1', 'NT_187404.1', 'NT_187405.1', 'NT_187406.1', 'NT_187407.1', 'NT_187408.1', 'NT_187409.1', 'NT_187410.1', 'NT_187411.1', 'NT_187412.1', 'NT_187413.1', 'NT_187414.1', 'NT_187415.1', 'NT_187416.1', 'NT_187417.1', 'NT_187418.1', 'NT_187419.1', 'NT_187420.1', 'NT_187421.1', 'NT_187422.1', 'NT_187423.1', 'NT_187424.1', 'NT_187425.1', 'NT_187426.1', 'NT_187427.1', 'NT_187428.1', 'NT_187429.1', 'NT_187430.1', 'NT_187431.1', 'NT_187432.1', 'NT_187433.1', 'NT_187434.1', 'NT_187435.1', 'NT_187436.1', 'NT_187437.1', 'NT_187438.1', 'NT_187439.1', 'NT_187440.1', 'NT_187441.1', 'NT_187442.1', 'NT_187443.1', 'NT_187444.1', 'NT_187445.1', 'NT_187446.1', 'NT_187447.1', 'NT_187448.1', 'NT_187449.1', 'NT_187450.1', 'NT_187451.1', 'NT_187452.1', 'NT_187453.1', 'NT_187454.1', 'NT_187455.1', 
'NT_187456.1', 'NT_187457.1', 'NT_187458.1', 'NT_187459.1', 'NT_187460.1', 'NT_187461.1', 'NT_187462.1', 'NT_187463.1', 'NT_187464.1', 'NT_187465.1', 'NT_187466.1', 'NT_187467.1', 'NT_187468.1', 'NT_187469.1', 'NT_187470.1', 'NT_187471.1', 'NT_187472.1', 'NT_187473.1', 'NT_187474.1', 'NT_187475.1', 'NT_187476.1', 'NT_187477.1', 'NT_187478.1', 'NT_187479.1', 'NT_187480.1', 'NT_187481.1', 'NT_187482.1', 'NT_187483.1', 'NT_187484.1', 'NT_187485.1', 'NT_187486.1', 'NT_187487.1', 'NT_187488.1', 'NT_187489.1', 'NT_187490.1', 'NT_187491.1', 'NT_187492.1', 'NT_187493.1', 'NT_187494.1', 'NT_187495.1', 'NT_187496.1', 'NT_113901.1', 'NT_167213.1', 'NT_167214.1', 'NT_167218.1', 'NT_187497.1', 'NT_167220.1', 'NT_167208.1', 'NT_187498.1', 'NT_187499.1', 'NT_187500.1', 'NT_187501.1', 'NT_187502.1', 'NT_187503.1', 'NT_187504.1', 'NT_187505.1', 'NT_187506.1', 'NT_187508.1', 'NT_187509.1', 'NT_187510.1', 'NT_187511.1', 'NT_187512.1', 'NT_167209.1', 'NT_187513.1', 'NT_167211.2', 'NT_113889.1', 'NW_012132914.1', 'NW_015495298.1', 'NW_011332688.1', 'NW_014040926.1', 'NW_009646195.1', 'NW_018654706.1', 'NW_019805487.1', 'NW_009646194.1', 'NW_018654707.1', 'NW_014040925.1', 'NW_017852928.1', 'NW_009646196.1', 'NW_011332687.1', 'NW_018654708.1', 'NW_014040927.1', 'NW_021159988.1', 'NW_012132915.1', 'NW_018654709.1', 'NW_015495299.1', 'NW_018654710.1', 'NW_011332690.1', 'NW_021159987.1', 'NW_011332689.1', 'NW_017363813.1', 'NW_009646197.1', 'NW_012132916.1', 'NW_011332691.1', 'NW_021159989.1', 'NW_018654711.1', 'NW_012132917.1', 'NW_009646198.1', 'NW_019805491.1', 'NW_019805492.1', 'NW_019805490.1', 'NW_019805489.1', 'NW_019805488.1', 'NW_021159990.1', 'NW_021159993.1', 'NW_013171799.1', 'NW_021159992.1', 'NW_021159994.1', 'NW_021159991.1', 'NW_021159995.1', 'NW_013171800.1', 'NW_013171801.1', 'NW_017363814.1', 'NW_015495300.1', 'NW_015495301.1', 'NW_018654712.1', 'NW_009646199.1', 'NW_016107297.1', 'NW_021159996.1', 'NW_016107298.1', 'NW_018654713.1', 'NW_013171803.1', 'NW_012132918.1', 
'NW_009646200.1', 'NW_013171802.1', 'NW_017363815.1', 'NW_021159997.1', 'NW_021159998.1', 'NW_019805493.1', 'NW_017852929.1', 'NW_017852930.1', 'NW_018654714.1', 'NW_018654715.1', 'NW_012132919.1', 'NW_018654717.1', 'NW_017852932.1', 'NW_017852931.1', 'NW_019805494.1', 'NW_018654716.1', 'NW_013171804.1', 'NW_013171805.1', 'NW_009646201.1', 'NW_021159999.1', 'NW_021160000.1', 'NW_011332694.1', 'NW_021160001.1', 'NW_013171806.1', 'NW_009646202.1', 'NW_013171807.1', 'NW_011332693.1', 'NW_011332692.1', 'NW_015148966.1', 'NW_021160004.1', 'NW_011332695.1', 'NW_021160006.1', 'NW_019805496.1', 'NW_019805495.1', 'NW_017363816.1', 'NW_019805498.1', 'NW_021160005.1', 'NW_021160003.1', 'NW_019805497.1', 'NW_013171808.1', 'NW_021160002.1', 'NW_009646203.1', 'NW_013171809.1', 'NW_018654718.1', 'NW_021160008.1', 'NW_011332696.1', 'NW_009646204.1', 'NW_018654720.1', 'NW_015148967.1', 'NW_018654719.1', 'NW_011332697.1', 'NW_019805499.1', 'NW_021160007.1', 'NW_021160012.1', 'NW_011332699.1', 'NW_013171810.1', 'NW_021160009.1', 'NW_009646205.1', 'NW_011332700.1', 'NW_013171811.1', 'NW_021160010.1', 'NW_021160011.1', 'NW_011332698.1', 'NW_021160013.1', 'NW_018654722.1', 'NW_021160014.1', 'NW_018654721.1', 'NW_011332701.1', 'NW_021160018.1', 'NW_021160017.1', 'NW_012132920.1', 'NW_021160016.1', 'NW_021160015.1', 'NW_013171812.1', 'NW_019805500.1', 'NW_017852933.1', 'NW_021160019.1', 'NW_013171813.1', 'NW_018654723.1', 'NW_012132921.1', 'NW_017363817.1', 'NW_021160020.1', 'NW_016107299.1', 'NW_017363819.1', 'NW_017363818.1', 'NW_019805501.1', 'NW_021160021.1', 'NW_019805503.1', 'NW_014040928.1', 'NW_019805502.1', 'NW_013171814.1', 'NW_018654724.1', 'NW_021160022.1', 'NW_014040929.1', 'NW_009646206.1', 'NW_016107300.1', 'NW_016107301.1', 'NW_016107302.1', 'NW_016107303.1', 'NW_016107304.1', 'NW_016107305.1', 'NW_016107306.1', 'NW_016107307.1', 'NW_016107308.1', 'NW_016107309.1', 'NW_016107310.1', 'NW_016107311.1', 'NW_016107313.1', 'NW_016107314.1', 'NW_016107312.1', 'NW_021160023.1', 
'NW_021160026.1', 'NW_021160024.1', 'NW_009646207.1', 'NW_014040930.1', 'NW_014040931.1', 'NW_009646208.1', 'NW_015148968.1', 'NW_021160025.1', 'NW_015148969.1', 'NW_017363820.1', 'NW_021160031.1', 'NW_021160028.1', 'NW_021160029.1', 'NW_021160027.1', 'NW_021160030.1', 'NW_018654725.1', 'NW_018654726.1', 'NW_009646209.1', 'NT_187515.1', 'NT_187517.1', 'NT_187514.1', 'NT_187520.1', 'NW_003315905.1', 'NW_003315906.1', 'NW_003315907.2', 'NT_187521.1', 'NT_187519.1', 'NT_187516.1', 'NT_187518.1', 'NT_187525.1', 'NT_187526.1', 'NT_187529.1', 'NT_187522.1', 'NW_003315908.1', 'NT_187524.1', 'NT_187531.1', 'NT_187530.1', 'NT_187528.1', 'NW_003571033.2', 'NW_003315909.1', 'NT_187527.1', 'NT_187523.1', 'NW_003871060.2', 'NT_187535.1', 'NT_187537.1', 'NW_003315913.1', 'NT_187533.1', 'NT_187536.1', 'NT_187538.1', 'NT_187532.1', 'NT_187534.1', 'NT_187539.1', 'NT_187540.1', 'NW_003315915.1', 'NT_187541.1', 'NT_167250.2', 'NT_187544.1', 'NW_003315914.1', 'NT_187542.1', 'NT_187545.1', 'NT_187543.1', 'NT_187550.1', 'NT_187548.1', 'NT_187547.1', 'NW_003315920.1', 'NW_003571036.1', 'NT_187551.1', 'NW_003315917.2', 'NW_003315918.1', 'NT_187549.1', 'NW_003315919.1', 'NT_187546.1', 'NT_167244.2', 'NT_187555.1', 'NT_187554.1', 'NW_003315921.1', 'NT_187556.1', 'NT_187557.1', 'NW_004166862.2', 'NT_187552.1', 'NT_187553.1', 'NT_187558.1', 'NT_187561.1', 'NT_187559.1', 'NW_003315922.2', 'NT_187562.1', 'NT_187564.1', 'NT_187563.1', 'NT_187560.1', 'NT_187572.1', 'NT_187568.1', 'NT_187565.1', 'NT_187576.1', 'NT_187570.1', 'NT_187577.1', 'NT_187566.1', 'NT_187567.1', 'NT_187574.1', 'NT_187575.1', 'NT_187573.1', 'NT_187571.1', 'NT_187569.1', 'NW_003315928.1', 'NW_003315929.1', 'NW_003315930.1', 'NW_003315931.1', 'NT_187578.1', 'NW_003315934.1', 'NT_187579.1', 'NW_003315935.1', 'NT_187580.1', 'NT_187586.1', 'NT_187584.1', 'NT_187585.1', 'NT_187583.1', 'NW_003315936.1', 'NW_003871073.1', 'NW_003871074.1', 'NT_187582.1', 'NT_187581.1', 'NW_003571049.1', 'NW_003571050.1', 'NT_187588.1', 
'NW_003315938.1', 'NT_187587.1', 'NW_003315939.2', 'NW_003315941.1', 'NW_003315942.2', 'NT_187590.1', 'NW_003315940.1', 'NT_187589.1', 'NT_187591.1', 'NT_187594.1', 'NT_187593.1', 'NT_187597.1', 'NT_187595.1', 'NT_187592.1', 'NT_187596.1', 'NT_187598.1', 'NT_187601.1', 'NT_187599.1', 'NT_187600.1', 'NT_187602.1', 'NT_187604.1', 'NT_187603.1', 'NW_003315943.1', 'NT_187605.1', 'NW_003315944.2', 'NT_187606.1', 'NT_187610.1', 'NT_187609.1', 'NT_187608.1', 'NT_187607.1', 'NW_003315945.1', 'NW_003315946.1', 'NW_003315952.3', 'NT_187613.1', 'NT_187611.1', 'NT_187614.1', 'NW_003871091.1', 'NW_003871092.1', 'NW_003315953.2', 'NT_167251.2', 'NW_003315954.1', 'NT_187615.1', 'NT_187616.1', 'NW_003315955.1', 'NT_187612.1', 'NT_187618.1', 'NW_003315956.1', 'NW_003315959.1', 'NW_003315960.1', 'NW_003315957.1', 'NW_003315958.1', 'NW_003315961.1', 'NT_187617.1', 'NT_187622.1', 'NT_187621.1', 'NW_003315962.1', 'NW_003315964.2', 'NW_003315965.1', 'NW_003315963.1', 'NT_187619.1', 'NT_187620.1', 'NW_003571054.1', 'NW_003315966.2', 'NT_187623.1', 'NT_187625.1', 'NT_187624.1', 'NW_003315967.2', 'NT_187628.1', 'NT_187627.1', 'NW_003315968.2', 'NW_003315969.2', 'NW_003315970.2', 'NT_187626.1', 'NT_187629.1', 'NT_187632.1', 'NT_187633.1', 'NT_187630.1', 'NT_187631.1', 'NW_003315972.2', 'NW_003315971.2', 'NT_187634.1', 'NT_187635.1', 'NT_187646.1', 'NT_187648.1', 'NT_187647.1', 'NT_187649.1', 'NT_187650.1', 'NT_187651.1', 'NT_187652.1', 'NT_113891.3', 'NT_187653.1', 'NT_187655.1', 'NT_187654.1', 'NT_187656.1', 'NT_187657.1', 'NT_187658.1', 'NT_187659.1', 'NT_187660.1', 'NT_187662.1', 'NT_187664.1', 'NT_187661.1', 'NW_003871093.1', 'NT_187663.1', 'NT_187665.1', 'NT_187666.1', 'NW_003571055.2', 'NW_004504305.1', 'NT_187667.1', 'NT_187678.1', 'NT_187679.1', 'NT_167245.2', 'NT_187680.1', 'NT_187681.1', 'NW_003571056.2', 'NT_187682.1', 'NT_187688.1', 'NT_167246.2', 'NW_003571057.2', 'NT_187689.1', 'NT_167247.2', 'NW_003571058.2', 'NT_187690.1', 'NT_167248.2', 'NW_003571059.2', 'NT_187691.1', 
'NT_167249.2', 'NW_003571060.1', 'NT_187692.1', 'NW_003571061.2', 'NT_187693.1', 'NT_187636.1', 'NT_187637.1', 'NT_187638.1', 'NT_187639.1', 'NT_187640.1', 'NT_187641.1', 'NT_187642.1', 'NT_187643.1', 'NT_187644.1', 'NT_187645.1', 'NT_187668.1', 'NT_187669.1', 'NT_187670.1', 'NT_187671.1', 'NT_187672.1', 'NT_187673.1', 'NT_187674.1', 'NT_187675.1', 'NT_187676.1', 'NT_187677.1', 'NT_187683.1', 'NT_187684.1', 'NT_187685.1', 'NT_187686.1', 'NT_187687.1', 'NT_113949.2', 'NC_012920.1']
I0614 20:20:46.158432 47288495204160 genomics_reader.py:222] Reading Polyposis_Exome_Analysis_JOB27/deepvariant/bed/AgilentSureSelectDNASureSelectXTHumanAllExonV5_hg38_recoded_nocol4.bed with NativeBedReader
Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_pz6djil_/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 180, in <module>
    app.run(main)
  File "/tmp/Bazel.runfiles_pz6djil_/runfiles/absl_py/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/tmp/Bazel.runfiles_pz6djil_/runfiles/absl_py/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/tmp/Bazel.runfiles_pz6djil_/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 170, in main
    make_examples_core.make_examples_runner(options)
  File "/tmp/Bazel.runfiles_pz6djil_/runfiles/com_google_deepvariant/deepvariant/make_examples_core.py", line 1626, in make_examples_runner
    regions = processing_regions_from_options(options)
  File "/tmp/Bazel.runfiles_pz6djil_/runfiles/com_google_deepvariant/deepvariant/make_examples_core.py", line 1466, in processing_regions_from_options
    calling_regions = build_calling_regions(ref_contigs, options.calling_regions,
  File "/tmp/Bazel.runfiles_pz6djil_/runfiles/com_google_deepvariant/deepvariant/make_examples_core.py", line 468, in build_calling_regions
    ranges.RangeSet.from_regions(regions_to_include, contig_dict))
  File "/tmp/Bazel.runfiles_pz6djil_/runfiles/com_google_deepvariant/third_party/nucleus/util/ranges.py", line 161, in from_regions
    return cls(ranges=from_regions(regions, contig_map=contig_map))
  File "/tmp/Bazel.runfiles_pz6djil_/runfiles/com_google_deepvariant/third_party/nucleus/util/ranges.py", line 113, in __init__
    for i, range_ in enumerate(ranges):
  File "/tmp/Bazel.runfiles_pz6djil_/runfiles/com_google_deepvariant/third_party/nucleus/util/ranges.py", line 493, in from_regions
    for elt in reader(region):
  File "/tmp/Bazel.runfiles_pz6djil_/runfiles/com_google_deepvariant/third_party/nucleus/util/ranges.py", line 459, in bed_parser
    for r in fin.iterate():
  File "/tmp/Bazel.runfiles_pz6djil_/runfiles/com_google_deepvariant/third_party/nucleus/io/clif_postproc.py", line 82, in __next__
    record, not_done = self._raw_next()
  File "/tmp/Bazel.runfiles_pz6djil_/runfiles/com_google_deepvariant/third_party/nucleus/io/clif_postproc.py", line 102, in _raw_next
    not_done = self._cc_iterable.PythonNext(record)
ValueError: Unknown: BED record has invalid number of fields
parallel: This job failed:
/opt/deepvariant/bin/make_examples --mode calling --ref Polyposis_Exome_Analysis_JOB27/bwa/index/indexhumanrefseq_output/samtools_faidx/GRCh38_latest_genomic.fna --reads Polyposis_Exome_Analysis_JOB27/picard/markduplicate/markedduplicates/15M11163_L7_PE_markedduplicates.bam --examples Polyposis_Exome_Analysis_JOB27/deepvariant/intermediateresults/15M11163_L7_PE_output_intermediate/make_examples.tfrecord@1.gz --gvcf Polyposis_Exome_Analysis_JOB27/deepvariant/intermediateresults/15M11163_L7_PE_output_intermediate/gvcf.tfrecord@1.gz --regions Polyposis_Exome_Analysis_JOB27/deepvariant/bed/AgilentSureSelectDNASureSelectXTHumanAllExonV5_hg38_recoded_nocol4.bed --task 0

real    0m3.367s
user    0m2.683s
sys 0m0.545s

First lines:

First 10 lines of the sorted, duplicate-marked BAM:

BAM?P@HD VN:1.6 SO:coordinate
@SQ SN:NC_000001.11 LN:248956422
@SQ SN:NT_187361.1 LN:175055
@SQ SN:NT_187362.1 LN:32032
@SQ SN:NT_187363.1 LN:127682
@SQ SN:NT_187364.1 LN:66860
@SQ SN:NT_187365.1 LN:40176
@SQ SN:NT_187366.1 LN:42210
@SQ SN:NT_187367.1 LN:176043
@SQ SN:NT_187368.1 LN:40745

First line of reference hg38 is:

NC_000001.11 Homo sapiens chromosome 1, GRCh38.p13 Primary Assembly

First line of bed file is: NC_000001.11 65509 65625

I have got DeepVariant and the script above to work on another dataset with a different BED file, so I'm not sure why the "ValueError: Unknown: BED record has invalid number of fields" error occurs with this one.

Thanks! Amy

akolesnikov commented 2 years ago

Could you please paste a line from the input BAM after the header? Also, a line from the reference showing the ID of the chromosome.

amyhouseman commented 2 years ago

Lines after header in input bam:

[Screenshot: lines after the header in the input BAM]

Chromosome ID in reference (it isn't all Ns, I checked):

[Screenshot: chromosome ID line in the reference]

amyhouseman commented 2 years ago

I think it may have been a problem with my bed file!

To check whether the fields were separated by tabs or spaces (even though my text editor said they were tabs), I ran: tr -d " " < original.bed > checktabspaces.bed
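A more direct way to run the same check, as a rough sketch: count the tab-separated fields on each line and dump the raw bytes of the first line. The file name example.bed and its two demo lines are made up for illustration; the only assumption about DeepVariant is the one this thread suggests, namely that its BED reader wants at least three tab-separated fields per record.

```shell
#!/bin/bash
set -eu

# Hypothetical demo file (not the real Agilent BED): line 1 is separated by
# spaces, line 2 by tabs.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
bed="$workdir/example.bed"
printf 'NC_000001.11 65509 65625\n' > "$bed"
printf 'NC_000001.11\t69481\t70018\n' >> "$bed"

# Split on tabs only and report every line with fewer than 3 fields,
# as "<line number>: <field count>".
bad_lines=$(awk -F'\t' 'NF < 3 { print NR ": " NF }' "$bed")
echo "Lines with fewer than 3 tab-separated fields:"
echo "$bad_lines"

# Dump the first line byte by byte so tabs (\t) and spaces can be told apart.
head -n 1 "$bed" | od -c
```

On a real file, replace the demo path with the BED file passed to --regions. If the awk command prints nothing, every line already has at least three tab-separated fields.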

The result of that tr command was a file in which all the columns had merged together, which means the separators were spaces, not tabs. So I then ran this on the original file: awk 'OFS=" " {print $1"\t", $2"\t", $3}' original.bed | tr -d " " > tab.bed

And then reran, and it worked!
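For anyone hitting the same error, the conversion can probably be done in a single awk step without the extra tr pass. This is only a sketch under assumptions: the file names and demo records are hypothetical, and it keeps just the first three columns, as in the command above.

```shell
#!/bin/bash
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
in_bed="$workdir/original.bed"
out_bed="$workdir/tab.bed"

# Hypothetical input: one space-separated record and one with mixed whitespace.
printf 'NC_000001.11 65509 65625\n' > "$in_bed"
printf 'NC_000001.11  69481\t70018\n' >> "$in_bed"

# awk's default field splitting treats any run of spaces or tabs as a single
# separator; setting OFS to a tab makes print join the fields with one tab.
# Lines with fewer than 3 fields are skipped.
awk 'BEGIN { OFS = "\t" } NF >= 3 { print $1, $2, $3 }' "$in_bed" > "$out_bed"

# Sanity check: list the distinct tab-separated field counts in the output.
field_counts=$(awk -F'\t' '{ print NF }' "$out_bed" | sort -u)
echo "Distinct tab-separated field counts: $field_counts"
```

Note that this drops any columns beyond the third and does not treat track or browser header lines specially, so a BED file that relies on either would need a different approach.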

Not a deepvariant problem, but thought I'd share. Thanks! Amy