google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
BSD 3-Clause "New" or "Revised" License

[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/wire_format_lite.cc:584] #870

Closed alisamatisse closed 3 months ago

alisamatisse commented 3 months ago

Hi, sorry for asking again, but I would really like to use DeepVariant for most of my variant calling.

Setup

My code:

cd path/to/deepvariant

BAM_DIR=.
VCF_DIR=deepvariant_output/
REFERENCE=Reference_HLA/human_g1k_v37_decoy.fasta

export SINGULARITY_CACHEDIR="path/to/deepvariant/.singularity-$(whoami)"
export SINGULARITY_TMPDIR="path/to/deepvariant/.singularity-$(whoami)"

BIN_VERSION="1.6.1"

for BAM_FILE in "${BAM_DIR}"/*.bam; do
    # Extract the base name of the BAM file (without the directory and extension)
    BASE_NAME=$(basename "${BAM_FILE}" .bam)

    # Define the output VCF file name
    VCF_FILE="${VCF_DIR}/${BASE_NAME}.vcf.gz"
    echo $BAM_FILE
    echo $VCF_FILE
    singularity exec --bind /usr/lib/locale/ \
        docker://google/deepvariant:${BIN_VERSION} \
        /opt/deepvariant/bin/run_deepvariant \
        --model_type WES \
        --ref $REFERENCE \
        --reads $BAM_FILE \
        --regions 6:32509320-32669663 \
        --output_vcf $VCF_FILE \
        --num_shards 12
done


  - Error trace: 

Running the command: time seq 0 11 | parallel -q --halt 2 --line-buffer /opt/deepvariant/bin/make_examples --mode calling --ref "Reference_HLA/chr6_hg19.fa" --reads "./MDC05_1463_3.final.bam" --examples "/tmp/7361351.1.gpu.q/tmpzsp9g_vq/make_examples.tfrecord@12.gz" --channels "insert_size" --regions "chr6:32509320-32669663" --task {}

[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/wire_format_lite.cc:584] String field 'nucleus.genomics.v1.Program.command_line' contains invalid UTF-8 data when serializing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes.
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/wire_format_lite.cc:584] String field 'nucleus.genomics.v1.Program.command_line' contains invalid UTF-8 data when parsing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes.
Traceback (most recent call last):
  File "/tmp/7361351.1.gpu.q/Bazel.runfiles_ii4x9mqm/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 234, in <module>
    app.run(main)
  File "/tmp/7361351.1.gpu.q/Bazel.runfiles_ii4x9mqm/runfiles/absl_py/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/tmp/7361351.1.gpu.q/Bazel.runfiles_ii4x9mqm/runfiles/absl_py/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/tmp/7361351.1.gpu.q/Bazel.runfiles_ii4x9mqm/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 220, in main
    options = default_options(add_flags=True, flags_obj=FLAGS)
  File "/tmp/7361351.1.gpu.q/Bazel.runfiles_ii4x9mqm/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 157, in default_options
    samples_in_order, sample_role_to_train = one_sample_from_flags(
  File "/tmp/7361351.1.gpu.q/Bazel.runfiles_ii4x9mqm/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 109, in one_sample_from_flags
    sample_name = make_examples_core.assign_sample_name(
  File "/tmp/7361351.1.gpu.q/Bazel.runfiles_ii4x9mqm/runfiles/com_google_deepvariant/deepvariant/make_examples_core.py", line 170, in assign_sample_name
    with sam.SamReader(reads_filenames.split(',')[0]) as sam_reader:
  File "/tmp/7361351.1.gpu.q/Bazel.runfiles_ii4x9mqm/runfiles/com_google_deepvariant/third_party/nucleus/io/genomics_reader.py", line 221, in __init__
    self._reader = self._native_reader(input_path, **kwargs)
  File "/tmp/7361351.1.gpu.q/Bazel.runfiles_ii4x9mqm/runfiles/com_google_deepvariant/third_party/nucleus/io/sam.py", line 260, in _native_reader
    return NativeSamReader(input_path, **kwargs)
  File "/tmp/7361351.1.gpu.q/Bazel.runfiles_ii4x9mqm/runfiles/com_google_deepvariant/third_party/nucleus/io/sam.py", line 240, in __init__
    self.header = self._reader.header
google.protobuf.message.DecodeError: Error parsing message



I ran your WES example without any problems, but I hit this error with my own data (using the same Singularity setup).
I checked the reference and the input BAM files, and they don't seem to be corrupted, but googling the error did not help much. I cannot think of anything else; do you have any suggestions about where the problem could be coming from? Otherwise I will try to find the raw data for my .bam files, remap it to hg38, and use the reference that worked previously.

Thanks again!!
Alisa

lucasbrambrink commented 3 months ago

Hi Alisa,

Happy to help! From the error logs, it looks like DeepVariant is unable to parse the header of the .bam file. Would it be possible for you to share this bam file (or a small slice of it including the header) with us so we can take a closer look? Thanks!

alisamatisse commented 3 months ago

Hi Lucas,

I appreciate your help, thanks so much. I did not know where to look, and yes, it was the header! My .bam file header contained some German umlauts, haha. Fixed now :)

Have a great day!!!
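
For anyone who lands here with the same error: the fix in this thread came down to non-ASCII bytes (umlauts) in the BAM header. Below is a minimal sketch of one way to look for such bytes. It is not part of DeepVariant; the function names `find_non_ascii_header_lines` and `is_valid_utf8` and the file name `sample.bam` are made up for illustration, and the sketch assumes `grep` and `iconv` are available. Both functions read header text from stdin, so they can be fed by `samtools view -H`.

```shell
#!/usr/bin/env bash

# Print header lines (prefixed with their line number) that contain bytes
# outside printable ASCII and whitespace. Reads SAM header text from stdin.
# Hypothetical helper, not part of DeepVariant or samtools.
find_non_ascii_header_lines() {
  # LC_ALL=C makes grep compare raw bytes, so any byte >= 0x80 (for example
  # an umlaut encoded as Latin-1 or UTF-8) counts as non-printable.
  # -a forces text mode so grep does not report "Binary file matches".
  LC_ALL=C grep -a -n '[^[:print:][:space:]]'
}

# Exit with status 0 if stdin is valid UTF-8, non-zero otherwise.
# Hypothetical helper; relies on iconv rejecting invalid input sequences.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Example usage (sample.bam is a placeholder):
#   samtools view -H sample.bam | find_non_ascii_header_lines
#   samtools view -H sample.bam | is_valid_utf8 || echo "header is not valid UTF-8"
```

If a line is flagged (typically an `@PG` or `@RG` line carrying a path or description), one option is to dump the header with `samtools view -H`, edit the offending text, and write it back with `samtools reheader`. The first function flags every non-ASCII byte, which is stricter than the UTF-8 check in the error message, but it points directly at the lines worth inspecting.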