kcleal / dysgu

Toolkit for calling structural variants using short or long reads
MIT License

Error running 1.3.10 #36

Closed holtjma closed 2 years ago

holtjma commented 2 years ago

Hello,

I've been trying to run dysgu v1.3.10 using a conda approach (which is what I've done for previous versions). However, I keep encountering a strange error:

Activating conda environment: <redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435
2022-04-20 11:11:43,425 [INFO   ]  [dysgu-run] Version: 1.3.10
2022-04-20 11:11:43,429 [INFO   ]  run -o <redacted>/CSL_pipeline_benchmark/pipeline/structural_variant_calls/hg38_T2T_masked/sentieon_mm2-202112.01/dysgu-1.3.10/HALB3010452.vcf --mode pacbio --pl pacbio --procs 16 --search <redacted>/CSL_pipeline_benchmark/data_files/hg38_contigs.bed.gz <redacted>/reference/hg38_T2T_masked/hg38.fa <redacted>/CSL_pipeline_benchmark/pipeline/structural_variant_calls/hg38_T2T_masked/sentieon_mm2-202112.01/dysgu-1.3.10/HALB3010452_tmp <redacted>/CSL_pipeline_benchmark/pipeline/merged_alignments/hg38_T2T_masked/sentieon_mm2-202112.01/HALB3010452.bam
2022-04-20 11:11:43,430 [INFO   ]  Destination: <redacted>/CSL_pipeline_benchmark/pipeline/structural_variant_calls/hg38_T2T_masked/sentieon_mm2-202112.01/dysgu-1.3.10/HALB3010452_tmp
2022-04-20 11:11:43,430 [INFO   ]  Searching regions from <redacted>/CSL_pipeline_benchmark/data_files/hg38_contigs.bed.gz
Traceback (most recent call last):
  File "<redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435/bin/dysgu", line 8, in <module>
    sys.exit(cli())
  File "<redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "<redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "<redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "<redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "<redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "<redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "<redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435/lib/python3.8/site-packages/dysgu/main.py", line 270, in run_pipeline
    max_cov_value = sv2bam.process(ctx.obj)
  File "dysgu/sv2bam.pyx", line 184, in dysgu.sv2bam.process
  File "dysgu/sv2bam.pyx", line 58, in dysgu.sv2bam.parse_search_regions
  File "dysgu/sv2bam.pyx", line 59, in dysgu.sv2bam.parse_search_regions
  File "<redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

I'm not sure what's causing this; the BAM files have worked for all of my other tests. I am also happy to try the Docker image if you'd prefer that, but I would need it versioned on Docker Hub (I only saw "latest" earlier).

Thanks!

kcleal commented 2 years ago

Hi @holtjma, I think it is because the input BED file passed to --search is gzip-compressed, but dysgu expects plain text. I can add a gzip BED reader in the next release.
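For context, the byte 0x8b at position 1 in the traceback is the second byte of the gzip magic number (0x1f 0x8b), which is why decoding the file as UTF-8 text fails. A reader that accepts both plain and gzip-compressed BED files could look roughly like the sketch below. This is only an illustration, not dysgu's actual code: the names `open_bed` and `read_regions` are hypothetical, and it assumes a tab-separated BED file with at least three columns.

```python
import gzip

# The first two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def open_bed(path):
    """Open a BED file in text mode, whether or not it is gzip-compressed.

    Hypothetical helper: detection is based on the file's leading bytes
    rather than on its extension.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def read_regions(path):
    """Return a list of (chrom, start, end) tuples from a BED file."""
    regions = []
    with open_bed(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines and common BED header/comment lines.
            if not line or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split("\t")[:3]
            regions.append((chrom, int(start), int(end)))
    return regions
```

Checking the magic bytes instead of the ".gz" suffix means a compressed file with an unexpected name is still handled. On versions without gzip support, decompressing the BED file before passing it to --search avoids the error.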

holtjma commented 2 years ago

Ah, that seems to have fixed the issue on my end. Feel free to close this when you're ready.

kcleal commented 2 years ago

Closing, as v1.3.11 now supports gzipped BED files.