sgkit-dev / bio2zarr

Convert bioinformatics file formats to Zarr
Apache License 2.0

ValueError: could not broadcast input array #251

Closed tomwhite closed 3 months ago

tomwhite commented 3 months ago

This happens when converting the following VCF (generated by hypothesis, see #249):

##fileformat=VCFv4.3
##FILTER=<ID=PASS,Description="All filters passed">
##source=sgkit-vcf-hypothesis-0.8.1.dev7+g450596f0
##contig=<ID=0>
##INFO=<ID=A0,Type=Integer,Number=A,Description="INFO,Type=Integer,Number=A">
##FORMAT=<ID=B0,Type=Character,Number=.,Description="FORMAT,Type=Character,Number=.">
##FORMAT=<ID=A0,Type=Integer,Number=1,Description="FORMAT,Type=Integer,Number=1">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  0   1
0   1   .   A   .   1.0 .   .   B0:A0   .:0 0,0,7:.
```shell
$ vcf2zarr convert sample-fail.vcf.gz sample-fail.zarr
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/tom/miniconda3/envs/bio2zarr-3.10/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/Users/tom/workspace/bio2zarr/bio2zarr/vcf2zarr/vcz.py", line 651, in encode_partition
    self.encode_array_partition(col, partition_index)
  File "/Users/tom/workspace/bio2zarr/bio2zarr/vcf2zarr/vcz.py", line 691, in encode_array_partition
    sanitiser(ba.buff, j, value)
  File "/Users/tom/workspace/bio2zarr/bio2zarr/vcf2zarr/icf.py", line 397, in sanitise_value_string_2d
    buff[j, k, : len(val)] = val
ValueError: could not broadcast input array from shape (3,) into shape (2,)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/tom/miniconda3/envs/bio2zarr-3.10/bin/vcf2zarr", line 8, in <module>
    sys.exit(vcf2zarr_main())
  File "/Users/tom/miniconda3/envs/bio2zarr-3.10/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/tom/miniconda3/envs/bio2zarr-3.10/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/tom/miniconda3/envs/bio2zarr-3.10/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/tom/miniconda3/envs/bio2zarr-3.10/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/tom/miniconda3/envs/bio2zarr-3.10/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/tom/workspace/bio2zarr/bio2zarr/cli.py", line 487, in convert_vcf
    vcf2zarr.convert(
  File "/Users/tom/workspace/bio2zarr/bio2zarr/vcf2zarr/vcz.py", line 1054, in convert
    encode(
  File "/Users/tom/workspace/bio2zarr/bio2zarr/vcf2zarr/vcz.py", line 974, in encode
    vzw.encode_all_partitions(
  File "/Users/tom/workspace/bio2zarr/bio2zarr/vcf2zarr/vcz.py", line 938, in encode_all_partitions
    with core.ParallelWorkManager(num_workers, progress_config) as pwm:
  File "/Users/tom/workspace/bio2zarr/bio2zarr/core.py", line 301, in __exit__
    wait_on_futures(self.futures)
  File "/Users/tom/workspace/bio2zarr/bio2zarr/core.py", line 104, in wait_on_futures
    raise exception
ValueError: could not broadcast input array from shape (3,) into shape (2,)
```

The conversion succeeds if the type of field B0 is changed to Integer, so the problem appears to be in the code path for Character fields.
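
For reference, the failing assignment from the traceback (`buff[j, k, : len(val)] = val`) can be imitated with plain NumPy. This is only a sketch and not the bio2zarr code: the buffer shape and the `assign` helper are assumptions made up for illustration. The only things taken from the report are the assignment expression, the three B0 values for sample `1`, and the target length of 2 shown in the error message.

```python
import numpy as np

# Hypothetical buffer: 1 variant x 2 samples x 2 values per sample.
# The inner size of 2 is an assumption based on the "into shape (2,)"
# part of the reported error message.
buff = np.full((1, 2, 2), ".", dtype=object)

# Sample 1 in the VCF has three B0 values: 0,0,7
val = np.array(["0", "0", "7"], dtype=object)


def assign(buff, j, k, val):
    # Same expression as the line shown in the traceback. The slice is
    # clipped to the buffer's last dimension (2), but val has 3 items.
    buff[j, k, : len(val)] = val


message = None
try:
    assign(buff, 0, 1, val)
except ValueError as exc:
    message = str(exc)

print(message)
```

If the buffer's last dimension were at least as large as the longest value list, the same assignment would go through, which would be consistent with the inner dimension being sized too small on the Character path.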

sample-fail.vcf.gz