sgkit-dev / bio2zarr

Convert bioinformatics file formats to Zarr
Apache License 2.0

ICF stores created with numpy 1.x won't work with numpy 2.x #258

Open jeromekelleher opened 1 month ago

jeromekelleher commented 1 month ago

We get a backtrace like:

Traceback (most recent call last):                                                                                    
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,                                                                        
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code                                                           
    exec(code, run_globals)                                
  File "/home/jk/work/github/bio2zarr/bio2zarr/__main__.py", line 22, in <module>
    bio2zarr()                                             
  File "/home/jk/.local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__                            
    return self.main(*args, **kwargs)
  File "/home/jk/.local/lib/python3.10/site-packages/click/core.py", line 1078, in main                                
    rv = self.invoke(ctx)                                  
  File "/home/jk/.local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke                              
    return _process_result(sub_ctx.command.invoke(sub_ctx))                                                            
  File "/home/jk/.local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke                              
    return _process_result(sub_ctx.command.invoke(sub_ctx))                                                            
  File "/home/jk/.local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke                              
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/jk/.local/lib/python3.10/site-packages/click/core.py", line 783, in invoke                               
    return __callback(*args, **kwargs)
  File "/home/jk/work/github/bio2zarr/bio2zarr/cli.py", line 366, in encode                                            
    vcf2zarr.encode(                                       
  File "/home/jk/work/github/bio2zarr/bio2zarr/vcf2zarr/vcz.py", line 970, in encode                                   
    vzw.encode_all_partitions(
  File "/home/jk/work/github/bio2zarr/bio2zarr/vcf2zarr/vcz.py", line 936, in encode_all_partitions                    
    pwm.submit(self.encode_partition, partition_index)
  File "/home/jk/work/github/bio2zarr/bio2zarr/core.py", line 291, in submit                                           
    future = self.executor.submit(*args, **kwargs)
  File "/home/jk/work/github/bio2zarr/bio2zarr/core.py", line 90, in submit                                            
    future.set_result(fn(*args, **kwargs))
  File "/home/jk/work/github/bio2zarr/bio2zarr/vcf2zarr/vcz.py", line 643, in encode_partition                         
    self.encode_id_partition(partition_index)
  File "/home/jk/work/github/bio2zarr/bio2zarr/vcf2zarr/vcz.py", line 749, in encode_id_partition                      
    for value in col.iter_values(partition.start, partition.stop):                                                     
  File "/home/jk/work/github/bio2zarr/bio2zarr/vcf2zarr/icf.py", line 638, in iter_values                              
    chunk_record_index = self.chunk_record_index(start_partition)                                                      
  File "/home/jk/work/github/bio2zarr/bio2zarr/vcf2zarr/icf.py", line 601, in chunk_record_index                       
    a = pickle.load(f)                                     
ModuleNotFoundError: No module named 'numpy._core'
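
For context on the last frame: a pickle records the import path of every class or function it needs, and `pickle.load` re-imports that path when reading. A pickle that names `numpy._core` therefore fails in any environment where that module cannot be imported. Below is a minimal sketch, using only the standard library, for listing the import paths a pickle refers to. `referenced_globals` is a hypothetical helper (not part of bio2zarr), and `fractions.Fraction` stands in for a numpy array so the snippet has no third-party dependency. It only inspects `GLOBAL` opcodes, so it assumes pickle protocol 2 or lower.

```python
import pickle
import pickletools
from fractions import Fraction


def referenced_globals(data: bytes) -> set[str]:
    """Return the "module.name" import paths named by GLOBAL opcodes.

    Hypothetical helper for illustration. Protocol 4+ pickles use
    STACK_GLOBAL instead, which this sketch does not handle.
    """
    paths = set()
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # pickletools reports the argument as "module name".
            module, name = arg.split(" ", 1)
            paths.add(f"{module}.{name}")
    return paths


# Protocol 2 is forced so that the class reference is written as GLOBAL.
payload = pickle.dumps(Fraction(1, 3), protocol=2, fix_imports=False)
print(referenced_globals(payload))
```

Running the same kind of inspection on the pickled file from the traceback should show which module path the store expects.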

I'm not sure there's an awful lot we can do about this, unfortunately: numpy 2.x and 1.x are going to coexist for some time.

The easy solution would be to pin to numpy >= 2 and bump the ICF metadata version. However, I don't think pinning to numpy >= 2 is helpful in the short term.

jeromekelleher commented 1 month ago

On reflection, the reasonable thing to do here is to catch the pickle-time error and raise a helpful error message. We don't need to support mixed numpy version environments, as it's an intermediate format after all.
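
A minimal sketch of that approach, assuming the failure always surfaces as a `ModuleNotFoundError` from `pickle.load`, as in the traceback above. The names `IncompatibleStoreError` and `load_pickled_index` are hypothetical and not taken from the bio2zarr code base. The demo simulates the failure with a hand-written pickle that refers to a module that does not exist, so it runs without numpy.

```python
import io
import pickle


class IncompatibleStoreError(RuntimeError):
    """Hypothetical exception type; the real project may use another."""


def load_pickled_index(f):
    """Unpickle from file object f, translating a missing-module failure."""
    try:
        return pickle.load(f)
    except ModuleNotFoundError as e:
        # Chain the original error so the full cause stays visible.
        raise IncompatibleStoreError(
            f"Could not unpickle ICF data ({e}). The store was probably "
            "written with a different major version of numpy than the one "
            "installed here. Re-create it with the current environment."
        ) from e


# Simulate a pickle that refers to a module that cannot be imported.
bad = io.BytesIO(b"cno_such_pkg_for_demo._core\n_reconstruct\n.")
try:
    load_pickled_index(bad)
except IncompatibleStoreError as err:
    print(err)
```

Chaining with `from e` keeps the original `ModuleNotFoundError` in the traceback, so the user sees the actionable message first and the underlying cause is still available for debugging.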