related-sciences / ukb-gwas-pipeline-nealelab

Pipeline for reproduction of NealeLab 2018 UKB GWAS

Upgrade to zarr 2.5.0 #23

Closed eric-czech closed 3 years ago

eric-czech commented 4 years ago

After upgrading to zarr 2.5.0 in https://github.com/related-sciences/ukb-gwas-pipeline-nealelab/blob/master/envs/gwas.yaml, I'm getting errors like this:

ds['some_var'].values
Traceback (most recent call last):
  File "scripts/gwas.py", line 374, in <module>
    fire.Fire()
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/094ea43c/lib/python3.8/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/094ea43c/lib/python3.8/site-packages/fire/core.py", line 463, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/094ea43c/lib/python3.8/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "scripts/gwas.py", line 319, in run_gwas
    ds = load_gwas_ds(genotypes_path, phenotypes_path, dictionary_path)
  File "scripts/gwas.py", line 270, in load_gwas_ds
    ds = add_covariates(ds)
  File "scripts/gwas.py", line 152, in add_covariates
    print(ds['sample_genetic_sex'].values)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/094ea43c/lib/python3.8/site-packages/xarray/core/common.py", line 228, in __getattr__
    raise AttributeError(
AttributeError: 'DataArray' object has no attribute 'values'

or:

ds['some_var'].compute()

2020-11-20 13:53:59,231 | __main__ | INFO | Initialized script with dask client:
<Client: 'tcp://10.142.0.47:8786' processes=20 threads=160, memory=1.10 TB>
2020-11-20 13:53:59,231 | __main__ | INFO | Running GWAS (genotypes_path=gs://rs-ukb/pipe/nealelab-gwas-uni-ancestry-v3/input/gt-imputation/ukb_chrXY.zarr, phenotypes_path=gs://rs-ukb/prep/main/ukb_phesant_phenotypes-subset01.csv, dictionary_path=gs://rs-ukb/prep/main/meta/data_dictionary_showcase.csv, output_path=gs://rs-ukb/pipe/nealelab-gwas-uni-ancestry-v3/output/gt-imputation/ukb_chrXY)
<xarray.DataArray 'sample_genetic_sex' (samples: 365758)>
dask.array<zarr, shape=(365758,), dtype=float64, chunksize=(5792,), chunktype=numpy.ndarray>
Dimensions without coordinates: samples
dask.array<zarr, shape=(365758,), dtype=float64, chunksize=(5792,), chunktype=numpy.ndarray>
Traceback (most recent call last):
  File "scripts/gwas.py", line 374, in <module>
    fire.Fire()
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/094ea43c/lib/python3.8/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/094ea43c/lib/python3.8/site-packages/fire/core.py", line 463, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/094ea43c/lib/python3.8/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "scripts/gwas.py", line 319, in run_gwas
    ds = load_gwas_ds(genotypes_path, phenotypes_path, dictionary_path)
  File "scripts/gwas.py", line 270, in load_gwas_ds
    ds = add_covariates(ds)
  File "scripts/gwas.py", line 152, in add_covariates
    print(ds['sample_genetic_sex'].compute())
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/094ea43c/lib/python3.8/site-packages/xarray/core/dataarray.py", line 834, in compute
    return new.load(**kwargs)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/094ea43c/lib/python3.8/site-packages/xarray/core/dataarray.py", line 808, in load
    ds = self._to_temp_dataset().load(**kwargs)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/094ea43c/lib/python3.8/site-packages/xarray/core/dataset.py", line 654, in load
    evaluated_data = da.compute(*lazy_data.values(), **kwargs)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/094ea43c/lib/python3.8/site-packages/dask/base.py", line 452, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/094ea43c/lib/python3.8/site-packages/distributed/client.py", line 2725, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/094ea43c/lib/python3.8/site-packages/distributed/client.py", line 1986, in gather
    return self.sync(
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/094ea43c/lib/python3.8/site-packages/distributed/client.py", line 832, in sync
    return sync(
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/094ea43c/lib/python3.8/site-packages/distributed/utils.py", line 340, in sync
    raise exc.with_traceback(tb)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/094ea43c/lib/python3.8/site-packages/distributed/utils.py", line 324, in f
    result[0] = yield future
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/094ea43c/lib/python3.8/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/094ea43c/lib/python3.8/site-packages/distributed/client.py", line 1851, in _gather
    raise exception.with_traceback(traceback)
  File "/opt/conda/lib/python3.8/site-packages/dask/array/core.py", line 102, in getter
  File "/opt/conda/lib/python3.8/site-packages/numpy/core/_asarray.py", line 85, in asarray
  File "/opt/conda/lib/python3.8/site-packages/xarray/core/indexing.py", line 495, in __array__
  File "/opt/conda/lib/python3.8/site-packages/numpy/core/_asarray.py", line 85, in asarray
  File "/opt/conda/lib/python3.8/site-packages/xarray/core/indexing.py", line 560, in __array__
  File "/opt/conda/lib/python3.8/site-packages/xarray/backends/zarr.py", line 56, in __getitem__
  File "/opt/conda/lib/python3.8/site-packages/zarr/core.py", line 571, in __getitem__
  File "/opt/conda/lib/python3.8/site-packages/zarr/core.py", line 696, in get_basic_selection
  File "/opt/conda/lib/python3.8/site-packages/zarr/core.py", line 739, in _get_basic_selection_nd
  File "/opt/conda/lib/python3.8/site-packages/zarr/core.py", line 1033, in _get_selection
  File "/opt/conda/lib/python3.8/site-packages/zarr/core.py", line 1666, in _chunk_getitems
  File "/opt/conda/lib/python3.8/site-packages/fsspec/mapping.py", line 92, in getitems
  File "/opt/conda/lib/python3.8/site-packages/fsspec/mapping.py", line 93, in <dictcomp>
AttributeError: 'FSMap' object has no attribute 'missing_exceptions'

There must be some incompatibility between xarray 0.16.1 and zarr 2.5.0 (the second traceback ends inside fsspec's `FSMap`). This needs further investigation before upgrading to get async cloud read support.
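
One way to narrow this down: in the second traceback, the client-side frames come from the Snakemake conda environment, while the final frames (dask, xarray, zarr, fsspec) come from `/opt/conda`, so it may be worth comparing package versions in both environments. Below is a minimal sketch using only the standard library. The helper name `installed_versions` and the package list are assumptions made for illustration and are not part of the pipeline.

```python
from importlib import metadata

# Packages that show up in the tracebacks above (list chosen for illustration).
PACKAGES = ("xarray", "zarr", "fsspec", "dask", "distributed")


def installed_versions(packages=PACKAGES):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Report the versions visible in the local (client) environment.
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```

With a `distributed.Client` connected, something like `client.run(installed_versions)` should run the same function on each worker and return one result per worker address, which would make a client/worker mismatch easy to spot.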

eric-czech commented 3 years ago

This error is gone with zarr 2.6.1 and xarray 0.16.2.
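
To guard against an environment silently falling back to an older combination, a small check along these lines might help. It is only a sketch: treating the two versions above as a lower bound is an assumption (the thread only reports that this particular pair works), and `parse_version` and `older_than_known_good` are hypothetical helpers, not library functions.

```python
# Versions reported as working in this thread, used here as an assumed lower bound.
KNOWN_GOOD = {"zarr": (2, 6, 1), "xarray": (0, 16, 2)}


def parse_version(text):
    """Parse the leading numeric components of a version string into a tuple."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first component with no leading digits (e.g. "dev3").
            break
        parts.append(int(digits))
    return tuple(parts)


def older_than_known_good(installed, known_good=KNOWN_GOOD):
    """Return the names of packages whose installed version is below the known-good one."""
    return [
        name
        for name, minimum in known_good.items()
        if name in installed and parse_version(installed[name]) < minimum
    ]


# The combination from the original report is flagged for both packages.
print(older_than_known_good({"zarr": "2.5.0", "xarray": "0.16.1"}))
```

The `installed` argument is a plain dictionary of version strings, so it can be filled from `importlib.metadata.version` or from whatever the workers report.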