malariagen / datalab

Repo for files and issues related to cloud deployment of JupyterHub.
MIT License

HTTPError using gcsfs #72

Closed: leehart closed this issue 4 years ago.

leehart commented 4 years ago

We're getting errors from code that worked before Datalab's upgrade to MalariaGEN Binder 2.3.0.

I don't think the Zarr version has changed, so I expect it's the gcsfs upgrade (from 0.3.0 to 0.3.1).
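Because the suspicion rests on which package versions changed in the upgrade, it can help to record the versions the kernel actually loads. This is a minimal sketch that assumes Python 3.8+ for importlib.metadata. The traceback shows Python 3.6, where checking each package's __version__ attribute should serve the same purpose. The names installed_versions and version_tuple are hypothetical helpers, not part of any of these libraries:

```python
from importlib import metadata


def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def version_tuple(version):
    """Convert a plain 'X.Y.Z' release string into a tuple of ints.

    Assumption: no pre-release or local suffixes such as 'rc1' or '+g1234'.
    """
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    # The packages implicated in this issue.
    for name, version in installed_versions(["zarr", "gcsfs", "fsspec"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Tuples compare element by element, so for example version_tuple("0.3.1") > version_tuple("0.3.0") is True. This makes it easy to assert that an environment has at least a given release.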

Example code, not self-contained:

gcs = gcsfs.GCSFileSystem(project='malariagen-jupyterhub', token=gcs_orig.session.credentials, cache_timeout=0)
genomic_positions_cloud_zarr = Path('vo_agam_production/resources/observatory/ag.allsites.nonN.zarr')
gcsmap = gcs.get_mapper(genomic_positions_cloud_zarr.as_posix())
genomic_positions_data = zarr.Group(gcsmap, read_only=True)
genomic_positions_data
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/fsspec/mapping.py in __getitem__(self, key, default)
     75         try:
---> 76             result = self.fs.cat(key)
     77         except:

</opt/conda/lib/python3.6/site-packages/decorator.py:decorator-gen-138> in cat(self, path)

/opt/conda/lib/python3.6/site-packages/gcsfs/core.py in _tracemethod(f, self, *args, **kwargs)
     53 
---> 54     return f(self, *args, **kwargs)
     55 

/opt/conda/lib/python3.6/site-packages/gcsfs/core.py in cat(self, path)
    745         r = self.session.get(u2)
--> 746         r.raise_for_status()
    747         if 'X-Goog-Hash' in r.headers:

/opt/conda/lib/python3.6/site-packages/requests/models.py in raise_for_status(self)
    939         if http_error_msg:
--> 940             raise HTTPError(http_error_msg, response=self)
    941 

HTTPError: 503 Server Error: Service Unavailable for url: https://www.googleapis.com/download/storage/v1/b/vo_agam_production/o/resources%2Fobservatory%2Fag.allsites.nonN.zarr%2F.zgroup?alt=media

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/zarr/hierarchy.py in __init__(self, store, path, read_only, chunk_store, cache_attrs, synchronizer)
    109             mkey = self._key_prefix + group_meta_key
--> 110             meta_bytes = store[mkey]
    111         except KeyError:

/opt/conda/lib/python3.6/site-packages/fsspec/mapping.py in __getitem__(self, key, default)
     79                 return default
---> 80             raise KeyError(key)
     81         return result

KeyError: 'vo_agam_production/resources/observatory/ag.allsites.nonN.zarr/.zgroup'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-16-cc7b7d343010> in <module>
      1 gcsmap = gcs.get_mapper(genomic_positions_cloud_zarr.as_posix())
----> 2 genomic_positions_data = zarr.Group(gcsmap, read_only=True)
      3 genomic_positions_data

/opt/conda/lib/python3.6/site-packages/zarr/hierarchy.py in __init__(self, store, path, read_only, chunk_store, cache_attrs, synchronizer)
    110             meta_bytes = store[mkey]
    111         except KeyError:
--> 112             err_group_not_found(path)
    113         else:
    114             meta = decode_group_metadata(meta_bytes)

/opt/conda/lib/python3.6/site-packages/zarr/errors.py in err_group_not_found(path)
     27 
     28 def err_group_not_found(path):
---> 29     raise ValueError('group not found at path %r' % path)
     30 
     31 

ValueError: group not found at path None
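The traceback shows how a transient server error turns into a misleading "group not found". The fsspec mapping converts any failure of fs.cat into a KeyError. Zarr then treats a KeyError on the .zgroup key as a missing group. The following is a minimal stdlib simulation of that chain. It uses hypothetical classes (FlakyFileSystem, MapperLike) that mimic the frames shown above, not the real gcsfs, fsspec or zarr code:

```python
class FlakyFileSystem:
    """Hypothetical stand-in for gcsfs: every read fails as if GCS returned a 503."""

    def cat(self, path):
        raise ConnectionError("503 Server Error: Service Unavailable")


class MapperLike:
    """Mimics the fsspec mapping __getitem__ in the traceback: a failed read becomes KeyError."""

    def __init__(self, fs):
        self.fs = fs

    def __getitem__(self, key):
        try:
            return self.fs.cat(key)
        except Exception:  # the traceback shows a bare except at this point
            raise KeyError(key)


def open_group(store, path=None):
    """Mimics zarr's Group.__init__: a missing .zgroup key is reported as 'group not found'."""
    try:
        return store[".zgroup"]
    except KeyError:
        raise ValueError("group not found at path %r" % path)


def root_cause(exc):
    """Follow implicit exception chaining ('During handling of...') back to the first error."""
    while exc.__context__ is not None:
        exc = exc.__context__
    return exc


if __name__ == "__main__":
    try:
        open_group(MapperLike(FlakyFileSystem()))
    except ValueError as err:
        print(err)              # group not found at path None
        print(root_cause(err))  # 503 Server Error: Service Unavailable
```

In this setup, "group not found" can be a transient read failure in disguise. The first exception in the chain is the one worth reporting.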


hardingnj commented 4 years ago

Thanks Lee. Just to add, the issue appears to be intermittent, so I doubt it's as simple as a change in the GCS API.

leehart commented 4 years ago (Wed, 13 Nov 2019)


Yes, I can confirm that it is intermittent for me too. For example, I reran the same notebook, and it got past the previous point but failed with the same kind of error further on:

Example code:

# Define the path to the Zarr on the cloud
genomic_positions_accessibility_data_cloud_zarr_dir = os.path.join(
    'vo_agam_production/resources/observatory/non_n_accessibility', 
    'non_n_accessibility.zarr')
gcsmap = gcs.get_mapper(
    genomic_positions_accessibility_data_cloud_zarr_dir)
genomic_positions_accessibility_data = zarr.Group(gcsmap, read_only=True)
genomic_positions_accessibility_data
# Eyeball a summary of the `is_accessible` data for chrom_arm 3L
genomic_positions_accessibility_data["3L"]["is_accessible"]

Error:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/fsspec/mapping.py in __getitem__(self, key, default)
     75         try:
---> 76             result = self.fs.cat(key)
     77         except:

</opt/conda/lib/python3.6/site-packages/decorator.py:decorator-gen-138> in cat(self, path)

/opt/conda/lib/python3.6/site-packages/gcsfs/core.py in _tracemethod(f, self, *args, **kwargs)
     53 
---> 54     return f(self, *args, **kwargs)
     55 

/opt/conda/lib/python3.6/site-packages/gcsfs/core.py in cat(self, path)
    745         r = self.session.get(u2)
--> 746         r.raise_for_status()
    747         if 'X-Goog-Hash' in r.headers:

/opt/conda/lib/python3.6/site-packages/requests/models.py in raise_for_status(self)
    939         if http_error_msg:
--> 940             raise HTTPError(http_error_msg, response=self)
    941 

HTTPError: 503 Server Error: Service Unavailable for url: https://www.googleapis.com/download/storage/v1/b/vo_agam_production/o/resources%2Fobservatory%2Fnon_n_accessibility%2Fnon_n_accessibility.zarr%2F3L%2F.zgroup?alt=media

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/zarr/hierarchy.py in __init__(self, store, path, read_only, chunk_store, cache_attrs, synchronizer)
    109             mkey = self._key_prefix + group_meta_key
--> 110             meta_bytes = store[mkey]
    111         except KeyError:

/opt/conda/lib/python3.6/site-packages/fsspec/mapping.py in __getitem__(self, key, default)
     79                 return default
---> 80             raise KeyError(key)
     81         return result

KeyError: 'vo_agam_production/resources/observatory/non_n_accessibility/non_n_accessibility.zarr/3L/.zgroup'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-23-cd400d75b2c9> in <module>
      1 # Eyeball a summary of the `is_accessible` data for chrom_arm 3L
----> 2 genomic_positions_accessibility_data["3L"]["is_accessible"]

/opt/conda/lib/python3.6/site-packages/zarr/hierarchy.py in __getitem__(self, item)
    328             return Group(self._store, read_only=self._read_only, path=path,
    329                          chunk_store=self._chunk_store, cache_attrs=self.attrs.cache,
--> 330                          synchronizer=self._synchronizer)
    331         else:
    332             raise KeyError(item)

/opt/conda/lib/python3.6/site-packages/zarr/hierarchy.py in __init__(self, store, path, read_only, chunk_store, cache_attrs, synchronizer)
    110             meta_bytes = store[mkey]
    111         except KeyError:
--> 112             err_group_not_found(path)
    113         else:
    114             meta = decode_group_metadata(meta_bytes)

/opt/conda/lib/python3.6/site-packages/zarr/errors.py in err_group_not_found(path)
     27 
     28 def err_group_not_found(path):
---> 29     raise ValueError('group not found at path %r' % path)
     30 
     31 

ValueError: group not found at path '3L'
alimanfoo commented 4 years ago

If this is still occurring, I'd suggest raising an issue on the gcsfs repo saying something like: "After upgrading from gcsfs vX.X to gcsfs vY.Y (replace X and Y with the actual version numbers), we are noticing some intermittent errors like 'HTTPError: 503 Server Error: Service Unavailable for url: https://www.googleapis.com/download/storage/v1/b/...'. Has there been any change to the logic around handling this type of error?"
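For context on the question of how 503 errors are handled: the usual remedy for transient server errors is to retry with exponential backoff. The following sketch shows that general pattern, not gcsfs's actual logic. HTTPStatusError, with_retries and the set of retryable status codes are illustrative assumptions:

```python
import random
import time


class HTTPStatusError(Exception):
    """Hypothetical error carrying an HTTP status code (stand-in for requests.HTTPError)."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


# Assumption: treat rate limiting and 5xx server errors as transient.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def with_retries(fetch, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fetch(), retrying transient HTTP errors with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except HTTPStatusError as err:
            last_attempt = attempt == max_attempts - 1
            if err.status not in RETRYABLE_STATUSES or last_attempt:
                raise  # permanent error, or out of attempts
            # Wait base_delay * 2**attempt, scaled by a random factor in [1, 2).
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Passing sleep as a parameter keeps the function testable without real waiting. The random jitter spreads out retries when many workers hit the same bucket at once.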


--
Alistair Miles
Head of Epidemiological Informatics, Centre for Genomics and Global Health, Big Data Institute, University of Oxford

alimanfoo commented 4 years ago

Just to add, it might be worth trying fsspec 0.6.0 on Datalab to see whether this error persists before raising an issue upstream.

leehart commented 4 years ago

Early indications (from tests relating to malariagen/vector-ops/pull/1102) are that the fsspec==0.6.0 patch remedies this HTTPError.

After that, we may still hit intermittent KilledWorker and "Workers don't have promised key" errors, but these might be separate issues.

I'll finish testing with fsspec 0.6.0 and report back.

leehart commented 4 years ago

We think this will be fixed by malariagen/binder#66.

It's possible that the disappearance of this error coincided with some other fix outside our code. I haven't yet tried reverting fsspec to see whether the error comes back.
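Because the error is intermittent, a single clean run under a given fsspec version is weak evidence either way. One way to compare versions is to repeat the same read many times in each environment and compare failure rates. This is a hedged sketch of such a harness. failure_rate is a hypothetical helper, and operation would be whatever call previously failed, for example opening the Zarr group from the examples above:

```python
from collections import Counter


def failure_rate(operation, trials=50):
    """Call operation() repeatedly; return (fraction of failed trials, Counter of exception type names)."""
    failures = Counter()
    for _ in range(trials):
        try:
            operation()
        except Exception as err:
            failures[type(err).__name__] += 1
    return sum(failures.values()) / trials, failures
```

Counting exception types as well as the overall rate helps keep unrelated failures, such as the KilledWorker errors mentioned above, separate from the ValueError: group not found symptom.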

leehart commented 4 years ago

Closed because we no longer get this error on Datalab.