Hey Hemanand! I think what's going on here is that we're trying to use rasterio (and thus GDAL) to parse the CRS information for the data. Searching around, it seems like GDAL may not support the type of grid that these files are using.
Can you run `grib_ls` and paste the output here?
I think the best path forward to fix this issue for `weather-mv bq` would be to catch the rasterio error, set default projection information, and then return the dataset. Projection info is not relevant for this data sink anyway.
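A minimal sketch of that fallback, not the actual weather-mv code: the second traceback in this issue shows the failure coming from `with rasterio.open(local_path, 'r') as f:`, so the idea is to wrap that call and fall back to defaults. The names `read_projection` and `DEFAULT_PROJECTION`, and the placeholder default values, are hypothetical. The opener is passed in so the logic can be tried without rasterio installed.

```python
import logging
from typing import Any, Callable, Dict, Tuple, Type

logger = logging.getLogger(__name__)

# Hypothetical placeholder defaults; the real values would be chosen by weather-mv.
DEFAULT_PROJECTION: Dict[str, Any] = {"crs": None, "transform": None}


def read_projection(
    path: str,
    opener: Callable[[str], Any],
    errors: Tuple[Type[BaseException], ...] = (OSError,),
) -> Dict[str, Any]:
    """Return CRS/transform read via `opener`, or defaults if opening fails.

    `errors` lists the exception types to swallow. With rasterio one could
    pass (rasterio.errors.RasterioIOError,), the error seen in the traceback.
    """
    try:
        with opener(path) as f:
            return {"crs": f.crs, "transform": f.transform}
    except errors as e:
        # Projection info is not needed by the BigQuery sink, so log and move on.
        logger.warning("Could not read projection from %r: %s", path, e)
        return dict(DEFAULT_PROJECTION)
```

With rasterio available, the call might look like `read_projection(local_path, lambda p: rasterio.open(p, 'r'), (rasterio.errors.RasterioIOError,))`.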
Hey Alex, I've attached the file here.
weather-mv is unable to read ECMWF GRIB files and fails with the following errors, including rasterio errors. Here is the command trace:
`(weather-tools) jupyter@gi-asset:~/weather-tools$ python weather_mv/weather-mv bigquery --uris "gs://gi_asset-ecmwf-ensemble-data/hres-sample/single-levelshres.gb" --output_table "$PROJECT.ecmwf.hres-sample-check" --temp_location "gs://gi_asset-ecmwf-ensemble-data/tmp/" --runner DataflowRunner --num_workers 2 --project $PROJECT --region us-central1 --job_name hrse-sample-mv-003 --disk_size_gb 200
no previously-included directories found matching 'test_data'
INFO:loader_pipeline.bq:Validating regions for data migration. This might take a few seconds...
INFO:loader_pipeline.bq:Region validation completed successfully.
INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds.
INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
INFO:apache_beam.io.gcp.gcsio:Starting the file information of the input
INFO:apache_beam.io.gcp.gcsio:Finished listing 1 files in 0.05787062644958496 seconds.
WARNING:loader_pipeline.sinks:Assuming grib.
INFO:loader_pipeline.sinks:Normalizing the grib schema, name of the data variables will look like '<attrs['GRIBstepType']>'.
ERROR:loader_pipeline.sinks:Unable to open file 'gs://gi_asset-ecmwf-ensemble-data/hres-sample/single-levels_hres_2020-01-01T00_00_00z-u100-v100-u10-v10-u200-v200-2t-2d-ssr-str-sp-msl-tprate-ptype-blh-sr-tp.gb': only size-1 arrays can be converted to Python scalars
Traceback (most recent call last):
  File "weather_mv/weather-mv", line 74, in <module>
    cli(['--extra_package', pkg_archive])
  File "/home/jupyter/weather-tools/weather_mv/loader_pipeline/__init__.py", line 23, in cli
    pipeline(*run(sys.argv + extra))
  File "/home/jupyter/weather-tools/weather_mv/loader_pipeline/pipeline.py", line 71, in pipeline
    paths | "MoveToBigQuery" >> ToBigQuery.from_kwargs(**vars(known_args))
  File "/home/jupyter/weather-tools/weather_mv/loader_pipeline/sinks.py", line 55, in from_kwargs
    return cls(**{k: v for k, v, in kwargs.items() if k in fields})
  File "<string>", line 17, in __init__
  File "/home/jupyter/weather-tools/weather_mv/loader_pipeline/bq.py", line 154, in __post_init__
    with open_dataset(self.first_uri, self.xarray_open_dataset_kwargs,
  File "/opt/conda/envs/weather-tools/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/jupyter/weather-tools/weather_mv/loader_pipeline/sinks.py", line 385, in open_dataset
    xr_dataset: xr.Dataset = __open_dataset_file(local_path,
  File "/home/jupyter/weather-tools/weather_mv/loader_pipeline/sinks.py", line 311, in __open_dataset_file
    return _add_is_normalized_attr(__normalize_grib_dataset(filename), True)
  File "/home/jupyter/weather-tools/weather_mv/loader_pipeline/sinks.py", line 229, in __normalize_grib_dataset
    forecast_hour = int(da.step.values / np.timedelta64(1, 'h'))
TypeError: only size-1 arrays can be converted to Python scalars
(weather-tools) jupyter@gi-python:~/weather-tools$ python weather_mv/weather-mv bigquery --uris "gs://gi_asset-ecmwf-ensemble-data/hres-sample/2020-01-01T00:00:00z-u100-v100-u10-v10-u200-v200-2t-2d-ssr-str-sp-msl-tprate-ptype-blh-sr-tp.gb" --output_table "megatron-389205.ecmwf.hres-sample-check" --temp_location "gs://gi_asset-ecmwf-ensemble-data/tmp/loadprocess/" --runner DataflowRunner --num_workers 2 --project megatron-389205 --region us-central1 --job_name hrse-sample-mv-003 --disk_size_gb 200 --disable_grib_schema_normalization
no previously-included directories found matching 'test_data'
INFO:loader_pipeline.bq:Validating regions for data migration. This might take a few seconds...
INFO:loader_pipeline.bq:Region validation completed successfully.
INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds.
INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
WARNING:loader_pipeline.sinks:Assuming grib edition 1.
INFO:rasterio._env:GDAL signalled an error: err_no=4, msg='/var/tmp/tmphj2fbdcj is a grib file, but no raster dataset was successfully identified.'
ERROR:loader_pipeline.sinks:Unable to open file 'gs://gi_asset-ecmwf-ensemble-data/hres-sample/2020-01-01T00:00:00z-u100-v100-u10-v10-u200-v200-2t-2d-ssr-str-sp-msl-tprate-ptype-blh-sr-tp.gb': /var/tmp/tmphj2fbdcj is a grib file, but no raster dataset was successfully identified.
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 302, in rasterio._base.DatasetBase.__init__
  File "rasterio/_base.pyx", line 213, in rasterio._base.open_dataset
  File "rasterio/_err.pyx", line 217, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: /var/tmp/tmphj2fbdcj is a grib file, but no raster dataset was successfully identified.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "weather_mv/weather-mv", line 74, in <module>
    cli(['--extra_package', pkg_archive])
  File "/home/jupyter/weather-tools/weather_mv/loader_pipeline/__init__.py", line 23, in cli
    pipeline(*run(sys.argv + extra))
  File "/home/jupyter/weather-tools/weather_mv/loader_pipeline/pipeline.py", line 71, in pipeline
    paths | "MoveToBigQuery" >> ToBigQuery.from_kwargs(**vars(known_args))
  File "/home/jupyter/weather-tools/weather_mv/loader_pipeline/sinks.py", line 55, in from_kwargs
    return cls(**{k: v for k, v, in kwargs.items() if k in fields})
  File "<string>", line 17, in __init__
  File "/home/jupyter/weather-tools/weather_mv/loader_pipeline/bq.py", line 154, in __post_init__
    with open_dataset(self.first_uri, self.xarray_open_dataset_kwargs,
  File "/opt/conda/envs/weather-tools/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/jupyter/weather-tools/weather_mv/loader_pipeline/sinks.py", line 399, in open_dataset
    with rasterio.open(local_path, 'r') as f:
  File "/opt/conda/envs/weather-tools/lib/python3.8/site-packages/rasterio/env.py", line 442, in wrapper
    return f(*args, **kwds)
  File "/opt/conda/envs/weather-tools/lib/python3.8/site-packages/rasterio/__init__.py", line 277, in open
    dataset = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
  File "rasterio/_base.pyx", line 304, in rasterio._base.DatasetBase.__init__
rasterio.errors.RasterioIOError: /var/tmp/tmphj2fbdcj is a grib file, but no raster dataset was successfully identified.
`
The sample file mentioned above can be read locally without any issues using cfgrib's `open_datasets`. Please help troubleshoot or fix this issue with weather-mv.
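For reference, the first traceback ends at `int(da.step.values / np.timedelta64(1, 'h'))`. One plausible reading, assuming `da.step` holds more than one forecast step for this file (not confirmed here), is that `int()` is being handed a multi-element array. The sketch below reproduces that `TypeError` with plain NumPy and shows one way to convert every step to hours; `forecast_hours` is a hypothetical helper, not part of weather-mv.

```python
import numpy as np

ONE_HOUR = np.timedelta64(1, "h")


def forecast_hours(step_values) -> list:
    """Convert one or many timedelta64 steps to a list of integer hours."""
    # atleast_1d lets a 0-d (single step) and a 1-d (many steps) input
    # go through the same code path.
    return (np.atleast_1d(step_values) / ONE_HOUR).astype(int).tolist()


# A single step (0-d array): int() works, as the normalization code expects.
one_step = np.array(np.timedelta64(6, "h"))
single_hour = int(one_step / ONE_HOUR)

# Several steps (1-d array): int() raises TypeError, as in the traceback.
many_steps = np.array([0, 6, 12], dtype="timedelta64[h]")
try:
    int(many_steps / ONE_HOUR)
    raised = False
except TypeError:
    raised = True

all_hours = forecast_hours(many_steps)
```

Checking `ds.step.shape` on the dataset that cfgrib opens locally would confirm or rule out this assumption.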