Closed: pnuu closed this pull request 2 years ago
Merging #138 (d4eb0ca) into main (73c9e35) will decrease coverage by 0.07%. The diff coverage is 100.00%.

:exclamation: Current head d4eb0ca differs from pull request most recent head dbd4e5d. Consider uploading reports for the commit dbd4e5d to get more accurate results.
```diff
@@            Coverage Diff             @@
##             main     #138      +/-   ##
==========================================
- Coverage   95.56%   95.48%   -0.08%
==========================================
  Files          11       11
  Lines        2365     2372       +7
==========================================
+ Hits         2260     2265       +5
- Misses        105      107       +2
```
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 95.48% <100.00%> (-0.08%) | :arrow_down: |

Flags with carried forward coverage won't be shown.
| Impacted Files | Coverage Δ | |
|---|---|---|
| `trollflow2/plugins/__init__.py` | 92.84% <100.00%> (-0.38%) | :arrow_down: |
| `trollflow2/tests/test_trollflow2.py` | 99.47% <100.00%> (+<0.01%) | :arrow_up: |
Here's a short script using Satpy directly that demonstrates the random crashes:
```python
#!/usr/bin/env python

import glob

import numpy as np
from satpy import Scene
from satpy.writers import compute_writer_results

COMPUTE = False


def main():
    fnames = glob.glob(
        "/home/lahtinep/data/satellite/polar/ears_pps/*48965.nc")
    dsets = ['cma', 'ct', 'ctth_alti', 'ctth_pres', 'ctth_tempe']
    glbl = Scene(reader='nwcsaf-pps_nc', filenames=fnames)
    glbl.load(dsets)
    dtype_int16 = {'dtype': np.int16}
    encoding = {'cma': dtype_int16, 'ct': dtype_int16}
    res = glbl.save_datasets(writer='cf', filename="/tmp/pps_test.nc",
                             encoding=encoding, include_lonlats=True,
                             compute=COMPUTE)
    if not COMPUTE:
        compute_writer_results([res])


if __name__ == "__main__":
    main()
```
I'm trying to find an example that wouldn't require actual data.
And the corresponding version as `trollflow2.yaml`:
```yaml
product_list:
  output_dir: /tmp/
  fname_pattern: "{start_time:%Y%m%d_%H%M}_{platform_name}_{areaname}_EARS_PPS.nc"
  reader: nwcsaf-pps_nc
  subscribe_topics:
    - /test/ears/avhrr/pps/gatherer
  eager_writing: True
  areas:
    null:
      priority: 1
      areaname: swath
      products:
        ("cma", "ct", "ctth_alti", "ctth_pres", "ctth_tempe"):
          formats:
            - format: nc
              writer: cf
              encoding:
                cma:
                  dtype: !!python/name:numpy.int16
                ct:
                  dtype: !!python/name:numpy.int16
              include_lonlats: True

workers:
  - fun: !!python/name:trollflow2.plugins.create_scene
  - fun: !!python/name:trollflow2.plugins.load_composites
  - fun: !!python/name:trollflow2.plugins.resample
  - fun: !!python/name:trollflow2.plugins.save_datasets
```
Another pure-Satpy example using SEVIRI HRIT data:
```python
#!/usr/bin/env python

import glob

from satpy import Scene
from satpy.writers import compute_writer_results

COMPUTE = False


def main():
    fnames = glob.glob(
        "/home/lahtinep/data/satellite/geo/0deg/*202202031045*")
    dsets = ['VIS006', 'VIS008']
    glbl = Scene(reader='seviri_l1b_hrit', filenames=fnames)
    glbl.load(dsets)
    res = glbl.save_datasets(writer='cf', filename="/tmp/seviri_test.nc",
                             include_lonlats=True, compute=COMPUTE)
    if not COMPUTE:
        compute_writer_results([res])


if __name__ == "__main__":
    main()
```
UPDATED VERSION
And a version that fails without any Pytroll code:
```python
#!/usr/bin/env python

import datetime as dt

import numpy as np
import dask.array as da
import xarray as xr

COMPUTE = False
FNAME = "/tmp/xr_test.nc"


def main():
    y = np.arange(1000, dtype=np.uint16)
    x = np.arange(2000, dtype=np.uint16)
    now = dt.datetime.utcnow()
    times = xr.DataArray(
        np.array([now + dt.timedelta(seconds=i) for i in range(y.size)],
                 dtype=np.datetime64),
        coords={'y': y})
    # Write root
    root = xr.Dataset({}, attrs={'global': 'attribute'})
    written = [root.to_netcdf(FNAME, mode='w')]
    # Write first dataset
    data1 = xr.DataArray(da.random.random((y.size, x.size)), dims=['y', 'x'],
                         coords={'y': y, 'x': x, 'time': times})
    dset1 = xr.Dataset({'data1': data1})
    written.append(dset1.to_netcdf(FNAME, mode='a', compute=COMPUTE))
    # Write second dataset using the same time coordinates
    data2 = xr.DataArray(da.random.random((y.size, x.size)), dims=['y', 'x'],
                         coords={'y': y, 'x': x, 'time': times})
    dset2 = xr.Dataset({'data2': data2})
    written.append(dset2.to_netcdf(FNAME, mode='a', compute=COMPUTE))
    if not COMPUTE:
        da.compute(written)


if __name__ == "__main__":
    main()
```
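For comparison, the eager variant of the same appends simply computes each write as it happens instead of collecting delayed objects and computing them together at the end. This mirrors the workaround in this PR; whether it also avoids the crash in this pure-xarray reproducer is an assumption on my part:

```python
# Eager variant of the appends above (equivalent to setting COMPUTE = True):
# each to_netcdf() call writes its data before the next dataset is appended.
dset1.to_netcdf(FNAME, mode='a', compute=True)
dset2.to_netcdf(FNAME, mode='a', compute=True)
```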
Created an issue in XArray: https://github.com/pydata/xarray/issues/6300
I added a note to the example configuration file. This will most likely be a temporary solution until https://github.com/pydata/xarray/issues/6300 is fixed.
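For reference, a rough sketch of the kind of note I mean; the exact wording and placement in the example configuration are mine, not the committed text:

```yaml
# Force eager writing of the datasets.  Temporary workaround for random
# "RuntimeError: NetCDF: Not a valid ID" crashes when appending several
# datasets to one NetCDF file via delayed writes, see
# https://github.com/pydata/xarray/issues/6300
eager_writing: True
```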
When saving multiple datasets to a single CF/NetCDF4 file using the syntax introduced in https://github.com/pytroll/trollflow2/pull/51, I got random crashes resulting in `RuntimeError: NetCDF: Not a valid ID` within the XArray library. Some internet searching suggested that this is caused by trying to use dimensions that have not yet been defined. My solution is to add a config option that forces eager saving instead of delaying the saving and calling `compute_writer_results()` afterwards. With this PR, I haven't seen the crash a single time over 50 consecutive runs.
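To illustrate the two code paths, here is a minimal sketch of delayed versus eager saving with Satpy's writer API. It is not the actual trollflow2 `save_datasets` plugin; the function name, the `eager_writing` argument, and the filename are only stand-ins for how the config option is meant to behave.

```python
from satpy.writers import compute_writer_results


def save_scene(scn, datasets, filename, eager_writing=False):
    """Sketch: save `datasets` from Scene `scn` to one CF/NetCDF file."""
    if eager_writing:
        # Eager path enabled by the new config option: each dataset is
        # written to the NetCDF file while save_datasets() runs, so the
        # file dimensions exist before the next append happens.
        scn.save_datasets(writer='cf', filename=filename,
                          datasets=datasets, compute=True)
        return
    # Default delayed path: collect the delayed writer results and compute
    # them together afterwards.  This is where the random
    # "RuntimeError: NetCDF: Not a valid ID" crashes were seen.
    res = scn.save_datasets(writer='cf', filename=filename,
                            datasets=datasets, compute=False)
    compute_writer_results([res])
```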