pytroll / satpy

Python package for earth-observing satellite data processing
http://satpy.readthedocs.org/en/latest/
GNU General Public License v3.0

extract scene metadata from satellite image #1675

Open willows-1 opened 3 years ago

willows-1 commented 3 years ago

Hi, I would like to know how I can extract scene metadata (such as start and end time, longitude, latitude, etc.) from a satellite image. I'd appreciate any help I can get!

sfinkens commented 3 years ago

@willows-1 Sorry, I just noticed that my proposed solution (expand_dims) doesn't work either. The actual problem is that the start_time attribute is slightly different for each band. But you cannot assign multiple values to the same time coordinate in the netCDF file. As a workaround you could average the timestamps or just pick the first one like so:

mytime = scn['B01'].attrs['start_time']
for band in mybands:
    scn[band] = scn[band].expand_dims(time=[mytime])
scn.save_datasets(...)

Of course you can also use your alternative solution.
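The two workarounds mentioned above (averaging the timestamps, or just picking the first one) can be sketched with the standard library alone. This is only an illustration: the per-band times are made up, and `mean_timestamp` is a hypothetical helper, not part of satpy.

```python
from datetime import datetime, timedelta


def mean_timestamp(times):
    """Return the arithmetic mean of a non-empty list of datetimes."""
    first = times[0]
    # Sum the offsets from the first timestamp, then divide by the count.
    total = sum((t - first for t in times), timedelta(0))
    return first + total / len(times)


# Made-up start times standing in for scn[band].attrs['start_time'].
band_start_times = {
    "B01": datetime(2021, 4, 17, 2, 0, 20),
    "B02": datetime(2021, 4, 17, 2, 0, 22),
    "B03": datetime(2021, 4, 17, 2, 0, 24),
}

averaged = mean_timestamp(list(band_start_times.values()))
first_band = band_start_times["B01"]  # the "pick the first one" option
print(averaged, first_band)
```

Either value could then be passed as `mytime` to `expand_dims(time=[mytime])` so that every band shares one time coordinate.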

willows-1 commented 3 years ago

Thanks @sfinkens for your advice! I will try out your suggested code. For my alternative solution, does the code make sense? And will the data be stored unchanged if I concatenate?

sfinkens commented 3 years ago

I think so, but you should check that yourself ;)

willows-1 commented 3 years ago

I think so, but you should check that yourself ;)

Yep, I tried it out, and this is the output:

image

The start time is there, but there is no end time.

sfinkens commented 3 years ago

mytime = scn['B01'].attrs['start_time']
for band in mybands:
    scn[band] = scn[band].expand_dims(time=[mytime])
scn.save_datasets(...)

Try that solution and check out the time_bnds variable in the netCDF file. It contains start and end timestamps. Maybe that is what you are looking for.

willows-1 commented 3 years ago

I am confused about how to proceed with the code. Is it okay if I paste it below so you can double-check it? Here is the entire updated code, which includes the code from @sfinkens and @djhoese:

from glob import glob
from datetime import datetime

import numpy as np
import pandas as pd
import scipy as sp
import xarray as xr
from pyresample import geometry
from satpy import Scene, MultiScene, find_files_and_readers

%matplotlib notebook
import matplotlib.pyplot as plt

filenames = glob('C:/Users/binis/OneDrive/Desktop/To_Binish/ftp_h8_hsd_2pm/16_bands/*20210417_0200*.DAT')

scn = Scene(reader='ahi_hsd', filenames=filenames)

all_names = ['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B09', 'B10', 'B11', 'B12', 'B13', 'B14', 'B15', 'B16']
scn.load(all_names)

mytime = scn['B01'].attrs['start_time']
for band in all_names:
    scn[band] = scn[band].expand_dims(time=[mytime])

mytime2 = scn['B01'].attrs['end_time']
for band in all_names:
    scn[band] = scn[band].expand_dims(time2=[mytime2])

#crop image
cropped_scn = scn.crop(ll_bbox=(103., 1.,105., 3.))
new_scn = cropped_scn.resample(resampler='native')

new_scn.save_datasets(writer='cf', datasets= all_names, filename='All band data at 0200 version3.nc', exclude_attrs=['raw_metadata'], base_dir = "C:/Users/binis/OneDrive/Desktop/To_Binish/ftp_h8_hsd_2pm")

willows-1 commented 3 years ago

But @sfinkens, how come you included "scn.save_datasets(...)" in the for loop?

sfinkens commented 3 years ago

Expanding the dimensions twice will make the dataset 4-dimensional (time2, time, y, x). I guess this is not what you want. Try removing the block

mytime2 = scn['B01'].attrs['end_time']
for band in all_names:
    scn[band] = scn[band].expand_dims(time2=[mytime2])

and then check out your generated netCDF file:

In [1]: import xarray as xr
In [2]: ds = xr.open_dataset('myfile.nc')
In [3]: ds['time_bnds']
Out[3]: 
<xarray.DataArray 'time_bnds' (time: 1, bnds_1d: 2)>
array([['2021-05-20T03:50:21.126857000', '2021-05-20T03:59:40.525149000']],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2021-05-20T03:50:21.126857
Dimensions without coordinates: bnds_1d

There you have one start and end time for the scene.
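As a rough picture of what a single start/end pair means when the bands differ slightly in time, one plausible reduction is "earliest start, latest end". The sketch below assumes that reduction and uses plain dicts in place of the DataArray `attrs`; `scene_time_bounds` and the timestamps are hypothetical, and the writer's actual logic may differ.

```python
from datetime import datetime


def scene_time_bounds(band_attrs):
    """Reduce per-band start/end times to one (start, end) pair."""
    starts = [attrs["start_time"] for attrs in band_attrs.values()]
    ends = [attrs["end_time"] for attrs in band_attrs.values()]
    return min(starts), max(ends)


# Made-up per-band attributes, keyed by band name.
band_attrs = {
    "B01": {"start_time": datetime(2021, 5, 20, 3, 50, 21),
            "end_time": datetime(2021, 5, 20, 3, 59, 38)},
    "B02": {"start_time": datetime(2021, 5, 20, 3, 50, 22),
            "end_time": datetime(2021, 5, 20, 3, 59, 40)},
}

bounds = scene_time_bounds(band_attrs)
print(bounds)
```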

sfinkens commented 3 years ago

But @sfinkens, how come you included "scn.save_datasets(...)" in the for loop?

It's not; only indented lines belong to the for loop.
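A tiny standalone example of that indentation rule, with a plain list standing in for the real satpy calls (the names here are only illustrative):

```python
bands = ["B01", "B02", "B03"]
calls = []

for band in bands:
    # Indented: part of the loop body, runs once per band.
    calls.append(f"expand {band}")

# Not indented: runs once, after the loop has finished.
calls.append("save")

print(calls)
```

The "save" entry appears only once, at the end, which is the same position `scn.save_datasets(...)` has in the snippet under discussion.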

willows-1 commented 3 years ago

@sfinkens I am getting this error:

image

zxdawn commented 3 years ago

@sfinkens @willows-1 I'm trying to make it work, and here's my newly exported dataset: image

If this looks fine, I can make a PR.

willows-1 commented 3 years ago

@zxdawn this looks fine. Is it possible to post the code here?

willows-1 commented 3 years ago

I would like to have a look at your code to see how you managed to get the output

zxdawn commented 3 years ago

@willows-1 I will create a PR and let you know later. I suppose the time_bnds should be changed to <channel>_time_bnds, as they may have different times. What do you think @sfinkens ?

willows-1 commented 3 years ago

@willows-1 I will create a PR and let you know later. I suppose the time_bnds should be changed to <channel>_time_bnds, as they may have different times. What do you think @sfinkens ?

@zxdawn sure, thank you. By the way, what is a PR?

zxdawn commented 3 years ago

@willows-1 PR is "pull request". Once the PR is merged, you can use the updated satpy to accomplish your task. Here's the PR link.

willows-1 commented 3 years ago

@zxdawn thank you very much for assisting with this issue. I will look into this and see if it's working for me.

willows-1 commented 3 years ago

@zxdawn the code that you posted in the PR only considers one band channel. If I want to include all channels from B01 to B16, how can I edit the code below (which you pasted in the PR) to include all bands?

import xarray as xr
from glob import glob
from satpy import Scene, MultiScene
from satpy.multiscene import timeseries

abi_dir = '../data/GOES-16/ABI_L1/'
abi_name = 'OR_ABI-L1b-RadC-M6C13_G16_s'
channel = 'C13'
reader = 'abi_l1b'

filenames = glob(abi_dir+abi_name+'2020153000*') # two example files

# check the start_time and end_time of each file
scn_1 = Scene([filenames[0]], reader='abi_l1b')
scn_2 = Scene([filenames[1]], reader='abi_l1b')

# get the mscn
mscn = MultiScene.from_files(filenames, reader='abi_l1b')
mscn.load(['C13'])
blended_scene = mscn.blend(blend_function=timeseries)

# save the mscn to nc file
blended_scene.save_datasets(filename='test.nc')

ds = xr.open_dataset('./test.nc')

print(ds, '\n')

print('-'*5, 'C13_start_time')
print(ds.C13_start_time, '\n')

print('-'*5, 'C13_end_time')
print(ds.C13_end_time, '\n')

print('-'*5, 'time_bnds')
print(ds['time_bnds'])

zxdawn commented 3 years ago

@willows-1 It should be like this, as mentioned in the guide:

mscn = MultiScene.from_files(glob('/data/abi/day_1/*C0[12]*.nc'), reader='abi_l1b')
mscn.load(['C01', 'C02'])

willows-1 commented 3 years ago

@zxdawn this is the code that I edited to include all the band channels:

import xarray as xr
from glob import glob
from satpy import Scene, MultiScene
from satpy.multiscene import timeseries

filenames = glob('C:/Users/binis/OneDrive/Desktop/To_Binish/ftp_h8_hsd_2pm/16_bands/*20210417_0200*.DAT') # two example files
len(filenames)

# check the start_time and end_time of each file
scn_1 = Scene([filenames[0]], reader='ahi_hsd')
scn_2 = Scene([filenames[1]], reader='ahi_hsd')

all_names = ['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B09', 'B10', 'B11', 'B12', 'B13', 'B14', 'B15', 'B16']

# get the mscn
mscn = MultiScene.from_files(filenames, reader='ahi_hsd')
mscn.load(all_names)
blended_scene = mscn.blend(blend_function=timeseries)

# save the mscn to nc file
blended_scene.save_datasets(writer='cf', datasets= all_names, filename='test.nc', exclude_attrs=['raw_metadata'], base_dir = "C:/Users/binis/OneDrive/Desktop/To_Binish/ftp_h8_hsd_2pm")

But when I tried to save the dataset, I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-7b7b5aa9c8bf> in <module>
      1 # save the mscn to nc file
----> 2 blended_scene.save_datasets(writer='cf', datasets= all_names, filename='test.nc', exclude_attrs=['raw_metadata'], base_dir = "C:/Users/binis/OneDrive/Desktop/To_Binish/ftp_h8_hsd_2pm")

C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\satpy\scene.py in save_datasets(self, writer, filename, datasets, compute, **kwargs)
   1036                                           filename=filename,
   1037                                           **kwargs)
-> 1038         return writer.save_datasets(dataarrays, compute=compute, **save_kwargs)
   1039 
   1040     @staticmethod

C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\satpy\writers\cf_writer.py in save_datasets(self, datasets, filename, groups, header_attrs, engine, epoch, flatten_attrs, exclude_attrs, include_lonlats, pretty, compression, include_orig_name, numeric_name_prefix, **to_netcdf_kwargs)
    756                 group_datasets, epoch=epoch, flatten_attrs=flatten_attrs, exclude_attrs=exclude_attrs,
    757                 include_lonlats=include_lonlats, pretty=pretty, compression=compression,
--> 758                 include_orig_name=include_orig_name, numeric_name_prefix=numeric_name_prefix)
    759             dataset = xr.Dataset(datas)
    760             if 'time' in dataset:

C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\satpy\writers\cf_writer.py in _collect_datasets(self, datasets, epoch, flatten_attrs, exclude_attrs, include_lonlats, pretty, compression, include_orig_name, numeric_name_prefix)
    654 
    655         # Check and prepare coordinates
--> 656         assert_xy_unique(datas)
    657         link_coords(datas)
    658         datas = make_alt_coords_unique(datas, pretty=pretty)

C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\satpy\writers\cf_writer.py in assert_xy_unique(datas)
    232             unique_x.add(token_x)
    233     if len(unique_x) > 1 or len(unique_y) > 1:
--> 234         raise ValueError('Datasets to be saved in one file (or one group) must have identical projection coordinates. '
    235                          'Please group them by area or save them in separate files.')
    236 

ValueError: Datasets to be saved in one file (or one group) must have identical projection coordinates. Please group them by area or save them in separate files.

zxdawn commented 3 years ago

@willows-1 It seems some files have different projection coordinates. @djhoese may know the problem.

willows-1 commented 3 years ago

alright

djhoese commented 3 years ago

You create two Scene objects and then don't use them. Is this on purpose? My guess on your error is that the MultiScene is accidentally grouping your two sets of files into one scene when it should be two. What do you get when you do:

print(len(mscn.scenes))

How many input files do you have? If you are loading multiple channels, then they will have different resolutions and therefore different projection coordinates. You would need to resample your MultiScene so that all bands are at the same resolution.

Edit: These are just guesses.
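To illustrate the point about resolutions: the error message suggests grouping datasets by area or saving them separately. A minimal sketch of such a grouping is shown below, assuming the nominal AHI resolutions (0.5 km for B03, 1 km for B01, B02 and B04, 2 km for the remaining bands). `NOMINAL_RES_KM` and `group_by_resolution` are hypothetical names, not satpy functions.

```python
from collections import defaultdict

# Assumed nominal resolutions; bands not listed are taken to be 2 km.
NOMINAL_RES_KM = {"B03": 0.5, "B01": 1.0, "B02": 1.0, "B04": 1.0}

all_names = [f"B{i:02d}" for i in range(1, 17)]


def group_by_resolution(names, default_km=2.0):
    """Group band names by their assumed nominal resolution in km."""
    groups = defaultdict(list)
    for name in names:
        groups[NOMINAL_RES_KM.get(name, default_km)].append(name)
    return dict(groups)


groups = group_by_resolution(all_names)
for res_km, names in sorted(groups.items()):
    print(f"{res_km} km: {names}")
```

Each group could then be written to its own file, or all bands could be resampled to a common grid first so that they share one set of projection coordinates.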

willows-1 commented 3 years ago

@djhoese I have a total of 160 input files, and 16 band channels, from B01-B16. I have edited the code according to your suggestions. This is the code:

import xarray as xr
from glob import glob
from satpy import Scene, MultiScene
from satpy.multiscene import timeseries

filenames = glob('C:/Users/binis/OneDrive/Desktop/To_Binish/ftp_h8_hsd_2pm/16_bands/*20210417_0200*.DAT') # two example files
len(filenames)

all_names = ['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B09', 'B10', 'B11', 'B12', 'B13', 'B14', 'B15', 'B16']

# get the mscn
mscn = MultiScene.from_files(filenames, reader='ahi_hsd')
mscn.load(all_names)

#crop image
cropped_scn = mscn.crop(ll_bbox=(103., 1.,105., 3.))
new_mscn = cropped_scn.resample(resampler='native')

blended_scene = mscn.blend(blend_function=timeseries)

# save the mscn to nc file
blended_scene.save_datasets(writer='cf', datasets= all_names, filename='test.nc', exclude_attrs=['raw_metadata'], base_dir = "C:/Users/binis/OneDrive/Desktop/To_Binish/ftp_h8_hsd_2pm")

I have done the resampling but I am still getting an error when I save the multi scene:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-7b7b5aa9c8bf> in <module>
      1 # save the mscn to nc file
----> 2 blended_scene.save_datasets(writer='cf', datasets= all_names, filename='test.nc', exclude_attrs=['raw_metadata'], base_dir = "C:/Users/binis/OneDrive/Desktop/To_Binish/ftp_h8_hsd_2pm")

C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\satpy\scene.py in save_datasets(self, writer, filename, datasets, compute, **kwargs)
   1036                                           filename=filename,
   1037                                           **kwargs)
-> 1038         return writer.save_datasets(dataarrays, compute=compute, **save_kwargs)
   1039 
   1040     @staticmethod

C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\satpy\writers\cf_writer.py in save_datasets(self, datasets, filename, groups, header_attrs, engine, epoch, flatten_attrs, exclude_attrs, include_lonlats, pretty, compression, include_orig_name, numeric_name_prefix, **to_netcdf_kwargs)
    756                 group_datasets, epoch=epoch, flatten_attrs=flatten_attrs, exclude_attrs=exclude_attrs,
    757                 include_lonlats=include_lonlats, pretty=pretty, compression=compression,
--> 758                 include_orig_name=include_orig_name, numeric_name_prefix=numeric_name_prefix)
    759             dataset = xr.Dataset(datas)
    760             if 'time' in dataset:

C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\satpy\writers\cf_writer.py in _collect_datasets(self, datasets, epoch, flatten_attrs, exclude_attrs, include_lonlats, pretty, compression, include_orig_name, numeric_name_prefix)
    654 
    655         # Check and prepare coordinates
--> 656         assert_xy_unique(datas)
    657         link_coords(datas)
    658         datas = make_alt_coords_unique(datas, pretty=pretty)

C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\satpy\writers\cf_writer.py in assert_xy_unique(datas)
    232             unique_x.add(token_x)
    233     if len(unique_x) > 1 or len(unique_y) > 1:
--> 234         raise ValueError('Datasets to be saved in one file (or one group) must have identical projection coordinates. '
    235                          'Please group them by area or save them in separate files.')
    236 

ValueError: Datasets to be saved in one file (or one group) must have identical projection coordinates. Please group them by area or save them in separate files.

djhoese commented 3 years ago

Can you do that print(len(mscn.scenes))?

Also, you have blended_scene = mscn.blend(blend_function=timeseries), but you want blended_scene = new_mscn.blend(blend_function=timeseries) which changes mscn.blend to new_mscn.blend.

willows-1 commented 3 years ago

image

This is the output of print(len(mscn.scenes))

willows-1 commented 3 years ago

Can you do that print(len(mscn.scenes))?

Also, you have blended_scene = mscn.blend(blend_function=timeseries), but you want blended_scene = new_mscn.blend(blend_function=timeseries) which changes mscn.blend to new_mscn.blend.

I am still getting an error after changing to blended_scene = new_mscn.blend(blend_function=timeseries), but it is a different one now:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-7b7b5aa9c8bf> in <module>
      1 # save the mscn to nc file
----> 2 blended_scene.save_datasets(writer='cf', datasets= all_names, filename='test.nc', exclude_attrs=['raw_metadata'], base_dir = "C:/Users/binis/OneDrive/Desktop/To_Binish/ftp_h8_hsd_2pm")

C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\satpy\scene.py in save_datasets(self, writer, filename, datasets, compute, **kwargs)
   1036                                           filename=filename,
   1037                                           **kwargs)
-> 1038         return writer.save_datasets(dataarrays, compute=compute, **save_kwargs)
   1039 
   1040     @staticmethod

C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\satpy\writers\cf_writer.py in save_datasets(self, datasets, filename, groups, header_attrs, engine, epoch, flatten_attrs, exclude_attrs, include_lonlats, pretty, compression, include_orig_name, numeric_name_prefix, **to_netcdf_kwargs)
    760             if 'time' in dataset:
    761                 dataset['time_bnds'] = make_time_bounds(start_times,
--> 762                                                         end_times)
    763                 dataset['time'].attrs['bounds'] = "time_bnds"
    764                 dataset['time'].attrs['standard_name'] = "time"

C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\xarray\core\dataset.py in __setitem__(self, key, value)
   1523 
   1524         else:
-> 1525             self.update({key: value})
   1526 
   1527     def __delitem__(self, key: Hashable) -> None:

C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\xarray\core\dataset.py in update(self, other)
   4095         Dataset.assign
   4096         """
-> 4097         merge_result = dataset_update_method(self, other)
   4098         return self._replace(inplace=True, **merge_result._asdict())
   4099 

C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\xarray\core\merge.py in dataset_update_method(dataset, other)
    971         priority_arg=1,
    972         indexes=indexes,  # type: ignore
--> 973         combine_attrs="override",
    974     )

C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\xarray\core\merge.py in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value)
    619     coerced = coerce_pandas_values(objects)
    620     aligned = deep_align(
--> 621         coerced, join=join, copy=False, indexes=indexes, fill_value=fill_value
    622     )
    623     collected = collect_variables_and_indexes(aligned)

C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\xarray\core\alignment.py in deep_align(objects, join, copy, indexes, exclude, raise_on_invalid, fill_value)
    431         indexes=indexes,
    432         exclude=exclude,
--> 433         fill_value=fill_value,
    434     )
    435 

C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\xarray\core\alignment.py in align(join, copy, indexes, exclude, fill_value, *objects)
    338             if len(unlabeled_sizes | {labeled_size}) > 1:
    339                 raise ValueError(
--> 340                     f"arguments without labels along dimension {dim!r} cannot be "
    341                     f"aligned because they have different dimension size(s) {unlabeled_sizes!r} "
    342                     f"than the size of the aligned dimension labels: {labeled_size!r}"

ValueError: arguments without labels along dimension 'time' cannot be aligned because they have different dimension size(s) {1} than the size of the aligned dimension labels: 16

djhoese commented 3 years ago

How many time steps do these 160 files represent? A single time? I haven't really been following this discussion, sorry.

If they represent more than one time step then from_files is not working properly. If you are only using one time step, then why is MultiScene being used?
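One way to answer the "how many time steps" question without loading any data is to group the file names by their date and time fields. The sketch below assumes HSD-style names of the form `HS_H08_<date>_<time>_<band>_FLDK_<res>_<segment>.DAT`; the example names are synthetic (the resolution field is simplified to `R20` for every band) and `group_by_time` is a hypothetical helper.

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_time(filenames):
    """Map (date, time) -> set of band names found in the file names."""
    groups = defaultdict(set)
    for name in filenames:
        # Assumed layout: HS_H08_<date>_<time>_<band>_FLDK_<res>_<segment>
        parts = PurePath(name).stem.split("_")
        date, time, band = parts[2], parts[3], parts[4]
        groups[(date, time)].add(band)
    return dict(groups)


# Synthetic names: 16 bands x 10 segments for a single time slot.
example = [
    f"HS_H08_20210417_0200_B{b:02d}_FLDK_R20_S{s:02d}10.DAT"
    for b in range(1, 17)
    for s in range(1, 11)
]

time_groups = group_by_time(example)
print(len(example), "files,", len(time_groups), "time step(s)")
```

If this reports a single time step, a plain `Scene` is enough and `MultiScene` is not needed.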

zxdawn commented 3 years ago

@djhoese the time error may be related to #1686. I suppose @willows-1 is using the official satpy release and the time series length is 16. If @willows-1 uses the modified source code in #1686, it may work (although that PR isn't finished yet).

willows-1 commented 3 years ago

actually @sfinkens suggested to set a single time scene through the code:

mytime = scn['B01'].attrs['start_time']
for band in mybands:
    scn[band] = scn[band].expand_dims(time=[mytime])
scn.save_datasets(...)

But @zxdawn used the MultiScene method, so that's why I tried to use MultiScene. I am getting confused too. Initially I used @sfinkens' method together with @djhoese's suggestion, and I was able to get this output:

image

Since my aim is to get the start_time, the end_time, and the longitude and latitude data, the output above achieves that, right? Or is there a better way?

willows-1 commented 3 years ago

@djhoese the time error may be related to #1686. I suppose @willows-1 is using the official satpy and the time series length is 16. If @willows-1 uses the modified source codes in #1686, it may work (although that PR isn't all finished).

@zxdawn so your code might work once the PR is finished?

sfinkens commented 3 years ago

actually @sfinkens suggested to set a single time scene through the code: ... But @zxdawn used multi scene method. So thats why I try to use multi scene.

As I said, the MultiScene + timeseries approach doesn't work at the moment. I'll look into #1686 soon.

willows-1 commented 3 years ago

noted, thank you