Open willows-1 opened 3 years ago
Hi,
This is how to get the metadata you mentioned. Depending on the data you have, modify the channel name(s) to load. Inspect the dictionaries scn.attrs
and scn["some_channel"].attrs
to see what metadata are available.
from satpy import Scene
import glob
fnames = glob.glob("/path/to/data/for/one/time/slot/*")
scn = Scene(filenames=fnames)
scn.load(["some_channel"])
# Get coordinates for the data
lons, lats = scn["some_channel"].attrs['area'].get_lonlats()
# Get the scene start/end times
start_time = scn.attrs["start_time"]
end_time = scn.attrs["end_time"]
Hi pnuu, thank you very much for your reply! If I want to crop out the image and then find the longitude and latitude, is it possible? I did a short code on cropping of the image. Below is the entire code, which also includes the one you suggested:
%matplotlib notebook
import matplotlib.pyplot as plt
from matplotlib import pyplot as plt
from glob import glob
from satpy.scene import Scene
from satpy import find_files_and_readers
from datetime import datetime
from glob import glob
from rasterio.plot import plotting_extent
import cartopy
import cartopy.crs as ccrs
from cartopy._crs import (CRS, Geodetic, Globe, PROJ4_VERSION,
WGS84_SEMIMAJOR_AXIS, WGS84_SEMIMINOR_AXIS)
from satpy.writers import get_enhanced_image
from satpy import Scene
filenames = glob('C:/Users/binis/OneDrive/Desktop/To_Binish/ftp_h8_hsd_2pm/16_bands/*20210417_0200*.DAT')
len(filenames)
scn = Scene(reader='ahi_hsd', filenames=filenames)
my_channel = 'B01'
cropped_scn = scn.crop(ll_bbox=(103., 1.,105., 3.))
# resample from satellite native geolocation
new_scn = cropped_scn.resample(resampler='native')
new_scn.load([my_channel])
# Get coordinates for the data
lons, lats = new_scn[my_channel].attrs['area'].get_lonlats()
# Get the scene start/end times
start_time = scn.attrs['start_time']
end_time = scn.attrs['end_time']
and is it possible to save these data as xarray or netcdf?
@willows-1 It looks like you have already figured out how to get the lon/lat arrays (with your lons, lats = new_scn[my_channel].attrs['area'].get_lonlats()
line). There are other methods on that AreaDefinition
object for get other similar information like row/col indexes, convert x/y coordinates to lon/lat coordinates, etc.
What do you mean by saving "as xarray"? For netcdf, you can see the 'cf'
writer in Satpy. Please see the Satpy documentation for more information.
Hi djhoese, thanks for your reply! So just to confirm, basically the metadata of satellite image is the screenshot as shown below?
and basically the metadata of longitude and latitude coordinates are usually in array form?
as for the saving the metadata as netcdf, I can follow this code below?
from satpy import Scene
import glob
filenames = glob.glob('data/H*201903011200*')
scn = Scene(filenames=filenames, reader='seviri_l1b_hrit')
scn.load(['VIS006', 'IR_108'])
scn.save_datasets(writer='cf', datasets=['VIS006', 'IR_108'], filename='seviri_test.nc',
exclude_attrs=['raw_metadata'])
Regarding metadata, yes although for start/end time I would suggestion doing new_scn[my_channel].attrs['start_time']
. The .attrs
dictionary contains all of your metadata for a particular dataset. This should be described in the Satpy documentation.
Regarding lon/lat arrays, yes they are almost always arrays as Satpy will typically be dealing with 2D image arrays and the lon/lat arrays provide the coordinates for each pixel. If you want the coordinates for only a specific pixel then I suggest looking at the other methods of the AreaDefinition object (.attrs['area'])
). I'm sorry, I don't remember the names off the top of my head right now and it depends what you want to do.
Regarding the CF writer, yes that looks correct.
thank you very much for the feedback. So the code will be as below?
# Get coordinates for the data
lons, lats = new_scn[my_channel].attrs['area'].get_lonlats()
# Get the scene start/end times
start_time = new_scn[my_channel].attrs['start_time']
end_time = new_scn[my_channel].attrs['end_time']```
Yes.
thank you very much for your guidance!
as for the saving the metadata as netcdf, I can follow this code below?
from satpy import Scene import glob filenames = glob.glob('data/H*201903011200*') scn = Scene(filenames=filenames, reader='seviri_l1b_hrit') scn.load(['VIS006', 'IR_108']) scn.save_datasets(writer='cf', datasets=['VIS006', 'IR_108'], filename='seviri_test.nc', exclude_attrs=['raw_metadata'])
Hi djhoese, just a quick follow up, I tried to save the meta data using the code above but I am getting this message: "No time dimension in datasets, skipping time bounds creation". How can I edit the code to include start and end time? As in how to include time dimension in datasets when saving scene metadata?
That message is a warning, not necessarily an error. Have you checked the actual output? It should have start and end time information in it.
nope, there is no start and end time information in it. Below is the screenshot of the output when I opened the .nc file:
What about in the B01
variable? It should have its own attributes and I would assume start_time and end_time would be there.
Is this how I call the 'B01' variable? I have attached the full code below:
Yes. And just like in Satpy you can do file['B01'].attrs['start_time']
to get that start time information.
Understood, but is possible to achieve the output like the screenshot below? The picture below is the output generated when I opened another .nc file. But for this one, I didnt have to call out a variable. I just had to run the code and it displayed longitude, latitude as well as the start and end time:
What are you actually trying to accomplish? Do you need to recreate this exact same NetCDF structure? I don't see a start time and end time in that output? I see "utc_hrs" which is similar but not the same. The NetCDF file you are writing with Satpy from a single Scene only has a single time step so it doesn't make much sense to have a full time variable (coordinate) as it would only have a single value in it.
Hi djhoese, my main aim is to obtain the longitude, latitude, the start_time and end_time for each band from B01-B16. So is it better to save the metadata of each band seperately in each file, like what I did previously by using the code below? And then open the .nc data and call the variable (eg. 'B01')?
from satpy import Scene
import glob
filenames = glob.glob('data/H*201903011200*')
scn = Scene(filenames=filenames, reader='seviri_l1b_hrit')
scn.load(['VIS006', 'IR_108'])
scn.save_datasets(writer='cf', datasets=['VIS006', 'IR_108'], filename='seviri_test.nc',
exclude_attrs=['raw_metadata'])
Yes, I think so. You don't have to necessarily save each band separately, but you do need to access it individually. If there is a specific way you expect or want the NetCDF file to look like then we would accept a pull request to make the necessary changes (and preserve backwards compatibility). Right now I'm not sure I fully understand what is missing from the currently produced file that isn't what you want (except for the time dimension which as I said doesn't really make sense in this case).
hi djhoese, sure! There was also another way that I found and that was using xr.DataArray. Below is the list of code. However, this time, the longitude and latitude data is missing. The start_time and end_time is also not present. Is there a way to modify the below code to include the longitude and latitude data and the time?
%matplotlib notebook
import matplotlib.pyplot as plt
from matplotlib import pyplot as plt
from glob import glob
from satpy.scene import Scene
from satpy import find_files_and_readers
from datetime import datetime
import xarray as xr
import numpy as np
import scipy as sp
import numpy as np
filenames = glob('C:/Users/binis/OneDrive/Desktop/To_Binish/ftp_h8_hsd_2pm/16_bands/*20210417_0200*.DAT')
scn = Scene(reader='ahi_hsd', filenames=filenames)
# load datasets from input files
scn.load(['B01'])
#crop image
cropped_scn = scn.crop(ll_bbox=(103., 1.,105., 3.))
# resample from satellite native geolocation
new_scn = cropped_scn.resample(resampler='native')
new_scn.load(['B01'])
new_scn['B01'] = xr.DataArray(new_scn['B01'],
dims = ['x', 'y'],
coords = {'time': np.datetime64('2021-04 17T00:02:00')},
attrs = dict(start_time = new_scn.start_time,
end_time = new_scn.end_time))
new_scn['B01']
new_scn.save_datasets(writer='cf', datasets= ['B01'], filename='B01-0200.nc',
exclude_attrs=['raw_metadata'])
#opening .nc file
import xarray as xr
import numpy as np
import scipy as sp
file = xr.open_dataset('C:/Users/binis/OneDrive/Desktop/To_Binish/ftp_h8_hsd_2pm/B01-0200.nc', engine = 'netcdf4')
And this was the output generated when I opened the file:
So in this case the longitude and latitude data is missing
but I think using xr.DataArray is more challenging?
Try replacing this:
new_scn['B01'] = xr.DataArray(new_scn['B01'],
dims = ['x', 'y'],
coords = {'time': np.datetime64('2021-04 17T00:02:00')},
attrs = dict(start_time = new_scn.start_time,
end_time = new_scn.end_time))
with:
new_scn['B01'].coords['time'] = np.datetime64('2021-04 17T00:02:00')
If you need both start and end time (I assume B01 only has start_time) then you can do:
new_scn['B01'].attrs['end_time'] = new_scn['B01'].attrs['start_time']
Or whatever you would like to do. I also realize that your time
coordinate is hardcoded. To use start_time I think you could do:
new_scn['B01'].coords['time'] = np.datetime64(new_scn['B01'].attrs['start_time'])
Hmm actually B01 has both start and end time. Also, since now xr. DataArray is removed, how should I then proceed with the saving of the meta data?
The same as you have it. This code you had:
new_scn['B01'] = xr.DataArray(new_scn['B01'],
dims = ['x', 'y'],
coords = {'time': np.datetime64('2021-04 17T00:02:00')},
attrs = dict(start_time = new_scn.start_time,
end_time = new_scn.end_time))
new_scn['B01']
The first chunk of code is overwriting B01 in the new_scn. The second chunk isn't doing anything (outside of printing the DataArray in a notebook). The code I told you above is adding metadata to the DataArray. new_scn['B01']
is still a DataArray and should still work in the same way with save_datasets
.
Alright sure. So if I want to use end_time, the code will be the one below? But then it will be the same code as the start time like the one you suggested
new_scn['B01'].coords['time'] = np.datetime64(new_scn['B01'].attrs['end_time'])
And what about the longitude and latitude coordinates? How can I write the code?
So if I want to use end_time, the code will be the one below? But then it will be the same code as the start time like the one you suggested
As I said, this was only needed if the end_time didn't exist. You said it already exists for the B01 variable in the file you were originally producing so you don't need it.
Your original file also had the longitude and latitude arrays so you don't need to do anything extra. They are already there. By doing the .coords['time'] =
stuff we are adding to the information, not taking anything away. Everything that was there should still be there as it was in your original file.
alright, thank you very much!
Sorry djhoese, I have one more question. If I want to combine all bands into one data, how to adjust the code for "new_scn['B01'].coords['time'] = np.datetime64(new_scn['B01'].attrs['end_time'])" in order to add all the time data for all data? I am getting the error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-12-503d65a00bb9> in <module>
----> 1 new_scn.scn.all_dataset_names().coords['time'] = np.datetime64(new_scn.scn.all_dataset_names().attrs['end_time'])
2
3 new_scn.scn.all_dataset_names().coords['time'] = np.datetime64(new_scn.scn.all_dataset_names().attrs['start_time'])
AttributeError: 'Scene' object has no attribute 'scn'
Below is the updated code that I have done so far to include all bands at once:
from satpy import Scene, MultiScene
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pyresample import geometry
%matplotlib notebook
import matplotlib.pyplot as plt
from matplotlib import pyplot as plt
from glob import glob
from satpy.scene import Scene
from satpy import find_files_and_readers
from datetime import datetime
import xarray as xr
import numpy as np
import scipy as sp
import numpy as np
filenames = glob('C:/Users/binis/OneDrive/Desktop/To_Binish/ftp_h8_hsd_2pm/16_bands/*20210417_0200*.DAT')
scn = Scene(reader='ahi_hsd', filenames=filenames)
scn.load(scn.all_dataset_names())
#crop image
cropped_scn = scn.crop(ll_bbox=(103., 1.,105., 3.))
# resample from satellite native geolocation
new_scn = cropped_scn.resample(resampler='native')
new_scn.load(scn.all_dataset_names())
new_scn.scn.all_dataset_names().coords['time'] = np.datetime64(new_scn.scn.all_dataset_names()..attrs['end_time'])
new_scn.scn.all_dataset_names().coords['time'] = np.datetime64(new_scn.scn.all_dataset_names().attrs['start_time'])
new_scn.save_datasets(writer='cf', datasets= scn.all_dataset_names(), filename='All band data at 0200 version2.nc',
exclude_attrs=['raw_metadata'], base_dir = "C:/Users/binis/OneDrive/Desktop/To_Binish/ftp_h8_hsd_2pm")
I think this is a small misunderstanding about how dictionaries in Python work. First, you can't do new_scn[multiple_names]
and update all the names at once. You have to do one at a time. Also, you have mistake in your code where you do new_scn.scn.all_dataset_names().coords['time']
when what you meant was new_scn[scn.all_dataset_names()].coords['time']
, but as I said this won't work. The traceback/error was trying to tell you this.
I personally would suggest being explicit about what bands you want.
all_names = ['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B09', 'B10', 'B11', 'B12', 'B13', 'B14', 'B15', 'B16']
scn.load(all_names)
new_scn = cropped_scn.resample(resampler='native')
for ds_name in all_names:
new_scn[ds_name].coords['time'] = np.datetime64(new_scn[ds_name].attrs['start_time'])
Other gotchas:
new_scn.load(scn.all_dataset_names())
in your code. You can't load from a resampled scene..coords['time']
twice. Once for end_time and once for start_time. The way the code is written and being used, you can only have one value for the time coordinate so assigning twice, first with end_time then start_time, is overwriting the end_time assignment.alright, understood now, so how will I proceed with saving of the metadata? Is it the code below?
new_scn.save_datasets(writer='cf', datasets= all_names, filename='All band data at 0200 version2.nc', exclude_attrs=['raw_metadata'], base_dir = "C:/Users/binis/OneDrive/Desktop/To_Binish/ftp_h8_hsd_2pm")
Trying running it and you let me know if you get an error.
yeap I got an error saying:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-e1edaf71397e> in <module>
----> 1 new_scn.save_datasets(writer='cf', datasets= all_names, filename='All band data at 0200 version2.nc', exclude_attrs=['raw_metadata'], base_dir = "C:/Users/binis/OneDrive/Desktop/To_Binish/ftp_h8_hsd_2pm")
C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\satpy\scene.py in save_datasets(self, writer, filename, datasets, compute, **kwargs)
1045 filename=filename,
1046 **kwargs)
-> 1047 return writer.save_datasets(dataarrays, compute=compute, **save_kwargs)
1048
1049 @staticmethod
C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\satpy\writers\cf_writer.py in save_datasets(self, datasets, filename, groups, header_attrs, engine, epoch, flatten_attrs, exclude_attrs, include_lonlats, pretty, compression, include_orig_name, numeric_name_prefix, **to_netcdf_kwargs)
761 if 'time' in dataset:
762 dataset['time_bnds'] = make_time_bounds(start_times,
--> 763 end_times)
764 dataset['time'].attrs['bounds'] = "time_bnds"
765 dataset['time'].attrs['standard_name'] = "time"
C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\xarray\core\dataset.py in __setitem__(self, key, value)
1522
1523 else:
-> 1524 self.update({key: value})
1525
1526 def __delitem__(self, key: Hashable) -> None:
C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\xarray\core\dataset.py in update(self, other)
4036 Dataset.assign
4037 """
-> 4038 merge_result = dataset_update_method(self, other)
4039 return self._replace(inplace=True, **merge_result._asdict())
4040
C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\xarray\core\merge.py in dataset_update_method(dataset, other)
976 priority_arg=1,
977 indexes=indexes,
--> 978 combine_attrs="override",
979 )
C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\xarray\core\merge.py in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value)
619 coerced = coerce_pandas_values(objects)
620 aligned = deep_align(
--> 621 coerced, join=join, copy=False, indexes=indexes, fill_value=fill_value
622 )
623 collected = collect_variables_and_indexes(aligned)
C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\xarray\core\alignment.py in deep_align(objects, join, copy, indexes, exclude, raise_on_invalid, fill_value)
428 indexes=indexes,
429 exclude=exclude,
--> 430 fill_value=fill_value,
431 )
432
C:\ProgramData\anaconda3\envs\satpy\lib\site-packages\xarray\core\alignment.py in align(join, copy, indexes, exclude, fill_value, *objects)
334 "aligned because they have different dimension size(s) %r "
335 "than the size of the aligned dimension labels: %r"
--> 336 % (dim, unlabeled_sizes, labeled_size)
337 )
338
ValueError: arguments without labels along dimension 'time' cannot be aligned because they have different dimension size(s) {1} than the size of the aligned dimension labels: 16
@sfinkens Any ideas what's going wrong here?
I tried to execute a different method by saving the metadata of each band seperately in each file, then I used the following code to combine them under one file. Is this the correct thing to do? Below is the code:
import netCDF4
import numpy as np
import xarray
from glob import glob
ds = xarray.merge([xarray.open_dataset(f) for f in glob('C:/Users/binis/OneDrive/Desktop/To_Binish/ftp_h8_hsd_2pm/17-4-21-0200-Band 1-16/*.nc')])
ds.to_netcdf('Combined band data at 0200.nc')
Then I opened the combined data file and the output looked like this:
So basically I combined metadata of Band 1-16 at 0200 time under file and did the same for other times
is it okay to do this?
is it okay to do this?
"ok" is relative. If you get the output you want then it is perfectly fine. I expected your original "all-in-one" code to work, but you got an error. I've asked above to see if @sfinkens has any ideas as he was the last person I know to look at some of this time dimension stuff. However, he is also probably not going to be online much today so we'll have to wait until he has some free time.
sure no worries
I tried to execute a different method by saving the metadata of each band seperately in each file, then I used the following code to combine them under one file. Is this the correct thing to do? Below is the code:
import netCDF4 import numpy as np import xarray from glob import glob ds = xarray.merge([xarray.open_dataset(f) for f in glob('C:/Users/binis/OneDrive/Desktop/To_Binish/ftp_h8_hsd_2pm/17-4-21-0200-Band 1-16/*.nc')]) ds.to_netcdf('Combined band data at 0200.nc')
Then I opened the combined data file and the output looked like this:
so for the alternative code I mentioned above, each of the band file was basically concatenated. But will the data be affected if I use this method? As in will I be able to retain the data when I concatenate?
so for the alternative code I mentioned above, each of the band file was basically concatenated. But will the data be affected if I use this method? As in will I be able to retain the data when I concatenate?
I'm not sure. I've never done this before. It looks OK though.
alright, no worries, thank you very much for assisting me so far! so I think now we will wait for @sfinkens
If I understand correctly, you want the netCDF variable to have dimensions (time, y, x)
, right? To achieve this you need to expand the dataset dimensions as follows:
for band in mybands:
scn[band] = scn[band].expand_dims(time=[scn[band].attrs['start_time']])
scn.save_datasets(...)
Note that the time
argument in expand_dims
is a list. The CF writer should probably print a more informative error message if the dataset has a "free" time coordinate.
I also see that you try to save a timeseries to netCDF. You could do that using MultiScene by creating a timeseries and saving that to netcdf. However, the second setep currently fails due to a bug in the CF writer (https://github.com/pytroll/satpy/issues/1242; there is also a code example how it's supposed to work). I think @zxdawn wanted to look into this.
If I understand correctly, as suggested by @sfinkens, MultiScene
is the best solution for time series data.
But, the time_bnds
issue is a headache ... I will try to fix it soon.
@willows-1 Could you show your expected results here? Then, I can try to make a PR to meet it and #1242.
hi zxdawn and sfinkens, thanks for your reply! @zxdawn, you want my expected results using time_bnds?
If I understand correctly, you want the netCDF variable to have dimensions
(time, y, x)
, right? To achieve this you need to expand the dataset dimensions as follows:for band in mybands: scn[band] = scn[band].expand_dims(time=[scn[band].attrs['start_time']]) scn.save_datasets(...)
Note that the
time
argument inexpand_dims
is a list. The CF writer should probably print a more informative error message if the dataset has a "free" time coordinate.I also see that you try to save a timeseries to netCDF. You could do that using MultiScene by creating a timeseries and saving that to netcdf. However, the second setep currently fails due to a bug in the CF writer (#1242; there is also a code example how it's supposed to work). I think @zxdawn wanted to look into this.
yes thats right because I want start_time and end_time
I tried to execute a different method by saving the metadata of each band seperately in each file, then I used the following code to combine them under one file. Is this the correct thing to do? Below is the code:
import netCDF4 import numpy as np import xarray from glob import glob ds = xarray.merge([xarray.open_dataset(f) for f in glob('C:/Users/binis/OneDrive/Desktop/To_Binish/ftp_h8_hsd_2pm/17-4-21-0200-Band 1-16/*.nc')]) ds.to_netcdf('Combined band data at 0200.nc')
Then I opened the combined data file and the output looked like this:
so for the alternative code I mentioned above, each of the band file was basically concatenated. But will the data be affected if I use this method? As in will I be able to retain the data when I concatenate?
@zxdawn @sfinkens what about using this method? Is it the correct method to combine the netcdf file of each band by concatenating? And will the data be affected if I use this method? As in will I be able to retain the data when I concatenate?
Hi, I would like to know how I can extract scene metadata (such as start and end time, longitude, latitude, etc) from satellite image? Appreciate all the help I can get!