pytroll / pygac

A python package to read and calibrate NOAA and Metop AVHRR GAC and LAC data
https://pygac.readthedocs.org/
GNU General Public License v3.0
20 stars 25 forks source link

Cannot read LAC files from NOAA CLASS #130

Closed vysitu closed 2 weeks ago

vysitu commented 2 weeks ago

Describe the bug I requested some AVHRR L1B data from NOAA CLASS. The data were NOAA10 and NOAA11, in August 1989.

This might look like a previous issue in the satpy github repo: https://github.com/pytroll/satpy/issues/2494 But I made sure it's 10bit/pixel when I requested the data. Could you please take a look at the following information? Thank you!

To Reproduce Not sure if it's the best way to do, but one of the files have been uploaded to Google Drive: https://drive.google.com/file/d/1JM0OCf9L7-Ps3z-iMzfB_1qLvt5Sm6Or/view?usp=drive_link

I did the following:

  1. Install satpy: conda install -c conda-forge satpy
  2. Install extra package: (in the conda env) pip install "satpy[avhrr_l1b_gaclac]"
  3. Get the Two Line Element (TLE) file: the contents were from here: https://celestrak.org/NORAD/elements/gp.php?GROUP=noaa&FORMAT=tle I copied the whole thing into a .txt file called "NOAA.txt".
  4. try to read the data:
import os
from satpy import Scene
from satpy.utils import debug_on

fpath = 'E:/data/avhrr_L1B/19890801-02_10bit/NSS.LHRR.NH.D89214.S1844.E1851.B0440404.WI'
os.path.exists(fpath)

reader_kwargs = {'tle_dir': '.', 'tle_name': 'NOAA.txt'}
ancillary = ['solar_zenith_angle',
             'sensor_zenith_angle',
             'latitude',
             'longitude']

scn = Scene(reader="avhrr_l1b_gaclac", filenames=[fpath], reader_kwargs = reader_kwargs)

Expected behavior Tried the files available in the satpy github repo: https://github.com/pytroll/satpy/issues/2494
saw some warnings but the scene was load-able. I was able to export the loaded scene to a Tiff file.

My thoughts on the cause of the error The error message is a bit long, so I'm putting my thoughts here.
Looking into the files mentioned in the error message, I think it is caused by the pod-reader.py not setting the position of the file pointer right, therefore getting the "89" as spacecraft ID. I requested data from 1983-1989, none worked. 16bit/pixel data didn't work, either. I also tried selecting the "scanline" box and the "signature" box, neither seemed to make a difference. I haven't tried data after 1992 since I don't really need them at this point.
I first thought it was caused by the get_start_date function, but in the second run of the reading part of the script, the first error line about the number of scanlines went away. So the "auto" mode for header date seemed working properly...

Actual results got the following error:

D:\APP\conda\envs\basic\Lib\site-packages\pygac\pod_reader.py:291: RuntimeWarning: Unexpected number of scanlines!
  self._read_scanlines(buffer, count)
Failed to load DataID(name='latitude', resolution=1050, modifiers=()) from <GACLACFile: 'E:/data/avhrr_L1B/19890801-02_10bit/NSS.LHRR.NH.D89214.S1844.E1851.B0440404.WI'>
Traceback (most recent call last):
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\yaml_reader.py", line 712, in _load_dataset
    projectable = fh.get_dataset(dsid, ds_info)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\avhrr_l1b_gaclac.py", line 152, in get_dataset
    self.read_raw_data()
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\avhrr_l1b_gaclac.py", line 146, in read_raw_data
    self.reader.read(self.filename)
  File "D:\APP\conda\envs\basic\Lib\site-packages\pygac\pod_reader.py", line 299, in read
    self.spacecraft_name = self.spacecraft_names[self.spacecraft_id]
                           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
KeyError: 89
Could not load dataset 'DataID(name='latitude', resolution=1050, modifiers=())': "Could not load DataID(name='latitude', resolution=1050, modifiers=()) from any provided files"
Traceback (most recent call last):
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\yaml_reader.py", line 836, in _load_dataset_with_area
    ds = self._load_dataset_data(file_handlers, dsid, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\yaml_reader.py", line 736, in _load_dataset_data
    proj = self._load_dataset(dsid, ds_info, file_handlers, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\yaml_reader.py", line 721, in _load_dataset
    raise KeyError(
KeyError: "Could not load DataID(name='latitude', resolution=1050, modifiers=()) from any provided files"
D:\APP\conda\envs\basic\Lib\site-packages\numpy\core\fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
D:\APP\conda\envs\basic\Lib\site-packages\numpy\core\_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
Could not load dataset 'DataID(name='longitude', resolution=1050, modifiers=())': max() iterable argument is empty
Traceback (most recent call last):
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\yaml_reader.py", line 836, in _load_dataset_with_area
    ds = self._load_dataset_data(file_handlers, dsid, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\yaml_reader.py", line 736, in _load_dataset_data
    proj = self._load_dataset(dsid, ds_info, file_handlers, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\yaml_reader.py", line 712, in _load_dataset
    projectable = fh.get_dataset(dsid, ds_info)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\avhrr_l1b_gaclac.py", line 158, in get_dataset
    data, _ = self.reader.get_lonlat()
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\pygac\reader.py", line 585, in get_lonlat
    self.update_meta_data()
  File "D:\APP\conda\envs\basic\Lib\site-packages\pygac\reader.py", line 513, in update_meta_data
    self.get_sun_earth_distance_correction())
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\pygac\reader.py", line 505, in get_sun_earth_distance_correction
    self.get_times()
  File "D:\APP\conda\envs\basic\Lib\site-packages\pygac\reader.py", line 451, in get_times
    year, jday, msec = self.correct_times_median(year=year, jday=jday,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\pygac\reader.py", line 867, in correct_times_median
    jday = np.where(if_wrong_jday < 0, max(jday), jday)
                                       ^^^^^^^^^
ValueError: max() iterable argument is empty
Could not load dataset 'DataID(name='1', wavelength=WavelengthRange(min=0.58, central=0.63, max=0.68, unit='µm'), resolution=1050, calibration=<1>, modifiers=())': max() iterable argument is empty
Traceback (most recent call last):
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\yaml_reader.py", line 836, in _load_dataset_with_area
    ds = self._load_dataset_data(file_handlers, dsid, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\yaml_reader.py", line 736, in _load_dataset_data
    proj = self._load_dataset(dsid, ds_info, file_handlers, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\yaml_reader.py", line 712, in _load_dataset
    projectable = fh.get_dataset(dsid, ds_info)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\avhrr_l1b_gaclac.py", line 180, in get_dataset
    data = self._get_channel(key)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\avhrr_l1b_gaclac.py", line 279, in _get_channel
    self.calib_channels = self.reader.get_calibrated_channels()
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\pygac\reader.py", line 525, in get_calibrated_channels
    self.get_times()
  File "D:\APP\conda\envs\basic\Lib\site-packages\pygac\reader.py", line 451, in get_times
    year, jday, msec = self.correct_times_median(year=year, jday=jday,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\pygac\reader.py", line 867, in correct_times_median
    jday = np.where(if_wrong_jday < 0, max(jday), jday)
                                       ^^^^^^^^^
ValueError: max() iterable argument is empty
The following datasets were not created and may require resampling to be generated: DataID(name='1', wavelength=WavelengthRange(min=0.58, central=0.63, max=0.68, unit='µm'), resolution=1050, calibration=<1>, modifiers=())

Environment Info:

avhrr_l1b_aapp:  ok
avhrr_l1b_eps:  ok
avhrr_l1b_gaclac:  ok
avhrr_l1b_hrpt:  ok
avhrr_l1c_eum_gac_fdr_nc:  ok
mraspaud commented 2 weeks ago

@vysitu thanks for reaching out regarding this issue! From what I understand, the data you are using is from 1989, and the TLE you got is from now, so there is a mismatch here: the TLE data cannot be relied on for more than a week at the most, so pygac doesn't find any suitable TLE.

I recommend you download the TLE for the full life of the satellites, they are available here: https://github.com/pytroll/pygac/wiki/Gapfilled-TLE-files-for-(A)VHRR-satellites With that pygac will get the most appropriate TLE data for you passes.

Tell us how it goes!

vysitu commented 2 weeks ago

@mraspaud Hi, thank you for the information! I downloaded the TLE_noaa11.txt in the link you provided. The first part of the error message became:

INFO:pygac.pod_reader:Reading NSS.LHRR.NH.D89214.S1844.E1851.B0440404.WI
WARNING:pygac.reader:Expected 29813 scan lines, but found 2369!
D:\APP\conda\envs\basic\Lib\site-packages\pygac\pod_reader.py:291: RuntimeWarning: Unexpected number of scanlines!
  self._read_scanlines(buffer, count)
WARNING:satpy.readers.yaml_reader:Failed to load DataID(name='latitude', resolution=1050, modifiers=()) from <GACLACFile: 'E:/data/avhrr_L1B/19890801-02_10bit/NSS.LHRR.NH.D89214.S1844.E1851.B0440404.WI'>
Traceback (most recent call last):
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\yaml_reader.py", line 712, in _load_dataset
    projectable = fh.get_dataset(dsid, ds_info)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\avhrr_l1b_gaclac.py", line 152, in get_dataset
    self.read_raw_data()
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\avhrr_l1b_gaclac.py", line 146, in read_raw_data
    self.reader.read(self.filename)
  File "D:\APP\conda\envs\basic\Lib\site-packages\pygac\pod_reader.py", line 299, in read
    self.spacecraft_name = self.spacecraft_names[self.spacecraft_id]

The rest is the same. I don't think the correct TLE file helped much ... or it wasn't loaded correctly when I set reader_kwargs = {'tle_dir': '.', 'tle_name': 'TLE_noaa11.txt'} and put the txt file with the notebook file.

I forgot to mention that the error message popped up when I used

scn.load(['1'])

after

scn = Scene(reader="avhrr_l1b_gaclac", filenames=[fpath], reader_kwargs = reader_kwargs)

and I was using jupyter notebook.

It is worth noting that I also tried TLE_noaa7.txt on a file from 1983 (filename: NSS.HRPT.NC.D83178.S1954.E2006.B1036868.WI), and the error message is as the following:

D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\avhrr_l1b_gaclac.py:208: UserWarning: Converting non-nanosecond precision datetime values to nanosecond precision. This behavior can eventually be relaxed in xarray, as it is an artifact from pandas which is now beginning to support non-nanosecond precision values. This warning is caused by passing non-nanosecond np.datetime64 or np.timedelta64 values to the DataArray or Variable constructor; it can be silenced by converting the values to nanosecond precision ahead of time.
  res["acq_time"] = ("y", times)
D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\avhrr_l1b_gaclac.py:208: UserWarning: Converting non-nanosecond precision datetime values to nanosecond precision. This behavior can eventually be relaxed in xarray, as it is an artifact from pandas which is now beginning to support non-nanosecond precision values. This warning is caused by passing non-nanosecond np.datetime64 or np.timedelta64 values to the DataArray or Variable constructor; it can be silenced by converting the values to nanosecond precision ahead of time.
  res["acq_time"] = ("y", times)
Failed to load DataID(name='1', wavelength=WavelengthRange(min=0.58, central=0.63, max=0.68, unit='µm'), resolution=1050, calibration=<1>, modifiers=()) from <GACLACFile: 'E:/data/hysplit_data/NSS.HRPT.NC.D83178.S1954.E2006.B1036868.WI'>
Traceback (most recent call last):
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\yaml_reader.py", line 712, in _load_dataset
    projectable = fh.get_dataset(dsid, ds_info)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\avhrr_l1b_gaclac.py", line 180, in get_dataset
    data = self._get_channel(key)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\avhrr_l1b_gaclac.py", line 279, in _get_channel
    self.calib_channels = self.reader.get_calibrated_channels()
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\pygac\reader.py", line 527, in get_calibrated_channels
    self.update_meta_data()
  File "D:\APP\conda\envs\basic\Lib\site-packages\pygac\reader.py", line 520, in update_meta_data
    self.meta_data['calib_coeffs_version'] = self.calibration.version
                                             ^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\pygac\reader.py", line 167, in calibration
    calibration = Calibrator(
                  ^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\pygac\calibration.py", line 104, in __new__
    defaults = cls.default_coeffs[spacecraft]
               ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: None
Could not load dataset 'DataID(name='1', wavelength=WavelengthRange(min=0.58, central=0.63, max=0.68, unit='µm'), resolution=1050, calibration=<1>, modifiers=())': "Could not load DataID(name='1', wavelength=WavelengthRange(min=0.58, central=0.63, max=0.68, unit='µm'), resolution=1050, calibration=<1>, modifiers=()) from any provided files"
Traceback (most recent call last):
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\yaml_reader.py", line 836, in _load_dataset_with_area
    ds = self._load_dataset_data(file_handlers, dsid, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\yaml_reader.py", line 736, in _load_dataset_data
    proj = self._load_dataset(dsid, ds_info, file_handlers, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\yaml_reader.py", line 721, in _load_dataset
    raise KeyError(
KeyError: "Could not load DataID(name='1', wavelength=WavelengthRange(min=0.58, central=0.63, max=0.68, unit='µm'), resolution=1050, calibration=<1>, modifiers=()) from any provided files"
The following datasets were not created and may require resampling to be generated: DataID(name='1', wavelength=WavelengthRange(min=0.58, central=0.63, max=0.68, unit='µm'), resolution=1050, calibration=<1>, modifiers=())
mraspaud commented 2 weeks ago

@vysitu thanks for reporting back. I looked at this a bit and I think I tracked this down to a problem in pygac, that the dataset name in the tbm header could be ebcdic-encoded. Pygac wasn't able to decode it correctly, hence the header was wrongly discarded. The PR here should fix this. Do you have the possibility to test this?

mraspaud commented 2 weeks ago

Here is how it looks now for me with your file: lac

vysitu commented 2 weeks ago

@mraspaud Thank you very much!

I didn't know how to get the test branch, so I downloaded the files you modified (luckily there were only 2..). After replacing the local pygac files with the modified ones, I was able to load datasets from a Scene. There were some warning messages, but I don't think it's pygac's issue. Not sure if it's the best practice, but I was able to export each band (dataset) to a tiff file, so I could use longitude, latitude and corresponding pixel values.

The warning messages are as follows:

D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\avhrr_l1b_gaclac.py:208: UserWarning: Converting non-nanosecond precision datetime values to nanosecond precision. This behavior can eventually be relaxed in xarray, as it is an artifact from pandas which is now beginning to support non-nanosecond precision values. This warning is caused by passing non-nanosecond np.datetime64 or np.timedelta64 values to the DataArray or Variable constructor; it can be silenced by converting the values to nanosecond precision ahead of time.
  res["acq_time"] = ("y", times)
D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\avhrr_l1b_gaclac.py:208: UserWarning: Converting non-nanosecond precision datetime values to nanosecond precision. This behavior can eventually be relaxed in xarray, as it is an artifact from pandas which is now beginning to support non-nanosecond precision values. This warning is caused by passing non-nanosecond np.datetime64 or np.timedelta64 values to the DataArray or Variable constructor; it can be silenced by converting the values to nanosecond precision ahead of time.
  res["acq_time"] = ("y", times)
D:\APP\conda\envs\basic\Lib\site-packages\pygac\calibration.py:549: RuntimeWarning: invalid value encountered in log
  tsE = c2*nu_c / np.log(1.0 + nBB_num / Ne)
D:\APP\conda\envs\basic\Lib\site-packages\satpy\readers\avhrr_l1b_gaclac.py:208: UserWarning: Converting non-nanosecond precision datetime values to nanosecond precision. This behavior can eventually be relaxed in xarray, as it is an artifact from pandas which is now beginning to support non-nanosecond precision values. This warning is caused by passing non-nanosecond np.datetime64 or np.timedelta64 values to the DataArray or Variable constructor; it can be silenced by converting the values to nanosecond precision ahead of time.
  res["acq_time"] = ("y", times)

There was a file from NOAA-10 ("NOAA-G") it could not read. The error message was "array of sample points is empty". Other files from NOAA-10 seems readable. I will do more tests before I can say what the problem is.

mraspaud commented 2 weeks ago

Nice that it's working! I'm interested in the faulty noaa 10 file. It's probably a file with some missing data or something like that, but it would be great to improve pygac to even support these. So if you can share it, it would be great.

vysitu commented 2 weeks ago

@mraspaud Thanks again!

I have uploaded two of the NOAA-10 files I noticed to have the issues to the "with_issues" folder: https://drive.google.com/drive/folders/1ZfAwveWBawVQW5t9dZ7KUbZtqb8akLj4?usp=drive_link

I am requesting some files from earlier NOAA satellites with AVHRR onboard. Is it ok to notify you if I have issues reading them?

mraspaud commented 2 weeks ago

@vysitu thanks for linking to the broken files. I found the problem with them (some of the first scan lines were missing, messing with the PRT gap detection), and address it in a larger refactoring pull request I have open. But if you want to try it, here is the relevant code change that should make it work. f48480f (#127)

vysitu commented 1 week ago

@mraspaud Thank you very much!

I tried adding noaa.py and replacing test_reader.py, as mentioned in the commit, but it didn't work... I also tried replacing the pygac folder in site-packages with the one in git repo, which only made things worse. The error message was still "array of sample points is empty". Maybe replacing the file was not enough.. Will this fix be available in conda or pypi some time?

I tried it on some files from 1983 (NOAA-C) and the issue with "spacecraft_id" showed up again. I uploaded them to the same GoogleDrive folder as above. Would you mind taking a look? Files I have that were from NOAA-E and NOAA-F were all readable.

mraspaud commented 1 week ago

@vysitu ok, i think installing pygac properly will help. So first, uninstall pygac completely with eg pip uninstall pygac, and then install it again from the main branch on github with pip install git+https://github.com/pytroll/pygac.git. That should give you a valid installation.