pytroll / satpy

Python package for earth-observing satellite data processing
http://satpy.readthedocs.org/en/latest/
GNU General Public License v3.0

Satpy find_files_and_readers cannot find matched file names #912

Open anikfal opened 5 years ago

anikfal commented 5 years ago

No matter what values are set for start_time and end_time, all of the files in base_dir that match the reader are selected by find_files_and_readers.

from satpy.scene import Scene
from satpy import find_files_and_readers
from datetime import datetime
files = find_files_and_readers(base_dir='/home/ah/MSG_data/netcdf_files/',
                               start_time=datetime(2019,4,1,9,00),
                               end_time=datetime(2019,4,1,9,15),
                               reader='seviri_l1b_nc')
scn = Scene(filenames=files)
composite = 'night_fog'
scn.load([composite])
scn.show(composite)

Expected behavior: According to the code above, only the first and second files of the list below should be selected by find_files_and_readers, but all of them are selected.

Environment Info:

djhoese commented 5 years ago

I'm not super familiar with the reader, but based on the file pattern configured in Satpy the date in those filenames is the processing time (when the file was created) and not necessarily the start/end time of the observation. If you create the Scene object and do print(scn.attrs['start_time']), what do you get? Same for end_time?

anikfal commented 5 years ago

Thanks for your consideration. I downloaded the files just recently and I think for the MSG SEVIRI data, the file names represent the observation time. Anyway, the outputs of print(scn.attrs['start_time']) and print(scn.attrs['end_time']) are as follows:

djhoese commented 5 years ago

the file names represent the observation time.

Could be, but that's not how Satpy is interpreting it. @sjoro or @ColinDuff may know more.

I don't have a good explanation why it isn't filtering the filenames. Let's see if @mraspaud has any ideas of something we're missing (he wrote most of that filtering code iirc).

mraspaud commented 5 years ago

@anikfal when you use find_files_and_readers with the start_time and end_time parameters, it tries to match against the time parsed from the filenames, which doesn't help in this case: according to the reader, the time in the filename is the processing time (@ColinDuff?). If you want to match against the times stored inside the files, use the filter_parameters argument when instantiating the Scene, like this:

scn = Scene(filenames=files, filter_parameters={'start_time': datetime(2019,4,1,9,00), 'end_time': datetime(2019,4,1,9,15)})

However, beware that this will open every one of the files, so if there are many of them, this could take a while.
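
To illustrate matching on filename times, here is a minimal sketch of a time-window check. It is not Satpy's actual code: the function name overlaps_window is hypothetical, and keeping files whose names yield no usable time is an assumption made for this sketch, not something confirmed in this thread.

```python
from datetime import datetime
from typing import Optional


def overlaps_window(
    file_start: Optional[datetime],
    file_end: Optional[datetime],
    window_start: Optional[datetime],
    window_end: Optional[datetime],
) -> bool:
    """Return True if a file's time range overlaps the requested window."""
    # Assumption for this sketch: a file with no time information at all
    # cannot be judged, so it is kept rather than dropped.
    if file_start is None and file_end is None:
        return True
    # If only one of the two times is known, treat the file as instantaneous.
    file_start = file_start or file_end
    file_end = file_end or file_start
    # The file ends before the window opens.
    if window_start is not None and file_end < window_start:
        return False
    # The file starts after the window closes.
    if window_end is not None and file_start > window_end:
        return False
    return True
```

Under that assumption, a filter that finds no start or end time in a filename would let every file through, regardless of the requested window.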

ColinDuff commented 5 years ago

Hi, the start and end time attributes are taken from the netCDF file attributes, not the filename.

The time in the filename is the RepeatCycle time.

The end_time attribute, if that is what the filter is using, may not be appropriate here.

I can have a look.

anikfal commented 5 years ago

@mraspaud the filter_parameters approach works for this case. However, as you mentioned, it opens every file, which is inefficient for a large number of files. This could be handled with a simple shell script that links the desired files into a separate directory, which is then read with find_files_and_readers. Also, I am quite sure that the filenames in this case (MSG SEVIRI data) give the observation times, not the processing times.
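
The linking workaround could also be done in Python instead of a shell script. The sketch below is only an illustration: it assumes each filename contains a single 14-digit timestamp (YYYYmmddHHMMSS), and the helper names filename_time, select_by_filename_time and link_into are hypothetical, not part of Satpy.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumption: the filename contains one 14-digit timestamp, YYYYmmddHHMMSS.
TIMESTAMP_RE = re.compile(r"(\d{14})")


def filename_time(path):
    """Parse the timestamp from a filename, or return None if there is none."""
    match = TIMESTAMP_RE.search(Path(path).name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    except ValueError:
        # Fourteen digits that do not form a valid date and time.
        return None


def select_by_filename_time(paths, start_time, end_time):
    """Keep the paths whose filename timestamp lies in [start_time, end_time]."""
    selected = []
    for path in paths:
        stamp = filename_time(path)
        if stamp is not None and start_time <= stamp <= end_time:
            selected.append(Path(path))
    return sorted(selected)


def link_into(paths, target_dir):
    """Create symbolic links to the given files inside target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    links = []
    for path in paths:
        link = target / path.name
        # Leave any existing entry (file or link) untouched.
        if not link.exists() and not link.is_symlink():
            link.symlink_to(path.resolve())
        links.append(link)
    return links
```

The directory filled by link_into could then be passed as base_dir to find_files_and_readers without start_time and end_time, so that only the pre-selected files are opened.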