pytroll / trollflow2

Next generation Trollflow. Trollflow is for batch-processing satellite data using Satpy
https://trollflow2.readthedocs.org/
GNU General Public License v3.0

save_dataset plugin mixes arguments for itself and for the writer #145

Closed: gerritholl closed this 2 years ago

gerritholl commented 2 years ago

Describe the bug

The trollflow2 plugin save_datasets (through its helper save_dataset) reads its arguments either directly from the product_list or, if they vary between writers, from the formats dictionary. If we want different values of staging_zone, output_dir or fname_pattern for different writers, we must therefore configure them in the formats dictionary. In that case, however, trollflow2 also passes those arguments on to the Satpy writer. Some writers (ninjotiff, geotiff) silently swallow unknown arguments, but others (netcdf, ninjogeotiff) are stricter and fail on them.
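To illustrate the difference, here is a minimal sketch with two hypothetical stand-in writers. They are not Satpy's real implementations: `lenient_save` ignores keyword arguments it does not use, while `strict_save` mimics the ninjogeotiff tag check and rejects anything outside an assumed set of known tags.

```python
# Hypothetical stand-ins for two kinds of writers; these are NOT satpy's
# real implementations, only an illustration of the two behaviours.

KNOWN_TAGS = {"ChannelID", "DataType", "PhysicUnit", "PhysicValue", "SatelliteNameID"}


def lenient_save(filename, **kwargs):
    """Accept and silently ignore any keyword arguments it does not use."""
    return filename


def strict_save(filename, **tags):
    """Reject any keyword argument that is not a known tag."""
    unknown = set(tags) - KNOWN_TAGS
    if unknown:
        raise ValueError("The following tags were not recognised: "
                         + " ".join(sorted(unknown)))
    return filename


# Per-writer settings as they would be passed on, including output_dir,
# which is meant for trollflow2 itself and not for the writer.
fmat_config = {"ChannelID": 0, "DataType": 0, "output_dir": "/tmp/b"}

print(lenient_save("name.tif", **fmat_config))  # name.tif
try:
    strict_save("name.tif", **fmat_config)
except ValueError as err:
    print(err)  # The following tags were not recognised: output_dir
```

With the lenient writer the stray output_dir goes unnoticed; with the strict one the same configuration raises.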

To Reproduce

```python
from satpy import Scene
from trollflow2.plugins import save_datasets
from glob import glob
from queue import Queue
seviri_files = glob("/media/nas/x21308/scratch/SEVIRI/202103300900/H-000*")
sc = Scene(
        filenames=seviri_files,
        reader=["seviri_l1b_hrit"])
sc.load(["IR_108"])
ls = sc.resample("eurol")

product_list = {
    "fname_pattern": "name.tif",
    "use_tmp_file": True,
    "staging_zone": "/tmp/a",
    "areas": {
        "eurol": {
            "products": {
                "IR_108": {
                    "productname": "IR108",
                    "formats": [
                        {"writer": "ninjogeotiff",
                         "ChannelID": 0,
                         "DataType": 0,
                         "PhysicUnit": "no",
                         "PhysicValue": "yes",
                         "SatelliteNameID": 0,
                         "output_dir": "/tmp/b",
                         }
                    ]
                }
            }
        }
    }
}

job = {}
job['resampled_scenes'] = {"eurol": ls}
job["input_mda"] = {}
job["product_list"] = {"product_list": product_list}
job["produced_files"] = Queue()
save_datasets(job)
```

Expected behavior

Following "explicit is better than implicit", it would be better to have an interface where the configuration file clearly marks which arguments are for the Satpy writer (ChannelID, DataType, PhysicUnit, PhysicValue and SatelliteNameID) and which are for trollflow2 only (output_dir). Failing that, any arguments not expected by the writer should be removed before the writer is called.
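One way such an explicit interface could look, sketched under assumptions: the writer's own arguments live in a nested sub-dictionary called `writer_kwargs` (a hypothetical key that trollflow2 does not define), and everything else in the format entry stays with trollflow2. The helper `split_format_config` is likewise a made-up name for illustration.

```python
# Sketch of an explicit split between trollflow2 options and writer options.
# "writer_kwargs" and split_format_config() are hypothetical names.

fmat = {
    "writer": "ninjogeotiff",
    "output_dir": "/tmp/b",  # consumed by trollflow2 only
    "writer_kwargs": {       # passed on to the satpy writer unchanged
        "ChannelID": 0,
        "DataType": 0,
        "PhysicUnit": "no",
        "PhysicValue": "yes",
        "SatelliteNameID": 0,
    },
}


def split_format_config(fmat):
    """Return (plugin_args, writer_args) without mutating the input."""
    plugin_args = {key: val for key, val in fmat.items() if key != "writer_kwargs"}
    writer_args = dict(fmat.get("writer_kwargs", {}))
    return plugin_args, writer_args


plugin_args, writer_args = split_format_config(fmat)
print(sorted(plugin_args))  # ['output_dir', 'writer']
print(sorted(writer_args))  # ['ChannelID', 'DataType', 'PhysicUnit', 'PhysicValue', 'SatelliteNameID']
```

Because the writer would only ever receive `writer_args`, a strict writer such as ninjogeotiff never sees output_dir, and trollflow2 would not need to know which names each writer uses.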

Actual results

The code snippet fails, because trollflow2 passes output_dir to the writer, but the writer doesn't know what to do with this:

```
Traceback (most recent call last):
  File "/data/gholl/checkouts/protocode/mwe/trollflow2-save-dataset-args.py", line 42, in <module>
    save_datasets(job)
  File "/data/gholl/checkouts/trollflow2/trollflow2/plugins/__init__.py", line 284, in save_datasets
    obj = save_dataset(scns, fmat, fmat_config, renames, compute=eager_writing)
  File "/data/gholl/checkouts/trollflow2/trollflow2/plugins/__init__.py", line 232, in save_dataset
    obj = scns[fmat['area']].save_dataset(dsid,
  File "/data/gholl/checkouts/satpy/satpy/scene.py", line 1099, in save_dataset
    return writer.save_dataset(self[dataset_id],
  File "/data/gholl/checkouts/satpy/satpy/writers/__init__.py", line 814, in save_dataset
    return self.save_image(img, filename=filename, compute=compute, fill_value=fill_value, **kwargs)
  File "/data/gholl/checkouts/satpy/satpy/writers/ninjogeotiff.py", line 171, in save_image
    ntg = NinJoTagGenerator(
  File "/data/gholl/checkouts/satpy/satpy/writers/ninjogeotiff.py", line 318, in __init__
    raise ValueError("The following tags were not recognised: " +
ValueError: The following tags were not recognised: output_dir
```


Additional context

This could be fixed by adding more lines to:

https://github.com/pytroll/trollflow2/blob/dbd4e5db1ebab5177e614d67fbc17b3c761a0a01/trollflow2/plugins/__init__.py#L219-L220

That approach works only as long as we can assume that no Satpy writer uses any of those parameter names.
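A minimal sketch of that stop-gap, assuming a hard-coded set of trollflow2-only keys that is stripped from a copy of the format configuration before the writer is called. The key set and the helper name `writer_kwargs_from` are illustrative guesses, not what trollflow2 actually uses at the linked lines.

```python
# Stop-gap: drop keys that only trollflow2 should consume before calling
# the writer. PLUGIN_ONLY_KEYS is an illustrative, hypothetical set.
PLUGIN_ONLY_KEYS = frozenset({"output_dir", "staging_zone", "fname_pattern", "use_tmp_file"})


def writer_kwargs_from(fmat_config):
    """Return a copy of fmat_config without the trollflow2-only keys."""
    return {key: val for key, val in fmat_config.items()
            if key not in PLUGIN_ONLY_KEYS}


fmat_config = {"ChannelID": 0, "DataType": 0, "output_dir": "/tmp/b"}
print(writer_kwargs_from(fmat_config))  # {'ChannelID': 0, 'DataType': 0}
```

The weakness is the one noted above: if a Satpy writer ever accepts one of these names as its own argument, it would be silently dropped.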

A prettier solution would be to explicitly separate arguments for the plugin from arguments for the writer.