wmo-im / wis2downloader

The backend Python package for downloading real-time data from the WIS2 network.
Apache License 2.0

Potential incomplete logic in extract_file_name Method for Handling File Names #29

Open AissaGeek opened 2 weeks ago

AissaGeek commented 2 weeks ago

Description

The extract_file_name method of the DownloadWorker class, in the downloader/__init__.py file, is responsible for extracting the file name and extension from a download URL. However, it is unclear whether it handles file names that follow a specific pattern correctly.

Example

Here is an example payload that illustrates the issue:

{
   "topic":"cache/a/wis2/us-noaa-synoptic/data/core/weather/surface-based-observations/synop",
   "payload":{
      "id":"585436b8-95fb-11ef-845c-e43d1a213544",
      "type":"Feature",
      "version":"v04",
      "geometry":{
         "type":"Point",
         "coordinates":[
            -101.04662,
            39.42746
         ]
      },
      "properties":{
         "data_id":"us-noaa-synoptic/data/core/weather/surface-based-observations/synop/WIGOS_0-840-0-KCBK_20241029T133500",
         "datetime":"2024-10-29T13:35:00Z",
         "pubtime":"2024-10-29T13:40:21Z",
         "integrity":{
            "method":"sha512",
            "value":"Rt3ZDweAXB6Kl6xYpLLf/DXZJU0X1SkWw+wdlh6Shb2orW96IO9I/kq09dKMc9zqsgQRS91iQC9rWTvdIIPFiQ=="
         },
         "content":{
            "encoding":"base64",
            "value":"QlVGUgAA9wQAABYAAAAAAAAAAAZuHgAH6AodDSMAAAALAAABgMGWx1AAAMoAADSAAAS0NCSwAAAAAAAAAAAAAAAP//paGhJYAAAAAAAAAAAAAAAAAAAAAP0U62Nivs0PDyVDWVGtTFuTnf///////////7Y/tWDJ////////////////////gAB//////////////////////////////////////8Aln////////////////////////A+j/v/PABr///////////////////////////////////////////////////////////////+ANzc3Nw==",
            "size":247
         },
         "wigos_station_identifier":"0-840-0-KCBK"
      },
      "links":[
         {
            "rel":"canonical",
            "type":"application/x-bufr",
            "href":"https://wis2.dwd.de/gc/24h/us-noaa-synoptic/3c0adaec-9e84-4986-a17a-429293eec998__WIGOS_0-840-0-KCBK_20241029T133500.bufr4",
            "length":247
         },
         {
            "rel":"via",
            "type":"text/html",
            "href":"https://oscar.wmo.int/surface/#/search/station/stationReportDetails/0-840-0-KCBK"
         }
      ]
   },
   "target":"surface-obs"
}

The file name is currently extracted from the canonical link, "href":"https://wis2.dwd.de/gc/24h/us-noaa-synoptic/3c0adaec-9e84-4986-a17a-429293eec998__WIGOS_0-840-0-KCBK_20241029T133500.bufr4". The extraction yields the file name 3c0adaec-9e84-4986-a17a-429293eec998__WIGOS_0-840-0-KCBK_20241029T133500, which I doubt is the expected file name, because it still carries the UUID prefix added by the cache.
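To make the behaviour concrete, here is a minimal sketch of URL-based extraction. It is not the actual extract_file_name code from DownloadWorker; the function name name_from_url is hypothetical and only reproduces the result described in this issue.

```python
from pathlib import PurePosixPath
from typing import Tuple
from urllib.parse import urlsplit


def name_from_url(url: str) -> Tuple[str, str]:
    """Return (file name, extension) taken from the last segment of the URL path.

    Hypothetical helper for illustration only, not the wis2downloader code.
    """
    path = PurePosixPath(urlsplit(url).path)
    # stem is the last path segment without its final suffix
    return path.stem, path.suffix


href = (
    "https://wis2.dwd.de/gc/24h/us-noaa-synoptic/"
    "3c0adaec-9e84-4986-a17a-429293eec998__WIGOS_0-840-0-KCBK_20241029T133500.bufr4"
)
name, ext = name_from_url(href)
print(name)  # 3c0adaec-9e84-4986-a17a-429293eec998__WIGOS_0-840-0-KCBK_20241029T133500
print(ext)   # .bufr4
```

The UUID prefix and the double underscore come straight from the URL path, so any extraction that only looks at the href will keep them.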

Suggested solution

In almost all cases, the file name can instead be extracted from "data_id":"us-noaa-synoptic/data/core/weather/surface-based-observations/synop/WIGOS_0-840-0-KCBK_20241029T133500".
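A rough sketch of what this could look like, assuming the file name is the last path segment of data_id and the extension still comes from the download URL. The function name name_from_data_id and the fallback to the URL are my assumptions, not existing wis2downloader behaviour.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple
from urllib.parse import urlsplit


def name_from_data_id(data_id: Optional[str], url: str) -> Tuple[str, str]:
    """Return (file name, extension), preferring the last segment of data_id.

    Hypothetical helper: falls back to the URL file name when data_id is
    missing or empty, since the suggestion only covers "almost all" cases.
    """
    url_path = PurePosixPath(urlsplit(url).path)
    # Last non-empty segment of data_id, if there is one
    base = data_id.rstrip("/").rsplit("/", 1)[-1] if data_id else ""
    if not base:
        base = url_path.stem
    # The extension is still taken from the download URL
    return base, url_path.suffix


data_id = (
    "us-noaa-synoptic/data/core/weather/surface-based-observations/synop/"
    "WIGOS_0-840-0-KCBK_20241029T133500"
)
href = (
    "https://wis2.dwd.de/gc/24h/us-noaa-synoptic/"
    "3c0adaec-9e84-4986-a17a-429293eec998__WIGOS_0-840-0-KCBK_20241029T133500.bufr4"
)
print(name_from_data_id(data_id, href))
# ('WIGOS_0-840-0-KCBK_20241029T133500', '.bufr4')
```

With the payload above this gives WIGOS_0-840-0-KCBK_20241029T133500 plus the .bufr4 extension, without the UUID prefix.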