radiantearth / stac-browser

A full-fledged UI in Vue for browsing and searching static STAC catalogs and STAC APIs
https://radiantearth.github.io/stac-browser
ISC License

Validation fails for URIs without a host (e.g. file:///...) #415

Closed chiarch84 closed 7 months ago

chiarch84 commented 7 months ago

Here is an example of a STAC item that STACLint validates correctly but that fails validation in STAC Browser.

I believe the problem is the URI path to the file. Since all of our items use file paths like this one, none of them validate.

{
    "type": "Feature",
    "stac_version": "1.0.0",
    "stac_extensions": [],
    "id": "AdministrativeUnits.GISCO.EuroGlobalMap.V6.item.SQLITE_EGM_V60",
    "collection": "AdministrativeUnits.GISCO.EuroGlobalMap.V6",
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [
                [
                    -73.0,
                    83.7
                ],
                [
                    47.0,
                    83.7
                ],
                [
                    47.0,
                    27.6
                ],
                [
                    -73.0,
                    27.6
                ],
                [
                    -73.0,
                    83.7
                ]
            ]
        ]
    },
    "bbox": [
        -73.0,
        27.6,
        47.0,
        83.7
    ],
    "properties": {
        "title": "EGM_V60.sqlite",
        "description": "EuroGlobalMap is a topographic dataset that covers the EU, Andorra, Croatia, Faroe Islands,  Georgia, Greenland, Iceland, Kosovo, Liechtenstein, Moldova, Monaco, Norway, San Marino, Serbia, Switzerland, Ukraine and Vatican at the scale 1:1 Million. It is produced in cooperation by the National Mapping Agencies of Europe, using official national databases. Thematic layers: administrative boundaries, hydrography, transportation. The data can be accessed via the EC Restricted Download Link using the Commission internet user name and password. This data is protected by copyright and cannot be made publicly available.",
        "start_datetime": "2008-09-01T00:00:00.000000Z",
        "end_datetime": "2012-12-13T23:59:59.000000Z",
        "datetime": null,
        "proj:epsg": "7019"
    },
    "links": [
        {
            "rel": "self",
            "type": "application/geo+json",
            "href": "https://jeodpp.jrc.ec.europa.eu/eu/data/stac-api/collections/AdministrativeUnits.GISCO.EuroGlobalMap.V6/items/AdministrativeUnits.GISCO.EuroGlobalMap.V6.item.SQLITE_EGM_V60"
        },
        {
            "rel": "parent",
            "type": "application/json",
            "href": "https://jeodpp.jrc.ec.europa.eu/eu/data/stac-api/collections/AdministrativeUnits.GISCO.EuroGlobalMap.V6"
        },
        {
            "rel": "collection",
            "type": "application/json",
            "href": "https://jeodpp.jrc.ec.europa.eu/eu/data/stac-api/collections/AdministrativeUnits.GISCO.EuroGlobalMap.V6"
        },
        {
            "rel": "root",
            "type": "application/json",
            "href": "https://jeodpp.jrc.ec.europa.eu/eu/data/stac-api/"
        }
    ],
    "assets": {
        "data": {
            "href": "file:///data/base/AdministrativeUnits/EUROPE/GISCO/EuroGlobalMap/VER6-0/Data/Spatialite/EGM_V60.sqlite",
            "type": "application/geopackage+sqlite3",
            "title": "Dataset",
            "roles": [
                "data"
            ]
        }
    }
}

Error returned by STAC Browser: (screenshot not included)
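To narrow down which href is the likely culprit, it can help to group every href in the item by URI scheme. The following is a minimal sketch using only Python's standard library. The helper name `hrefs_by_scheme` is hypothetical, and the `item` dict is a trimmed-down stand-in for the item above that keeps only two of its hrefs. It does not reproduce what STAC Browser or STACLint do internally.

```python
from urllib.parse import urlsplit


def hrefs_by_scheme(item: dict) -> dict:
    """Group every link and asset href of a STAC item by URI scheme."""
    hrefs = [link["href"] for link in item.get("links", [])]
    hrefs += [asset["href"] for asset in item.get("assets", {}).values()]
    grouped = {}
    for href in hrefs:
        # Relative references have no scheme; label them explicitly.
        scheme = urlsplit(href).scheme or "(relative)"
        grouped.setdefault(scheme, []).append(href)
    return grouped


# Trimmed-down stand-in for the item above (assumption: only hrefs matter here).
item = {
    "links": [
        {"rel": "root", "href": "https://jeodpp.jrc.ec.europa.eu/eu/data/stac-api/"},
    ],
    "assets": {
        "data": {
            "href": "file:///data/base/AdministrativeUnits/EUROPE/GISCO/"
                    "EuroGlobalMap/VER6-0/Data/Spatialite/EGM_V60.sqlite",
        },
    },
}

for scheme, hrefs in hrefs_by_scheme(item).items():
    print(scheme, len(hrefs))
```

In this item the only `file` href is the asset href, which is the one href that differs in kind from the `https` links.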

m-mohr commented 7 months ago

It comes from the asset href, indeed.

The schema requires asset hrefs to be IRI references. STAC Browser validates this; staclint doesn't validate the href, afaik.

It looks like the three slashes (///) are the issue.

Without further investigation, I'm not sure whether three or two slashes are preferred or what the IRI standard says (https://datatracker.ietf.org/doc/html/rfc3987#section-2.2). I also don't know what the implications are on different operating systems.

Maybe it's an upstream issue in https://github.com/luzlab/ajv-formats-draft2019 (or stac-node-validator), which provides the validation for iri-references. It's certainly not something I could fix in STAC Browser, though.
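One way to see what the three slashes mean syntactically is to split the href with the component regex given in RFC 3986, Appendix B. The sketch below is only an illustration in Python. It is not the check that ajv-formats-draft2019 or stac-node-validator performs, and the helper name `split_uri` and the shortened sample paths are assumptions made for the example.

```python
import re

# Component-splitting regex from RFC 3986, Appendix B.
URI_PARTS = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(ref: str) -> dict:
    """Split a URI reference into scheme, authority, and path."""
    m = URI_PARTS.match(ref)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        # Group 3 is the literal "//" plus authority; None means no authority at all.
        "has_authority": m.group(3) is not None,
        "authority": m.group(4),
        "path": m.group(5),
    }


# Hypothetical, shortened paths used only for illustration.
for ref in ("file:///data/x.sqlite", "file:/data/x.sqlite", "file://data/x.sqlite"):
    print(ref, split_uri(ref))
```

With three slashes, the first two introduce an authority that happens to be empty and the third starts the path. With two slashes, the first path segment is consumed as the authority instead. As far as the generic grammar goes, the registered-name production in both RFC 3986 and RFC 3987 allows zero characters, so an empty host appears to be syntactically permitted.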

chiarch84 commented 7 months ago

Thanks for your answer. Regarding the slashes, we need three: the first two are part of the protocol prefix file://, while the third begins the actual path. I will try to file the issue with stac-node-validator; maybe the problem is there, as you suggest. Thanks for your help.

m-mohr commented 7 months ago

Well, I also maintain stac-node-validator. ;-) The question we need to answer first is whether such a file URI is a valid iri-reference according to the specification, i.e. whether the host can be left out or whether something should be used as the host (e.g. 127.0.0.1 or localhost or whatever)... Help would be appreciated, as I don't have much time for investigation right now. Only once we have a good answer there can I work on a solution.

chiarch84 commented 7 months ago

Here is the Wikipedia definition of a valid file URI (with references to the related standards):

"A file URI has the format file://host/path where host is the fully qualified domain name of the system on which the path is accessible, and path is a hierarchical directory path of the form directory/directory/.../name. If host is omitted, it is taken to be "localhost", the machine from which the URL is being interpreted. Note that when omitting host, the slash is not omitted (while "file:///piro.txt" is valid, "file://simpen.txt" is not, although some interpreters manage to handle the latter). A valid file URI must therefore begin with either file:/path (no hostname), file:///path (empty hostname), or file://hostname/path."

So in fact all of our URIs follow the pattern file:///path (empty hostname), where the hostname is omitted because it is meant to be localhost. The URI is the path on the distributed file system that lies beneath our data services, so the code can reach and open the file using exactly that URI, without adding any prefix or suffix.
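The forms listed in the quoted definition can be compared with Python's standard `urllib.parse.urlsplit`. This is a small sketch, not a validator. The helper name `file_uri_parts` is hypothetical, and falling back to "localhost" for an empty host simply follows the quoted definition.

```python
from urllib.parse import urlsplit


def file_uri_parts(uri: str) -> tuple:
    """Return (effective host, path) for a file URI.

    A missing or empty host is reported as "localhost", per the quoted definition.
    """
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError(f"not a file URI: {uri!r}")
    return (parts.netloc or "localhost", parts.path)


# The forms mentioned in the quoted definition.
for uri in (
    "file:/piro.txt",           # no hostname
    "file:///piro.txt",         # empty hostname
    "file://hostname/piro.txt", # explicit hostname
    "file://simpen.txt",        # two slashes: the file name is parsed as the host
):
    print(uri, file_uri_parts(uri))
```

The last case shows why the two-slash form is a problem: the parser takes `simpen.txt` as the host and leaves the path empty, whereas the one-slash and three-slash forms both yield the path `/piro.txt` on the local machine.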

m-mohr commented 7 months ago

Thanks, good pointer. I've fixed the issue in stac-node-validator and this should now work in STAC Browser, too.

chiarch84 commented 7 months ago

Thank you very much!!!