radiantearth / stac-browser

A full-fledged UI in Vue for browsing and searching static STAC catalogs and STAC APIs
https://radiantearth.github.io/stac-browser
ISC License

Further improvements for SEO (especially schema.org / GDS) #295

Open m-mohr opened 1 year ago

m-mohr commented 1 year ago

Google Search makes good use of STAC Browser: https://www.google.de/search?q=site:mspc.lutana.de

Google DatasetSearch also picks it up, but the data is not ideal yet: https://datasetsearch.research.google.com/search?src=0&query=Planet%20NICFI&docid=L2cvMTF0dDk0bGd6aw%3D%3D

Especially the schema.org data should be improved if possible.

cboettig commented 1 year ago

Thanks @m-mohr !

I'd be keen to see an alignment of Science on Schema guidelines from the Earth Science Information Partners Federation with the schema.org being generated from STAC JSON. (because it would be so nice if these two JSON metadata formats from two widely adopted earth science communities were more interoperable :blush: !)

m-mohr commented 1 year ago

@cboettig ~Is there an equivalent for STAC Collections in SOSO? I'm using a DataCatalog right now, but it seems not to be part of SOSO.~ Edit: Just found https://github.com/ESIPFed/science-on-schema.org/blob/master/guides/Dataset.md#collections-of-datasets-using-schemaorg-datacatalog - I'm currently struggling to get the DataCatalog to be included in Google Dataset Search though...

I'll have a look at the Dataset guideline on what we can improve in the Browser. I don't think there's a good equivalent for Data Repository? Maybe the root catalog? What do you think is the best way to align SOSO and STAC?

One issue with Dataset is that it is likely what STAC Items are, but STAC Items are often not very descriptive (just have an id, but no title or description). Or should STAC Collections be Datasets?

cboettig commented 1 year ago

Thanks @m-mohr -- good questions! I should follow up with the ESIP devs, I'm mostly a data consumer working across products in both stac and ESIP and hoping to connect the dots!

I think there are analogous concepts for the collection / catalog / item levels of STAC, but I'm not sure what the best choices are. My understanding is that schema.org/Dataset was based on the original W3C DCAT (Data Catalog) standard, now in its 3rd version, which I think has all these notions. I know the ESIP folks know the W3C standards well and I think their style of schema.org roughly parallels that, but I'm no expert here.

@mbjones or others probably have good advice here.

m-mohr commented 1 year ago

Thanks. So right now I'm mapping:

- STAC Collection (or Catalog) -> DataCatalog
- STAC Item -> Dataset
- STAC Asset -> DataDownload

I'm not sure whether that's ideal though due to the limited information in a STAC Item. What we can find now in GDS is just Datasets with limited information, but no DataCatalogs, which have much more information.

Any insights would be appreciated.
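The mapping described in this comment can be sketched roughly as follows. This is a minimal Python illustration, not STAC Browser's actual (JavaScript) implementation: the function names and the sample `item` and `collection` are hypothetical, and only a handful of schema.org properties (`name`, `description`, `identifier`, `dataset`, `distribution`, `contentUrl`, `encodingFormat`) are filled in.

```python
import json


def asset_to_data_download(asset: dict) -> dict:
    """Map a STAC Asset to a schema.org DataDownload node."""
    download = {"@type": "DataDownload", "contentUrl": asset["href"]}
    if "type" in asset:
        download["encodingFormat"] = asset["type"]  # media type of the asset
    if "title" in asset:
        download["name"] = asset["title"]
    return download


def item_to_dataset(item: dict) -> dict:
    """Map a STAC Item to a schema.org Dataset node."""
    props = item.get("properties", {})
    dataset = {
        "@type": "Dataset",
        "identifier": item["id"],
        # Items often have no title, so fall back to the id.
        "name": props.get("title", item["id"]),
        "distribution": [
            asset_to_data_download(a) for a in item.get("assets", {}).values()
        ],
    }
    if "description" in props:
        dataset["description"] = props["description"]
    return dataset


def collection_to_data_catalog(collection: dict, items: list) -> dict:
    """Map a STAC Collection (or Catalog) to a schema.org DataCatalog node."""
    return {
        "@type": "DataCatalog",
        "identifier": collection["id"],
        "name": collection.get("title", collection["id"]),
        "description": collection.get("description", ""),
        "dataset": [item_to_dataset(i) for i in items],
    }


def to_jsonld(node: dict) -> str:
    """Serialize a node as JSON-LD with the schema.org context."""
    return json.dumps({"@context": "https://schema.org/", **node}, indent=2)


# Made-up example data, for illustration only.
item = {
    "id": "scene-1",
    "properties": {},
    "assets": {
        "data": {
            "href": "https://example.com/scene-1.tif",
            "type": "image/tiff; application=geotiff",
        }
    },
}
collection = {"id": "demo", "title": "Demo collection", "description": "Made-up example"}

print(to_jsonld(collection_to_data_catalog(collection, [item])))
```

Note how the Dataset derived from the Item ends up with little more than an id and its downloads, which is the limitation raised above.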

cboettig commented 1 year ago

This seems reasonable to me at least. I'm also interested in the stac extensions or at least those extensions that have good parallels to science-on-schema (e.g. scientific citation, file info, table).

In some ways mapping such extensions to schema.org is particularly compelling where there are schema.org based dataset browsing tools that can already take advantage of indexing on such fields as "author" or "column name" that are not as first-class in stac search....

cboettig commented 1 year ago

@m-mohr regarding your comment:

> I'm currently struggling to get the DataCatalog to be included in Google Dataset Search though...

yeah, I noticed that too. I got some great advice from @mbjones on possible culprits for this:

Also, while I think the mapping you have

> STAC Collection (or Catalog) -> DataCatalog, STAC Item -> Dataset, STAC Asset -> DataDownload

makes sense from a literal/technical standpoint, it does look like a lot of metadata fields often found on a Dataset in ESIP wind up only on the DataCatalog for a STAC entry, e.g. using Google's Rich Results Test:

https://search.google.com/test/rich-results?url=https%3A%2F%2Fradiantearth.github.io%2Fstac-browser%2F%23%2Fexternal%2Fplanetarycomputer.microsoft.com%2Fapi%2Fstac%2Fv1%2Fcollections%2Fmobi%3F.language%3Den

which will result in lots of useful stuff being missed (e.g. spatial coverage, temporal coverage, creator, license, copyrightHolder, producer, provider, keywords, etc. would, I think, all be cut off from Dataset Search since they aren't properties of the Dataset). Not sure if there's a good way to handle 'inheritance' in this context?
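One possible way to approach the 'inheritance' question is to copy Collection-level metadata onto each Dataset before it is serialized. The Python sketch below is only an illustration under several assumptions: the function names, the `ROLE_TO_PROPERTY` table (which STAC provider role feeds which schema.org property), and the use of `https://spdx.org/licenses/<id>` as the license URL are all choices made for this example, not anything STAC Browser or the STAC specification prescribes.

```python
# STAC license values that are not SPDX identifiers.
NON_SPDX = {"proprietary", "various", "other"}

# Hypothetical mapping from STAC provider roles to schema.org properties.
ROLE_TO_PROPERTY = {
    "producer": "creator",
    "licensor": "copyrightHolder",
    "host": "provider",
}


def collection_context(collection: dict) -> dict:
    """Collect schema.org properties an Item could inherit from its Collection."""
    inherited = {}

    if collection.get("keywords"):
        inherited["keywords"] = list(collection["keywords"])

    license_id = collection.get("license")
    if license_id and license_id not in NON_SPDX:
        # Assumption: point at the SPDX license page for SPDX identifiers.
        inherited["license"] = f"https://spdx.org/licenses/{license_id}"

    extent = collection.get("extent", {})
    bboxes = extent.get("spatial", {}).get("bbox") or []
    if bboxes:
        b = bboxes[0]  # the first bbox describes the overall extent
        # STAC order is west, south, east, north (with elevation for 3D boxes).
        west, south, east, north = (b[0], b[1], b[3], b[4]) if len(b) == 6 else b
        inherited["spatialCoverage"] = {
            "@type": "Place",
            # schema.org "box" is "south west north east" (lat lon pairs).
            "geo": {"@type": "GeoShape", "box": f"{south} {west} {north} {east}"},
        }

    intervals = extent.get("temporal", {}).get("interval") or []
    if intervals:
        start, end = intervals[0]
        # ".." marks an open-ended side of the ISO 8601 interval.
        inherited["temporalCoverage"] = f"{start or '..'}/{end or '..'}"

    for provider in collection.get("providers", []):
        org = {"@type": "Organization", "name": provider["name"]}
        if "url" in provider:
            org["url"] = provider["url"]
        for role in provider.get("roles", []):
            prop = ROLE_TO_PROPERTY.get(role)
            if prop:
                inherited.setdefault(prop, []).append(org)

    return inherited


def inherit(dataset: dict, collection: dict) -> dict:
    """Fill properties missing on the Dataset from its parent Collection."""
    return {**collection_context(collection), **dataset}
```

Values already present on the Dataset take precedence, so an Item that carries its own, more specific metadata would not be overwritten by the Collection.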

Down the road, it would be really nice if some of the common extensions could also be translated into schema.org. For example, I think there's a really clean/simple mapping for the scientific citation extension and the table extension into schema.org / ESIP science-on-schema conventions, which I'd love to see included. Please let me know if I should open a separate issue for that. Our community may be able to contribute a PR if interested (and if I can find someone who knows JavaScript well...)
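As a rough illustration of what such a translation might look like, the Python sketch below maps `sci:doi` and `sci:publications` from the scientific citation extension and `table:columns` from the table extension onto schema.org properties. The target properties (`identifier`, `citation`, `variableMeasured`) and the shape of the `identifier` node are assumptions loosely modelled on the science-on-schema.org Dataset guide, and `extensions_to_schema` is a hypothetical helper name. `sci:citation` is deliberately left unmapped here, since which schema.org property best holds a dataset's own recommended citation is a judgment call.

```python
def extensions_to_schema(fields: dict) -> dict:
    """Translate selected STAC extension fields into schema.org properties.

    `fields` is the dict that carries the extension fields, e.g. a
    Collection or the `properties` of an Item.
    """
    out = {}

    doi = fields.get("sci:doi")
    if doi:
        url = f"https://doi.org/{doi}"
        out["identifier"] = {
            "@type": "PropertyValue",
            "propertyID": "https://registry.identifiers.org/registry/doi",
            "value": f"doi:{doi}",
            "url": url,
        }

    citations = []
    for pub in fields.get("sci:publications", []):
        work = {"@type": "CreativeWork"}
        if pub.get("doi"):
            work["url"] = f"https://doi.org/{pub['doi']}"
        if pub.get("citation"):
            work["description"] = pub["citation"]
        citations.append(work)
    if citations:
        out["citation"] = citations

    variables = []
    for column in fields.get("table:columns", []):
        variable = {"@type": "PropertyValue", "name": column["name"]}
        if column.get("description"):
            variable["description"] = column["description"]
        variables.append(variable)
    if variables:
        out["variableMeasured"] = variables

    return out
```

The result could be merged into a Dataset node, for example with `{**dataset, **extensions_to_schema(collection)}`.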

m-mohr commented 1 year ago

@cboettig Your comments are appreciated, thanks! I don't have the time right now to work on it, but I'll get back to it eventually.

cboettig commented 1 year ago

Thanks for the heads up and no worries! Appreciate all the amazing work you're doing here.