ioos / catalog

IOOS Catalog general repo for documentation and issues
https://ioos.github.io/catalog/
MIT License

Track pycsw and CKAN/Solr search integration progress #48

Closed: mwengren closed this issue 4 years ago

mwengren commented 7 years ago

Keeping this issue open to track whether any plans or progress exist to unify search/filter functionality in pycsw when it is deployed with CKAN.

CKAN uses a Solr index for search (potentially to be abstracted in the future). pycsw uses its own search implementation.

Can we get the same, or more similar, search results from pycsw service that currently come out of CKAN/Solr?

cc: @tomkralidis @kwilcox

tomkralidis commented 7 years ago

cc @kalxas / @amercader / @jjediny / @rsignell-usgs

@mwengren thanks for tracking. Note that pycsw (master) now supports custom repository plugins for any underlying backend (Elasticsearch, Solr, etc.). The basic idea is that the repository plugin parses the CSW filter into its own native filter syntax, performs the query, and returns a list of records as pycsw Record objects.
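The plugin flow described above (translate the filter, run the native query, wrap the hits as records) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not pycsw's actual repository plugin interface: the class name `SolrRepositorySketch`, the `Record` stand-in, the dict-based constraint, the Solr field names, and the injected `search_backend` callable are all assumptions made so the example runs without pycsw or a live Solr instance.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Record:
    """Hypothetical stand-in for a pycsw Record object."""
    identifier: str
    title: str


def _quote(value: str) -> str:
    # Quote a literal for Solr, escaping backslashes and double quotes.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


class SolrRepositorySketch:
    """Hypothetical read-only repository plugin (search only, no inserts)."""

    def __init__(self, search_backend: Callable[[str], List[Dict[str, str]]]):
        # The backend takes a native query string and returns raw documents.
        self.search_backend = search_backend

    def to_native(self, constraint: Dict[str, str]) -> str:
        # Step 1: translate the (already parsed) filter into Solr syntax.
        # Here a constraint is simply {field: value}, ANDed together.
        if not constraint:
            return "*:*"
        return " AND ".join(
            f"{field}:{_quote(value)}" for field, value in sorted(constraint.items())
        )

    def query(self, constraint: Dict[str, str], maxrecords: int = 10) -> Tuple[int, List[Record]]:
        native = self.to_native(constraint)
        docs = self.search_backend(native)  # Step 2: run the native query.
        # Step 3: wrap the raw hits as Record objects.
        records = [Record(d["id"], d["title"]) for d in docs[:maxrecords]]
        return len(docs), records
```

Injecting the backend keeps the translation step testable on its own; in a real plugin that callable would presumably be replaced by an HTTP request to Solr.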

So for CKAN/pycsw, this would mean writing a custom pycsw CKAN plugin that looks something like this (in terms of core methods) but interacts with Solr directly. The plugin would be part of, or installed with, ckanext-spatial and be referenced in the pycsw configuration as ckanext.spatial.lib.pycsw_plugin. The core of the work would be translating an OGC filter into Solr query syntax; the rest is very straightforward.

The key benefits here are:

Happy to discuss further here, on pycsw IRC, or on Gitter.

mwengren commented 7 years ago

@tomkralidis thanks, that seems worth pursuing. If it avoids the database synchronization routine, and since in this case it would only provide search (not insert, update, etc.), hopefully that would simplify things somewhat.

We'll keep this in mind for future work. Unless of course someone out there has already started something like this and can share...

tomkralidis commented 7 years ago

cc @ricardogsilva

mwengren commented 7 years ago

Related: https://github.com/geopython/pyfes. We should track, and contribute to, development of the pyfes filter encoding module for Python.

benjwadams commented 5 years ago

This would be fairly trivial to implement with PostgreSQL's foreign data wrappers if one were available for Solr, as there is for Elasticsearch.

mwengren commented 5 years ago

@benjwadams We can talk about this at next week's meeting, but since it's been about two years since the original discussion, we should probably reach back out to @tomkralidis or others to see whether something like this has already been implemented or whether there's another way to approach it.

tomkralidis commented 5 years ago

cc @kalxas @capooti

I'm not sure about Solr, but for Elasticsearch, @mikejmets has implemented this functionality for a specific project, and we should integrate it into pycsw core. We will also discuss this at the OSGeo Code Sprint and may implement it that week if there is enough interest and energy.

capooti commented 5 years ago

@tomkralidis sounds great

mikejmets commented 5 years ago

Here is the branch: https://github.com/SAEONData/pycsw/tree/elastic. It contains a plugin that interfaces with the CherryPy Elasticsearch wrapper https://github.com/SAEONData/elastic-search-agent. We are still hoping to get funding to refactor this plugin to talk directly to elasticsearch-dsl instead of the wrapper. @bryan-mcalister-saeon has created the Docker deployment files here: https://github.com/SAEONData/deploy-metadata-services. Note that no pycsw harvesters exist, so the wrapper must be used to populate the catalog.
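For comparison, talking to Elasticsearch directly means producing its JSON query DSL rather than a Solr query string. The sketch below is a hypothetical illustration, not the SAEON plugin's code: it takes a filter already parsed into simple tuples (an assumed intermediate form) and returns a plain dict of the kind Elasticsearch's `_search` endpoint accepts under `query`. The field names `title`, `keywords`, and `extent` are placeholders, wildcard patterns are assumed to already use Elasticsearch's `*`/`?` syntax, and `extent` is assumed to be mapped as a `geo_shape` field.

```python
from typing import Any, Dict, Tuple


def to_es_query(node: Tuple) -> Dict[str, Any]:
    """Translate a tuple-based filter tree into an Elasticsearch query dict."""
    op = node[0]
    if op in ("and", "or"):
        clauses = [to_es_query(child) for child in node[1]]
        if op == "and":
            return {"bool": {"must": clauses}}
        return {"bool": {"should": clauses, "minimum_should_match": 1}}
    if op == "not":
        return {"bool": {"must_not": [to_es_query(node[1])]}}
    if op == "eq":
        _, field, value = node
        return {"term": {field: value}}
    if op == "like":
        _, field, pattern = node
        return {"wildcard": {field: {"value": pattern}}}
    if op == "bbox":
        _, field, minx, miny, maxx, maxy = node
        # Elasticsearch envelopes are [[minLon, maxLat], [maxLon, minLat]].
        envelope = {"type": "envelope", "coordinates": [[minx, maxy], [maxx, miny]]}
        return {"geo_shape": {field: {"shape": envelope, "relation": "intersects"}}}
    raise ValueError(f"unsupported operator: {op}")
```

A request body would then be `{"query": to_es_query(tree)}`; the same tree could just as well be fed to a Solr translator, which is one way to share the filter-parsing step between backends.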