michalporeba / odis

Search in decentralised systems. Search federation, result moderation, aggregation and feedback with hypermedia in ReSTful API to round it all off.
MIT License

Federated Search - Is there a protocol I can use? #4

Closed michalporeba closed 2 years ago

michalporeba commented 2 years ago

Question

Background

Federated search has been studied for the last two decades. Many solutions have been developed, and perhaps there is no need to design something from scratch. However, the HATEOAS element is important, so it is likely I will have to extend an existing standard.

Open standards matter for this project because it aims to follow the UK Civil Service guidance on open standards.

Standards to consider

Open Search - originally developed by Amazon, appears to be the standard. It is used by Microsoft in Windows and SharePoint search. But the standard is based on RSS (XML) and hasn't been updated since 2005.

Search/Retrieve via URL (SRU) is another XML-based search standard, promoted by the Library of Congress. It uses Contextual Query Language (CQL). The latest version was published at the beginning of 2013. It was created to replace Z39.50.
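To make the SRU request shape concrete, here is a minimal sketch of building a `searchRetrieve` URL that carries a CQL query. The endpoint `https://sru.example.org/catalogue` is hypothetical, and pinning `version` to `1.2` is an assumption made for illustration; a real server's explain record should be checked for what it supports.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sru_url(base_url: str, cql_query: str,
                  start_record: int = 1, maximum_records: int = 10) -> str:
    """Build an SRU searchRetrieve URL carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",   # SRU operation name
        "version": "1.2",                # assumed protocol version
        "query": cql_query,              # the CQL expression
        "startRecord": start_record,     # 1-based offset of the first record
        "maximumRecords": maximum_records,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used only to show the resulting URL.
url = build_sru_url("https://sru.example.org/catalogue",
                    'dc.title = "federated search"')
print(url)
```

The response to such a request is an XML document, which is where the lack of hypermedia support would have to be addressed by an extension.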

Schema.org vocabulary can be used with many different standards.

Existing projects

Open Federated Search


michalporeba commented 2 years ago

SRW/U - Review

Search/Retrieve Web Service (SRW) is an extension of the Search/Retrieve via URL (SRU) protocol. It is a SOAP protocol relying on Contextual Query Language (CQL).

There were a number of implementations in the 2000s, but even the documentation does not appear to be well maintained. The standard allows for querying data, especially lexical data. It is based on XML and associated technologies. It does not support hypermedia.

Interestingly, it allows the searcher to explicitly specify the servers on which the query is to be executed, which implies knowledge of where the information might be. It can be extended to support linked data.


michalporeba commented 2 years ago

Open Search - Review

There is now also a newer, OpenSearch-branded fork of Elasticsearch. This review is not about that project.

The standard, announced in 2005, had a better run than SRW/U. It was supported by Amazon (it was announced by Jeff Bezos) and many web browsers. It was based on RSS, initially returning results in RSS (version 1.0), then changing to OpenSearch Response (version 1.1).

It is used in SharePoint, in Windows 7 and newer search functionality, and Bing. It supports authentication.

OpenSearch consists of a number of XML based standards. It includes Search Aggregators and an option for Auto Discovery. It has a way of describing a search engine in an XML format, so it can be used by a generic search interface. This is the technology that allowed browsers to search multiple search engines from a single user interface, giving the user choice of which one to use.
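As an illustration of how a generic client can use such a description, the sketch below parses a small OpenSearch description document and fills in its URL template. The search engine, its URL and the `search_url` helper are hypothetical; only the namespace and the `{searchTerms}` / `{startPage?}` template parameters come from the OpenSearch 1.1 specification. The template handling is simplified and ignores prefixed (extension) parameters.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import quote

OS_NS = {"os": "http://a9.com/-/spec/opensearch/1.1/"}

# Hypothetical description document for an imaginary search engine.
DESCRIPTION = """
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example</ShortName>
  <Description>Hypothetical search engine used for illustration.</Description>
  <Url type="application/rss+xml"
       template="https://search.example.org/?q={searchTerms}&amp;page={startPage?}"/>
</OpenSearchDescription>
""".strip()


def search_url(description_xml: str, terms: str, page: int = 1) -> str:
    """Fill the RSS URL template of a description document with search terms."""
    root = ET.fromstring(description_xml)
    url = root.find("os:Url[@type='application/rss+xml']", OS_NS)
    if url is None:
        raise ValueError("description offers no RSS search URL")
    values = {"searchTerms": quote(terms), "startPage": str(page)}
    # Replace {name} and optional {name?}; parameters we do not know become empty.
    return re.sub(r"\{(\w+)\??\}",
                  lambda m: values.get(m.group(1), ""),
                  url.get("template"))


print(search_url(DESCRIPTION, "federated search"))
```

Because the client only needs the description document, the same code can target any engine that publishes one, which is what made the single browser search box possible.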

The standard hasn't changed in over a decade and is unlikely to change.


michalporeba commented 2 years ago

Data Mesh - is it relevant?

Data Mesh is a fairly recent term coined and promoted by ThoughtWorks and Martin Fowler. Other terms have been used for similar ideas: Data Fabric and Data as a Product.

Currently, there doesn't appear to be any standard behind these concepts. A ReSTful API allowing feedback or two-way communication and information exchange could be useful.

Briefly, the idea is to give up on data centralisation, an approach that failed to solve the ever-growing data problem, and instead follow the trends now mainstream in wider software architecture: decentralisation and a focus on users and products.

If we build systems where data is decentralised, and stored, processed and indexed within the services that produce it, there will need to be a way to search them. The search has to be equally decentralised and able to work in a mesh topology.
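One way to picture search over a mesh is a node that answers from its own index and then forwards the query to its peers, remembering which nodes have already been asked so that cycles in the mesh do not cause loops. The sketch below is an in-memory illustration only: `SearchNode`, its fields and the substring matching are all assumptions, not part of any existing protocol.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SearchNode:
    """A hypothetical service that indexes its own data and knows some peers."""
    name: str
    documents: list[str]
    peers: list[SearchNode] = field(default_factory=list)

    def search(self, term: str, visited: set[str] | None = None) -> list[tuple[str, str]]:
        """Return (node name, document) hits from this node and reachable peers."""
        visited = visited if visited is not None else set()
        if self.name in visited:
            return []          # already asked; avoids looping around the mesh
        visited.add(self.name)
        hits = [(self.name, doc) for doc in self.documents
                if term.lower() in doc.lower()]
        for peer in self.peers:
            hits.extend(peer.search(term, visited))
        return hits


a = SearchNode("a", ["Data mesh primer"])
b = SearchNode("b", ["Mesh topology notes", "Unrelated"])
c = SearchNode("c", ["Federated search"])
a.peers, b.peers, c.peers = [b, c], [a, c], [a]   # deliberately cyclic

print(a.search("mesh"))
```

A real implementation would make these calls over the network and would need limits on depth and time, but the visited set captures the basic requirement of working in a mesh rather than a tree.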

The data and associated metadata have to be discoverable and shared between services. Solving the search problems can get us closer to data mesh ideas.

michalporeba commented 2 years ago

Conclusions

No modern standard appears to be both extensible enough and actively maintained enough to accommodate what I expect will need to be implemented for my approach to federated search to succeed. OpenSearch is the closest. It is still used in a number of commercial products and supported in some browsers (Mozilla dropped support in 2019).

The functionality missing from OpenSearch is mostly to do with feedback and the context of the search. It might be possible to design a 'standard' for data exchange between the search nodes and the services that want to embrace the new functionality. Externally, the search functionality can be exposed through OpenSearch and, similarly, external data sources supporting the standard could be queried through OpenSearch.

[Image: c4-protocols]

The use of OpenSearch on the peripheries of the network would help with integration with existing systems, while designing a new standard for internal communication would allow the expected benefits of the approach to be explored freely.
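The split between an internal format and OpenSearch at the edge could look roughly like the sketch below. Internally a result carries hypermedia links, including one for feedback; at the periphery it is reduced to a plain RSS item of the kind an OpenSearch client expects. The HAL-style `_links` layout, the feedback URL pattern and both function names are assumptions for illustration, not a proposed design.

```python
import xml.etree.ElementTree as ET


def internal_result(doc_id: str, title: str, url: str, query_id: str) -> dict:
    """Hypothetical internal result with hypermedia links between search nodes."""
    return {
        "title": title,
        "_links": {
            "self": {"href": url},
            # Assumed URL pattern for sending relevance feedback on this result.
            "feedback": {"href": f"/queries/{query_id}/results/{doc_id}/feedback"},
        },
    }


def to_rss_item(result: dict) -> str:
    """Reduce an internal result to an RSS <item> for external OpenSearch clients."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = result["title"]
    ET.SubElement(item, "link").text = result["_links"]["self"]["href"]
    return ET.tostring(item, encoding="unicode")


result = internal_result("42", "Open standards guidance",
                         "https://docs.example.org/42", query_id="q1")
print(to_rss_item(result))
```

The feedback link is dropped at the edge, which matches the idea that external systems see ordinary OpenSearch while the richer exchange stays inside the network.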

The search node could, over time, become a data node or mesh node, allowing richer data collaboration between systems.