Search REST API for Invenio.
Invenio-Records-REST is a core component of Invenio which provides configurable
REST APIs for searching, retrieving, creating, modifying and deleting records.
The module uses Elasticsearch as the backend search engine and builds on top
of it a REST API that supports features such as:
- Search with sorting, filtering, aggregations and pagination, which allows the
full capabilities of Elasticsearch to be used, e.g. geo-based querying.
- Record serialization and transformation (e.g. JSON, XML, DataCite, DublinCore,
JSON-LD, MARCXML, Citation Style Language).
- Pluggable query parser.
- Tombstones and record redirection.
- Customizable access control.
- Configurable record namespaces for exposing different classes of records (e.g.
authors, publications, grants, ...).
- CRUD operations for records with support for persistent identifier minting.
- Super-fast completion suggesters for implementing Google-like instant
autocomplete suggestions.
The REST API follows best practices and supports e.g.:
- Content negotiation and link headers.
- Cache control via ETags and Last-Modified headers.
- Optimistic concurrency control via ETags.
- Rate-limiting, Cross-Origin Resource Sharing, and various security headers.
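As an illustration of optimistic concurrency control via ETags, the following is a minimal sketch. It assumes the ETag is derived from a record revision number; the helper names ``make_etag`` and ``check_if_match`` are hypothetical and not part of the module.

```python
def make_etag(revision_id):
    """Return a quoted ETag built from a record's revision number (assumed scheme)."""
    return '"{}"'.format(revision_id)


def check_if_match(current_revision, if_match_header):
    """Return the HTTP status an update request would get.

    200: no precondition was sent, or the client's ETag is still current.
    412: the record changed since the client last fetched it.
    """
    if if_match_header is None:
        return 200
    if if_match_header == make_etag(current_revision):
        return 200
    return 412


# A client fetched revision 3, but the record has since moved to revision 4.
status_stale = check_if_match(4, '"3"')
status_fresh = check_if_match(4, '"4"')
```

A client that receives 412 Precondition Failed would re-fetch the record, reapply its change and retry.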
The Search REST API works as **the** central entry point in Invenio for
accessing records. The REST API in combination with e.g. Invenio-Search-UI/JS
makes it easy to display records anywhere in Invenio while maintaining only one
single search endpoint.
Basics
------
Records
~~~~~~~
- JSON
- JSONSchema
Elasticsearch
~~~~~~~~~~~~~
- Index
- Mapping
- evolution: Alias
Record namespace
Searching is highly impacted by the mapping used.
(doesn't talk about mapping!)
Initialization
--------------
Configuration
-------------
Record namespace
- persistent identifier
- es index alias
- minters
- route specification: pid_value
Exposing multiple namespaces
>>> app.config['RECORDS_REST_ENDPOINTS']['recid']
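A sketch of what exposing two namespaces might look like as plain configuration. The key names follow the module's endpoint configuration as far as we know it, but treat them and all values as assumptions; the ``authid`` namespace and the ``authors`` index are hypothetical.

```python
# Hypothetical configuration: one entry per record namespace.
RECORDS_REST_ENDPOINTS = {
    "recid": {
        "pid_type": "recid",          # persistent identifier type
        "pid_minter": "recid",        # name of the registered minter
        "pid_fetcher": "recid",       # name of the registered fetcher
        "search_index": "records",    # Elasticsearch index alias
        "list_route": "/records/",
        # pid_value is the URL variable holding the identifier value.
        "item_route": "/records/<pid(recid):pid_value>",
        "default_media_type": "application/json",
    },
    "authid": {
        "pid_type": "authid",
        "pid_minter": "authid",
        "pid_fetcher": "authid",
        "search_index": "authors",
        "list_route": "/authors/",
        "item_route": "/authors/<pid(authid):pid_value>",
        "default_media_type": "application/json",
    },
}

# Each namespace gets its own list and item routes.
routes = sorted(cfg["list_route"] for cfg in RECORDS_REST_ENDPOINTS.values())
```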
Installing endpoints
--------------------
Demo data
~~~~~~~~~
Searching
---------
Basic search - query parser, filters, aggregations, sorting, pagination
>> /records/?q=...&size=10&page=1&sort=bestmatch&type=test
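The search URL above can be assembled with the standard library. This is only a sketch of the client side; ``search_url`` is a hypothetical helper, and the parameter names are taken from the example URL.

```python
from urllib.parse import urlencode


def search_url(q, page=1, size=10, sort="bestmatch", **filters):
    """Build a search URL; each filter may be one value or a list of values."""
    params = [("q", q), ("size", size), ("page", page), ("sort", sort)]
    for name, values in sorted(filters.items()):
        if isinstance(values, str):
            values = [values]
        # Repeating a parameter selects several values for the same filter.
        params.extend((name, value) for value in values)
    return "/records/?" + urlencode(params)


url = search_url("title:invenio", type="test")
```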
Facets: filtering + aggregations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- facets = filters + aggregations
RECORDS_REST_FACETS
>>> /records/?type=test&type=anothervalue
{'aggs':}
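To make the "facets = filters + aggregations" idea concrete, here is a simplified sketch that turns repeated URL parameters into an Elasticsearch request body. It is not the module's implementation: in the real configuration the filter values are factory callables, whereas here ``post_filters`` simply maps a URL parameter to a field name, and ``build_search_body`` is a hypothetical helper.

```python
from urllib.parse import parse_qs

# Simplified stand-in for one index's facet configuration (assumed layout).
FACETS = {
    "aggs": {"type": {"terms": {"field": "type"}}},
    "post_filters": {"type": "type"},  # URL parameter -> indexed field
}


def build_search_body(query_string):
    """Combine configured aggregations with filters selected in the URL."""
    args = parse_qs(query_string)
    filters = [
        {"terms": {field: args[param]}}
        for param, field in FACETS["post_filters"].items()
        if param in args
    ]
    body = {"aggs": FACETS["aggs"]}
    if filters:
        # Post filters narrow the hits without changing aggregation counts.
        body["post_filter"] = {"bool": {"filter": filters}}
    return body


body = build_search_body("type=test&type=anothervalue")
```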
Aggregations
++++++++++++
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-aggregations.html
bucketing, metric, matrix, pipeline
RECORDS_REST_FACETS[:dfd]
Filters
+++++++
- pre/post filters
- many other t
Sorting
~~~~~~~
RECORDS_REST_SORT_OPTIONS
RECORDS_REST_DEFAULT_SORT
Simple, multi-field
/records/?sort=-mostrecent
Ref: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-request-sort.html
- order: asc/desc
- mode: min, max, avg, sum
- nested
- missing values
- geo distance
- script based sorting
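The sketch below shows one way a ``sort`` URL argument such as ``-mostrecent`` could be resolved into Elasticsearch sort clauses. The layout of ``SORT_OPTIONS`` and ``DEFAULT_SORT`` is an assumption modelled on the two configuration variables named above, and ``resolve_sort`` is a hypothetical helper rather than the module's own code.

```python
# Assumed layout: index -> sort key -> fields ("-" prefix means descending).
SORT_OPTIONS = {
    "records": {
        "bestmatch": {"fields": ["-_score"]},
        "mostrecent": {"fields": ["-_created"]},
    }
}
# Assumed layout: which sort key applies with and without a query string.
DEFAULT_SORT = {"records": {"query": "bestmatch", "noquery": "mostrecent"}}


def resolve_sort(index, sort_arg=None, has_query=False):
    """Translate a sort URL argument into a list of ES sort clauses."""
    if not sort_arg:
        sort_arg = DEFAULT_SORT[index]["query" if has_query else "noquery"]
    reverse = sort_arg.startswith("-")  # leading "-" flips the configured order
    key = sort_arg[1:] if reverse else sort_arg
    clauses = []
    for field in SORT_OPTIONS[index][key]["fields"]:
        descending = field.startswith("-")
        name = field[1:] if descending else field
        if reverse:
            descending = not descending
        clauses.append({name: {"order": "desc" if descending else "asc"}})
    return clauses


oldest_first = resolve_sort("records", "-mostrecent")
```

Multi-field sorting follows from listing several entries in ``fields``; the other Elasticsearch options (mode, nested, missing values) are not covered by this sketch.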
Query parser
~~~~~~~~~~~~
/records/?q=....
RECORDS_REST_ENDPOINTS['search_factory']
By default the easy query string query parser is used. Features:
- field names + operators
- exists/missing
- ranges
- wildcards, regular expressions
- fuzziness
- proximity search
- boosting
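A few example queries for the features listed above, written in the Elasticsearch query string syntax and URL-encoded for the ``q`` parameter. The field names (``title``, ``date``, ``name``) are hypothetical and depend on the mapping; ``as_url`` is an illustrative helper.

```python
from urllib.parse import quote_plus

# Feature -> example query in Elasticsearch query string syntax.
EXAMPLES = {
    "field names + operators": "title:(quick OR brown)",
    "exists": "_exists_:title",
    "ranges": "date:[2012-01-01 TO 2012-12-31]",
    "wildcards": "qu?ck bro*",
    "regular expressions": "name:/joh?n(ath[oa]n)/",
    "fuzziness": "quikc~",
    "proximity search": '"fox quick"~5',
    "boosting": "quick^2 fox",
}


def as_url(query):
    """Return the search URL for a query, with the query safely encoded."""
    return "/records/?q=" + quote_plus(query)


urls = {feature: as_url(query) for feature, query in EXAMPLES.items()}
```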
Legacy Invenio query parser
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-syntax
- Replacing the query parser.
Suggesters
~~~~~~~~~~
/records/_suggest?text=...
- Config
- Make sure data is properly indexed.
suggesters[my_url_param_to_complete]
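A rough sketch of how a suggester configuration keyed by URL parameter could be turned into an Elasticsearch suggest request body. The ``SUGGESTERS`` layout, the ``suggest_title`` field and ``build_suggest_body`` are assumptions for illustration; the field would need to be mapped as a completion field for this to return anything.

```python
# Assumed layout: URL parameter -> suggester options passed to Elasticsearch.
SUGGESTERS = {
    "text": {"completion": {"field": "suggest_title"}},
}


def build_suggest_body(args):
    """Build the suggest section for every configured parameter present in args."""
    suggest = {}
    for param, options in SUGGESTERS.items():
        value = args.get(param)
        if value:
            suggest[param] = dict(text=value, **options)
    return {"suggest": suggest}


# Corresponds to a request like /records/_suggest?text=inv
body = build_suggest_body({"text": "inv"})
```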
Advanced customization
~~~~~~~~~~~~~~~~~~~~~~
Replace the search factory for very advanced querying, i.e. exact control over
what is sent to Elasticsearch.
Max results
Error handlers
Record class
Fetch record from ES instead of database.
Serialization
-------------
A key feature is transforming records from JSON to other formats, for example
JSON-to-JSON transformations (removing sensitive information, enriching, producing
a stable output format) or standardized formats like DataCite and DublinCore.
Content negotiation
~~~~~~~~~~~~~~~~~~~
- Mimetype -> Serializer
- Versioning via mimetypes
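The mimetype-to-serializer mapping can be sketched as a small content negotiation routine. This is a simplified stand-in, not the module's code: it handles only exact media types, ``*/*`` and ``q`` values, and the vendor mimetype used for versioning is hypothetical.

```python
import json

# Mimetype -> serializer. A versioned mimetype pins the output format.
SERIALIZERS = {
    "application/json": lambda record: json.dumps(record, sort_keys=True),
    "application/vnd.example.v1+json": lambda record: json.dumps(
        {"title": record.get("title")}, sort_keys=True
    ),
}


def parse_accept(header):
    """Return the media types of an Accept header, highest quality first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        pieces = [piece.strip() for piece in part.split(";")]
        quality = 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                quality = float(param[2:])
        ranked.append((-quality, position, pieces[0]))
    return [media_type for _, _, media_type in sorted(ranked)]


def select_mimetype(accept, default="application/json"):
    """Pick the best supported mimetype, or None (406 Not Acceptable)."""
    for media_type in parse_accept(accept):
        if media_type in SERIALIZERS:
            return media_type
        if media_type == "*/*":
            return default
    return None


chosen = select_mimetype("application/xml;q=0.9, application/json")
```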
Workflow
~~~~~~~~
Preprocess into same format (database vs ES)
Transform record
Serialize data format
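The three workflow steps can be sketched as plain functions. All names and the ``internal_notes`` field are hypothetical; the point is that a record loaded from the database and a search hit from Elasticsearch are first brought into the same shape, so the transform and serialize steps are shared.

```python
import json


def preprocess_record(pid_value, record):
    """Step 1a: wrap a record loaded from the database."""
    return {"id": pid_value, "metadata": dict(record)}


def preprocess_search_hit(hit):
    """Step 1b: wrap an Elasticsearch hit in the same shape."""
    return {"id": hit["_id"], "metadata": dict(hit["_source"])}


def transform(data, hidden=("internal_notes",)):
    """Step 2: transform the record, here by dropping sensitive fields."""
    metadata = {k: v for k, v in data["metadata"].items() if k not in hidden}
    return {"id": data["id"], "metadata": metadata}


def serialize(data):
    """Step 3: serialize to the output data format (JSON in this sketch)."""
    return json.dumps(data, sort_keys=True)


record = {"title": "Test", "internal_notes": "secret"}
from_db = serialize(transform(preprocess_record("1", record)))
from_es = serialize(transform(preprocess_search_hit({"_id": "1", "_source": record})))
```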
Data formats
~~~~~~~~~~~~~~~~~~~~
- JSON: JSON-LD, CSL
- XML: DataCite, DublinCore, MARCXML
- Text: Citation formatting
Transformations
~~~~~~~~~~~~~~~
- JSON-JSON: Marshmallow
Citation formatting
~~~~~~~~~~~~~~~~~~~
Example:
transform to
Tombstones and redirection
--------------------------
error handlers
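A minimal sketch of tombstone and redirection handling when resolving a persistent identifier. The in-memory ``PIDS`` table, the ``resolve`` helper and the choice of status codes (410 Gone for deleted records, 301 for redirected ones) are assumptions for illustration.

```python
# Hypothetical identifier table: value -> status (and redirect target).
PIDS = {
    "1": {"status": "registered"},
    "2": {"status": "deleted"},
    "3": {"status": "redirected", "target": "1"},
}


def resolve(pid_value):
    """Return (HTTP status, headers) for a record request."""
    pid = PIDS.get(pid_value)
    if pid is None:
        return 404, {}  # identifier never existed
    if pid["status"] == "deleted":
        return 410, {}  # tombstone: the record existed but is gone
    if pid["status"] == "redirected":
        return 301, {"Location": "/records/" + pid["target"]}
    return 200, {}


redirect = resolve("3")
```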
Access control
--------------
Principles
- search filtering
- factories
impact of es query parser and hidden fields
Factories
~~~~~~~~~
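A sketch of the two principles, assuming a permission factory returns an object with a ``can()`` method and that search results are restricted with an extra Elasticsearch filter. The ``owner`` and ``access`` fields are hypothetical, and the user is passed explicitly here only to keep the example self-contained.

```python
class Permission:
    """Minimal permission object exposing can()."""

    def __init__(self, allowed):
        self._allowed = allowed

    def can(self):
        return self._allowed


def read_permission_factory(record, user_id=None):
    """Allow reading public records, or any record owned by the user."""
    is_public = record.get("access") == "public"
    is_owner = user_id is not None and record.get("owner") == user_id
    return Permission(is_public or is_owner)


def search_filter(user_id=None):
    """ES filter so restricted records never appear in search results."""
    should = [{"term": {"access": "public"}}]
    if user_id is not None:
        should.append({"term": {"owner": user_id}})
    return {"bool": {"should": should, "minimum_should_match": 1}}


restricted = {"access": "restricted", "owner": 42}
owner_can_read = read_permission_factory(restricted, user_id=42).can()
```

Filtering alone is not sufficient if the query parser lets users search on hidden fields, since matches can leak information about values the serializer would otherwise remove.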
Deserialization
---------------
Create, Update, Delete support
- Loaders
- Minters/Fetchers
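The roles of loaders, minters and fetchers can be sketched as follows. This is a simplified model, not the module's API: the ``PID`` tuple, the in-memory counter and the function names are hypothetical, and ``control_number`` is assumed to be the field that stores the identifier inside the record.

```python
import json
from collections import namedtuple
from itertools import count

PID = namedtuple("PID", ["pid_type", "pid_value"])
_next_id = count(1)  # stand-in for a database sequence


def json_loader(body):
    """Loader: deserialize the request body into record data."""
    return json.loads(body)


def recid_minter(record_uuid, data):
    """Minter: create a new identifier and store it in the record."""
    data["control_number"] = str(next(_next_id))
    return PID("recid", data["control_number"])


def recid_fetcher(record_uuid, data):
    """Fetcher: read the identifier back from an existing record."""
    return PID("recid", str(data["control_number"]))


# Create: load the payload, mint an identifier, then fetch it back.
data = json_loader('{"title": "Test"}')
minted = recid_minter("some-uuid", data)
fetched = recid_fetcher("some-uuid", data)
```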
Add to
invenio_records_rest/__init__.py
docstring.