inveniosoftware / invenio-records-rest

Invenio records REST API module.
https://invenio-records-rest.readthedocs.io
MIT License

docs: write usage section #164

Closed dinosk closed 6 years ago

dinosk commented 6 years ago

Add to invenio_records_rest/__init__.py docstring.

lnielsen commented 6 years ago
Search REST API for Invenio.

Invenio-Records-REST is a core component of Invenio which provides configurable
REST APIs for searching, retrieving, creating, modifying and deleting records.

The module uses Elasticsearch as the backend search engine and builds a REST API
on top of it that supports features such as:

- Search with sorting, filtering, aggregations and pagination, which exposes the
  full capabilities of Elasticsearch, e.g. geo-based querying.
- Record serialization and transformation (e.g. JSON, XML, DataCite, DublinCore,
  JSON-LD, MARCXML, Citation Style Language).
- Pluggable query parser.
- Tombstones and record redirection.
- Customizable access control.
- Configurable record namespaces for exposing different classes of records (e.g.
  authors, publications, grants, ...).
- CRUD operations for records with support for persistent identifier minting.
- Super-fast completion suggesters for implementing Google-like instant
  autocomplete suggestions.

The REST API follows best practices and supports e.g.:

- Content negotiation and link headers.
- Cache control via ETags and last modified headers.
- Optimistic concurrency control via ETags.
- Rate-limiting, Cross-Origin Resource Sharing, and various security headers.

The Search REST API works as **the** central entry point in Invenio for
accessing records. In combination with e.g. Invenio-Search-UI/JS, the REST API
makes it easy to display records anywhere in Invenio while maintaining only a
single search endpoint.

Basics
------

Records
~~~~~~~
- JSON
- JSONSchema

Elasticsearch
~~~~~~~~~~~~~
- Index
- Mapping

- evolution: Alias

Record namespace

Searching is highly impacted by the mapping used.
(doesn't talk about mapping!)

Initialization
--------------

Configuration
-------------
Record namespace
- persistent identifier
- es index alias
- minters
- route specification: pid_value

Exposing multiple namespaces
>>> app.config['RECORDS_REST_ENDPOINTS']['recid']
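
A record namespace corresponds to one entry in ``RECORDS_REST_ENDPOINTS``. The following is a minimal sketch of exposing two namespaces, assuming the key names of the module's default configuration (``pid_type``, ``pid_minter``, ``pid_fetcher``, ``search_index``, ``list_route``, ``item_route``). The ``authid`` namespace, its PID type, its minter and fetcher names, and both index aliases are hypothetical.

```python
# Sketch only: the "authid" endpoint and the index aliases are hypothetical.
JSON_RECORD = 'invenio_records_rest.serializers:json_v1_response'
JSON_SEARCH = 'invenio_records_rest.serializers:json_v1_search'

RECORDS_REST_ENDPOINTS = {
    'recid': {
        'pid_type': 'recid',        # persistent identifier type
        'pid_minter': 'recid',      # name of the registered minter
        'pid_fetcher': 'recid',     # name of the registered fetcher
        'search_index': 'records',  # Elasticsearch index alias
        'record_serializers': {'application/json': JSON_RECORD},
        'search_serializers': {'application/json': JSON_SEARCH},
        'list_route': '/records/',
        # Route specification: the URL variable must be named pid_value.
        'item_route': '/records/<pid(recid):pid_value>',
        'default_media_type': 'application/json',
    },
    'authid': {
        'pid_type': 'authid',
        'pid_minter': 'authid',
        'pid_fetcher': 'authid',
        'search_index': 'authors',
        'record_serializers': {'application/json': JSON_RECORD},
        'search_serializers': {'application/json': JSON_SEARCH},
        'list_route': '/authors/',
        'item_route': '/authors/<pid(authid):pid_value>',
        'default_media_type': 'application/json',
    },
}

print(sorted(RECORDS_REST_ENDPOINTS))
```

Each namespace gets its own list and item routes, so the two classes of records can use different indexes, minters and serializers.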

Installing endpoints
--------------------

Demo data
~~~~~~~~~

Searching
---------
Basic search - query parser, filters, aggregations, sorting, pagination

>> /records/?q=...&size=10&page=1&sort=bestmatch&type=test
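
To illustrate how the query parameters above combine, here is a small sketch that builds such a URL with the standard library. The helper ``search_url`` and the query value ``title:test`` are hypothetical and only serve as an example.

```python
from urllib.parse import urlencode


def search_url(base='/records/', **params):
    """Build a search URL from keyword arguments (hypothetical helper)."""
    # doseq=True lets a list value repeat the parameter, e.g. type=a&type=b.
    return base + '?' + urlencode(params, doseq=True)


url = search_url(q='title:test', size=10, page=1, sort='bestmatch', type='test')
print(url)
```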

Facets: filtering + aggregations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- facets = filters + aggregations

RECORDS_REST_FACETS

>>> /records/?type=test&type=anothervalue
{'aggs':}
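
A facet pairs an aggregation with a filter on the same field. The sketch below assumes that ``RECORDS_REST_FACETS`` is keyed by index name and contains ``aggs`` and ``post_filters`` sections. The module ships a ``terms_filter`` helper in ``invenio_records_rest.facets``; to keep the snippet runnable without Invenio, a simplified stand-in returning plain dictionaries is defined here. The index name ``records`` and the field ``type`` are assumptions.

```python
def terms_filter(field):
    """Simplified stand-in for invenio_records_rest.facets.terms_filter.

    Returns a function that turns the URL values of a facet into an
    Elasticsearch ``terms`` query, expressed here as a plain dict.
    """
    def inner(values):
        return {'terms': {field: list(values)}}
    return inner


RECORDS_REST_FACETS = {
    'records': {  # assumed index name
        'aggs': {
            # Bucket the search results by the value of the "type" field.
            'type': {'terms': {'field': 'type'}},
        },
        'post_filters': {
            # URL parameter "type" -> filter on the "type" field.
            'type': terms_filter('type'),
        },
    },
}

# Values as they would arrive from a URL with a repeated "type" parameter.
make_filter = RECORDS_REST_FACETS['records']['post_filters']['type']
query = make_filter(['test', 'other'])
print(query)
```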

Aggregations
++++++++++++
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-aggregations.html
bucketing, metric, matrix, pipeline

RECORDS_REST_FACETS[:dfd]

Filters
+++++++

- pre/post filters
- many other t

Sorting
~~~~~~~
RECORDS_REST_SORT_OPTIONS
RECORDS_REST_DEFAULT_SORT
Simple, multi-field

/records/?sort=-mostrecent
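
A sketch of the two sorting settings, assuming they are keyed by index name and that each option lists the fields to sort on plus a default order. The helper ``parse_sort`` is hypothetical; it only illustrates the assumption that a leading ``-`` in the URL value requests the reverse order.

```python
RECORDS_REST_SORT_OPTIONS = {
    'records': {  # assumed index name
        'bestmatch': {
            'title': 'Best match',
            'fields': ['_score'],
            'default_order': 'desc',
            'order': 1,
        },
        'mostrecent': {
            'title': 'Most recent',
            'fields': ['_created'],
            'default_order': 'asc',
            'order': 2,
        },
    },
}

# Which option applies when the request has a query string and when not.
RECORDS_REST_DEFAULT_SORT = {
    'records': {'query': 'bestmatch', 'noquery': 'mostrecent'},
}


def parse_sort(value):
    """Split a ``sort`` URL value into (option name, ascending flag)."""
    return value.lstrip('-'), not value.startswith('-')


print(parse_sort('-mostrecent'))
```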

Ref: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-request-sort.html
- order: asc/desc
- mode: min, max, avg, sum
- nested
- missing values
- geo distance
- script based sorting

Query parser
~~~~~~~~~~~~

/records/?q=....

RECORDS_REST_ENDPOINTS['search_factory']
By default the Elasticsearch query string query parser is used. Features:
- field names + operators
- exists/missing
- ranges
- wildcards, regular expressions
- fuzziness
- proximity search
- boosting
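
The features listed above map onto the Elasticsearch query string syntax. The sketch below shows one example query per feature and URL-encodes it for the ``q`` parameter. The field names (``title``, ``year``, ``name``) are hypothetical.

```python
from urllib.parse import quote_plus

# One example per feature, written in Elasticsearch query string syntax.
EXAMPLE_QUERIES = {
    'field + operator': 'title:(quick OR brown)',
    'exists': '_exists_:title',
    'missing': 'NOT _exists_:title',
    'range': 'year:[2010 TO 2015]',
    'wildcard': 'qu?ck bro*',
    'regular expression': 'name:/joh?n(ath[oa]n)/',
    'fuzziness': 'quikc~',
    'proximity': '"fox quick"~5',
    'boosting': 'quick^2 fox',
}

for feature, q in EXAMPLE_QUERIES.items():
    print('{:<20} /records/?q={}'.format(feature, quote_plus(q)))
```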

Legacy Invenio query parser

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-syntax

- Replacing the query parser.

Suggesters
~~~~~~~~~~

/records/_suggest?text=...

- Config
- Make sure data is properly indexed.
suggesters[my_url_param_to_complete]
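
A sketch of a suggester configuration, assuming the endpoint accepts a ``suggesters`` key whose entries map a URL parameter name to an Elasticsearch completion suggester. The parameter name ``title-complete`` and the field ``suggest_title`` are hypothetical. The mapping fragment shows what "properly indexed" is taken to mean here: the target field has the ``completion`` type.

```python
# Fragment of one RECORDS_REST_ENDPOINTS entry (other keys omitted).
endpoint = {
    'pid_type': 'recid',
    'search_index': 'records',
    'list_route': '/records/',
    'suggesters': {
        # URL parameter name -> Elasticsearch suggester definition
        'title-complete': {
            'completion': {'field': 'suggest_title'},
        },
    },
}

# Matching Elasticsearch mapping fragment for the suggester field.
mapping_fragment = {
    'properties': {
        'suggest_title': {'type': 'completion'},
    },
}

field = endpoint['suggesters']['title-complete']['completion']['field']
print(field, mapping_fragment['properties'][field]['type'])
```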

Advanced customization
~~~~~~~~~~~~~~~~~~~~~~
Replace the search factory for very advanced querying, i.e. exact control over
what is sent to Elasticsearch.

Max results

Error handlers

Record class: fetch records from Elasticsearch instead of the database.

Serialization
-------------
A key feature is transforming records from JSON to other formats, e.g. JSON to
JSON (removing sensitive information, enriching, providing a stable output
format) or standardized formats like DataCite and DublinCore.

Content negotiation
~~~~~~~~~~~~~~~~~~~
- Mimetype -> Serializer
- Versioning via mimetypes
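
The mimetype-to-serializer mapping can be sketched as a plain dictionary lookup. This is an illustration of the idea, not the module's implementation: the serializer functions, the versioned mimetype ``application/vnd.example.v2+json`` and the helper ``select_serializer`` are hypothetical, and quality values in the ``Accept`` header are ignored for brevity.

```python
import json


def json_v1(record):
    """First output format: the record as-is."""
    return json.dumps(record, sort_keys=True)


def json_v2(record):
    """Second output format: the record wrapped in a "metadata" key."""
    return json.dumps({'metadata': record}, sort_keys=True)


# Mimetype -> serializer. A new output version gets its own mimetype.
SERIALIZERS = {
    'application/json': json_v1,
    'application/vnd.example.v2+json': json_v2,
}


def select_serializer(accept_header, default='application/json'):
    """Return (mimetype, serializer) for the first known mimetype."""
    for item in accept_header.split(','):
        mimetype = item.split(';')[0].strip()
        if mimetype in SERIALIZERS:
            return mimetype, SERIALIZERS[mimetype]
    return default, SERIALIZERS[default]


mimetype, serializer = select_serializer('application/vnd.example.v2+json')
print(mimetype, serializer({'title': 'Test'}))
```

Because clients name the version they want in the ``Accept`` header, an old format can keep working while a new one is introduced.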

Workflow
~~~~~~~~
1. Preprocess into same format (database vs ES)
2. Transform record
3. Serialize data format
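
The three steps above can be sketched as a small pipeline. All function names and the hidden field ``_internal_note`` are hypothetical; the point is only that a record from the database and a hit from Elasticsearch end up with the same serialized output.

```python
import json


def preprocess(pid_value, source):
    """Bring a database record or a search hit into one common shape."""
    if '_source' in source:   # Elasticsearch hit
        metadata = source['_source']
    else:                     # record loaded from the database
        metadata = source
    return {'id': pid_value, 'metadata': dict(metadata)}


def transform(data, hidden=('_internal_note',)):
    """Drop fields that must not be exposed."""
    data['metadata'] = {
        key: value for key, value in data['metadata'].items()
        if key not in hidden
    }
    return data


def serialize(data):
    """Dump the transformed record in the output data format (JSON)."""
    return json.dumps(data, sort_keys=True)


db_record = {'title': 'Test', '_internal_note': 'secret'}
es_hit = {'_source': {'title': 'Test', '_internal_note': 'secret'}}

from_db = serialize(transform(preprocess('1', db_record)))
from_es = serialize(transform(preprocess('1', es_hit)))
print(from_db)
```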

Data formats
~~~~~~~~~~~~~~~~~~~~
- JSON: JSON-LD, CSL
- XML: DataCite, DublinCore, MARCXML
- Text: Citation formatting

Transformations
~~~~~~~~~~~~~~~
- JSON-JSON: Marshmallow

Citation formatting
~~~~~~~~~~~~~~~~~~~
Example:
transform to

Tombstones and redirection
--------------------------
error handlers

Access control
--------------
Principles
- search filtering
- factories

impact of es query parser and hidden fields

Factories
~~~~~~~~~
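
A permission factory is sketched here as a callable that receives a record and returns an object with a ``can()`` method. That shape is an assumption about the interface, and the ``owners`` field is hypothetical. A real factory would normally read the current user from the request context; this sketch takes the user id as an explicit argument so that it runs on its own.

```python
class Permission:
    """Minimal permission object exposing ``can()``."""

    def __init__(self, allowed):
        self.allowed = allowed

    def can(self):
        return self.allowed


def allow_all(record=None, **kwargs):
    """Factory that grants access to every record."""
    return Permission(True)


def owner_only(record=None, current_user_id=None):
    """Factory that grants access only to users listed as owners."""
    owners = (record or {}).get('owners', [])
    return Permission(current_user_id in owners)


record = {'title': 'Test', 'owners': [1, 2]}
print(owner_only(record, current_user_id=1).can())
print(owner_only(record, current_user_id=3).can())
```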

Deserialization
---------------
Create, Update, Delete support
- Loaders
- Minters/Fetchers
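
A simplified sketch of the minter and fetcher roles, assuming both are called with the record UUID and the record data. The minter assigns a new identifier and stores it in the record; the fetcher reads it back without creating anything. The ``FetchedPID`` tuple and the ``control_number`` field are stand-ins defined locally, and a real minter would register the identifier in the PID store rather than use an in-memory counter.

```python
from collections import namedtuple
from itertools import count

# Local stand-in for the tuple a fetcher returns.
FetchedPID = namedtuple('FetchedPID', ['provider', 'pid_type', 'pid_value'])

_next_id = count(1)  # in-memory counter, for illustration only


def recid_minter(record_uuid, data):
    """Assign a new identifier and store it in the record data."""
    data['control_number'] = str(next(_next_id))
    return data['control_number']


def recid_fetcher(record_uuid, data):
    """Read the identifier back from an already minted record."""
    return FetchedPID(
        provider=None,
        pid_type='recid',
        pid_value=str(data['control_number']),
    )


data = {'title': 'Test'}
minted = recid_minter('hypothetical-uuid', data)
fetched = recid_fetcher('hypothetical-uuid', data)
print(minted, fetched.pid_value)
```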