janiemi opened 2 years ago
[I originally wrote this comment on 2022-10-13 but by accident added it to spraakbanken/korp-frontend#278 instead of this issue. I noticed that only now and moved the comment here, even though it is already out of date with respect to the current Korp backend.]
An amendment to the issue description above:
The plugins currently used in the Korp of the Language Bank of Finland are available in the branch plugins/master of our Korp backend fork. (This branch is a merge of individual plugin branches.)
The current plugins are the following:
- charcoder: Encode special characters in queries and decode them in results. (Our corpora have certain characters encoded specially because of CWB limitations.)
- contenthider: Hide marked structures in KWIC results by replacing their attribute values with specified fixed strings.
- lemgramcompleter: Implement the endpoint /lemgram_complete, to find lemgram completions for a prefix.
- logger: Log Korp queries.
- protectedcorporadb: Retrieve a list of protected corpora from a MySQL database.
- shibauth: Support authorization with information obtained from Shibboleth authentication in /authenticate.
lemgramcompleter is an endpoint plugin, whereas the others are callback plugins.
Many of the hook points in korp.py are currently used only by logger.
In addition, the sample plugins test1, test2 and test3 illustrate some features of plugins.
The plugins currently have no other documentation than source code comments, but I intend to add short readme files.
Preface
This issue tries to describe the plugin facility I have implemented for the Korp backend used in the Language Bank of Finland and that I’d propose as the basis for a plugin facility to be included in main Korp backend code. All feedback on the proposal is welcome.
Disclaimer: I have virtually no prior experience in plugin architectures, and my knowledge of Flask and other Web technologies used in the Korp backend is based almost solely on the Korp backend itself and on what I have learned when modifying it. The features of the plugin facility reflect the modifications we’ve made to the Korp of the Language Bank of Finland, so something might be implemented in too specific a way or be missing altogether. I would therefore be particularly glad if you pointed out anything in the plugin facility that should be done differently.
The code for the plugin facility is currently on top of Språkbanken’s Korp backend code at commit aad6381f of 2020-01-20, but I’ll port it on top of the current code in the near future.
The plugin facility has a readme file with some more details.
Also see my proposal for a plugin facility for the Korp frontend.
I’m sorry that this is rather long for a GitHub issue description.
Korp backend plugin facility (proposal)
Overview
The aim of the Korp backend plugin facility is to make it easier to tailor Korp for different sites without having to modify the main Korp code. To make this possible, the main Korp code needs some support for plugins and callback hook points in appropriate places in the Python code. The plugin support code is currently in the package korppluginlib, but it will be moved under the korp package, probably named pluginlib.
The Korp backend supports two kinds of plugins:
- WSGI endpoint plugins, which implement new endpoints; and
- callback plugins, whose callbacks are called at hook points in korp.py when handling a request, to filter data or to perform an action.
Plugins are defined as Python modules or subpackages, by default within the package korpplugins (customizable via the configuration variable PACKAGES). Both WSGI endpoint plugins and callback plugins can be defined in the same plugin module.
Configuration
Configuring Korp for plugins
Korp’s config.py contains the following plugin-related variables:
- PLUGINS: A list of names of plugins (modules or subpackages) to be used, in the order they are to be loaded.
- INFO_SHOW_PLUGINS: What information on loaded plugins the response of the /info command should contain: None, "names" or "info".
Configuring korppluginlib
The configuration of korppluginlib is in the module korppluginlib.config. Currently, the following configuration variables are recognized:
- PACKAGES: A list of packages which may contain plugins.
- SEARCH_PATH: A list of directories in which to search for plugins (the packages listed in PACKAGES) in addition to the default ones.
- HANDLE_NOT_FOUND: What to do when a plugin is not found: "error", "warn" or "ignore".
- LOAD_VERBOSITY: What korppluginlib outputs when loading plugins: 0 (nothing), 1 (plugin names only) or 2 (plugin names, configurations, view functions and callback methods).
- HANDLE_DUPLICATE_ROUTES: What to do with duplicate endpoints for a routing rule added by plugins: "override", "override,warn", "ignore", "warn" or "error".
Alternatively, the configuration variables may be specified in the top-level module config within PLUGINLIB_CONFIG. The values specified in the top-level config override those in korppluginlib.config.
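For illustration, a top-level config.py using these variables might contain something like the following. The plugin names come from the Language Bank of Finland plugins mentioned earlier, the values are arbitrary, and writing PLUGINLIB_CONFIG as a dict is an assumption.

```python
# config.py (Korp's top-level configuration module); values are illustrative

# Plugins to load, in loading order
PLUGINS = ["logger", "shibauth"]

# Show only the names of the loaded plugins in the response of /info
INFO_SHOW_PLUGINS = "names"

# Overrides for korppluginlib.config (assumed here to be given as a dict)
PLUGINLIB_CONFIG = {
    "HANDLE_NOT_FOUND": "warn",
    "LOAD_VERBOSITY": 1,
}
```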
Configuring individual plugins
Values for the configuration variables of individual plugin modules or subpackages can be specified in three places:
- An item of PLUGINS in Korp’s top-level config module can be a pair (plugin_name, config), where config is either a dictionary- or namespace-like object containing configuration variables.
- Korp’s top-level config module can define the variable PLUGIN_CONFIG_PLUGINNAME (with PLUGINNAME standing for the name of the plugin), whose value is either a dictionary- or namespace-like object with configuration variables.
- A plugin subpackage can contain a module named config, consisting of configuration variables.
The value for a configuration variable is taken from the first of the above in which it is set.
To get values from these sources, the plugin module needs to call korppluginlib.get_plugin_config with the default values of its configuration variables. If the return value is assigned to pluginconf, the configured value of CONFIG_VAR can then be accessed as pluginconf.CONFIG_VAR.
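The precedence rule can be modelled in a few lines. resolve_plugin_config below is a hypothetical stand-in for korppluginlib.get_plugin_config, not its actual code; it assumes that only variables given default values are looked up and that the result behaves like a namespace (hence attribute access such as pluginconf.CONFIG_VAR).

```python
from collections.abc import Mapping
from types import SimpleNamespace
from typing import Any


def _as_dict(source: Any) -> dict:
    """Accept a dict-like or namespace-like configuration object (or None)."""
    if source is None:
        return {}
    if isinstance(source, Mapping):
        return dict(source)
    # Namespace-like objects, e.g. SimpleNamespace or a module: public attributes
    return {k: v for k, v in vars(source).items() if not k.startswith("_")}


def resolve_plugin_config(defaults: Mapping, plugins_entry: Any = None,
                          top_level: Any = None,
                          subpackage: Any = None) -> SimpleNamespace:
    """Hypothetical stand-in for korppluginlib.get_plugin_config.

    plugins_entry: config from a (plugin_name, config) pair in PLUGINS
    top_level:     value of PLUGIN_CONFIG_PLUGINNAME in Korp's config
    subpackage:    the plugin subpackage's own config module
    Each variable takes its value from the first source that sets it,
    falling back to the default.
    """
    sources = [_as_dict(s) for s in (plugins_entry, top_level, subpackage)]
    values = {}
    for name, default in defaults.items():
        values[name] = next((src[name] for src in sources if name in src),
                            default)
    return SimpleNamespace(**values)


pluginconf = resolve_plugin_config(
    {"CONFIG_VAR": "default", "OTHER": 1, "THIRD": "x"},
    top_level={"CONFIG_VAR": "from config.py"},
    subpackage=SimpleNamespace(CONFIG_VAR="from subpackage", OTHER=2),
)
# pluginconf.CONFIG_VAR == "from config.py"; pluginconf.OTHER == 2;
# pluginconf.THIRD == "x"
```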
Renaming plugin endpoint routes
Endpoint routes (routing rules) defined by a plugin can be renamed by setting an appropriate value for the configuration variable RENAME_ROUTES of the plugin in question. This may be needed if two plugins have endpoints with the same route, or if it is otherwise desirable to change the routes specified by a plugin. The value of RENAME_ROUTES can be a format string, a dict or a function of one argument mapping the original route to a renamed route. For more information, please see the documentation.
Plugin information
A plugin module or package may define a dict PLUGIN_INFO containing pieces of information on the plugin. Alternatively, a plugin package may contain a module named info, and a non-package plugin module named plugin may be accompanied by a module named plugin_info; such a module contains variable definitions that are added to PLUGIN_INFO with the lower-cased variable name as the key. The information on loaded plugins is accessible in korppluginlib.loaded_plugins.
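A sketch of both ways of providing plugin information. The keys name, version and description are illustrative assumptions, and info_from_module is a hypothetical helper showing the lower-casing rule, not the code korppluginlib uses; the info module is simulated with types.ModuleType so that the example is self-contained.

```python
import types

# Option 1: define the information directly in the plugin module
# (the keys are illustrative; the set of recognized keys is not specified here)
PLUGIN_INFO = {
    "name": "Example plugin",
    "version": "0.1",
    "description": "Illustrates PLUGIN_INFO",
}


def info_from_module(module: types.ModuleType) -> dict:
    """Hypothetical helper: collect the variables of an info module
    (such as plugin_info) into a dict keyed by lower-cased variable name."""
    return {
        name.lower(): value
        for name, value in vars(module).items()
        if not name.startswith("_")  # skip __name__, __doc__ etc.
    }


# Option 2: a separate module with variable definitions (simulated here)
plugin_info = types.ModuleType("plugin_info")
plugin_info.NAME = "Example plugin"
plugin_info.VERSION = "0.1"
```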
Endpoint plugins
Implementing a new WSGI endpoint
To implement a new WSGI endpoint, you first create an instance of korppluginlib.KorpEndpointPlugin (a subclass of flask.Blueprint). You can also specify a name for the plugin, overriding the default, which is the name of the calling module (__name__). You can also pass other arguments recognized by flask.Blueprint.
The actual view function is a generator function decorated with the route method of the created instance. The decorator takes as its arguments the route of the endpoint and, optionally, an iterable of the names of possible additional decorators as the keyword argument extra_decorators, as well as other options of route. extra_decorators lists the names in the order in which they would be specified as decorators (topmost first), that is, in the reverse order of application. The generator function takes a single dict argument containing the parameters of the call and yields the result.
A single plugin module can define multiple new endpoints.
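The ordering rule for extra_decorators can be illustrated with a self-contained toy model. ToyEndpointPlugin, DECORATORS and the tracing decorators are hypothetical stand-ins, not korppluginlib's API; the real KorpEndpointPlugin is a flask.Blueprint and also applies main_handler as the topmost decorator.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical registry of decorators that may be named in extra_decorators
DECORATORS: Dict[str, Callable] = {}
calls: List[str] = []


def register_decorator(func: Callable) -> Callable:
    DECORATORS[func.__name__] = func
    return func


def _make_tracer(label: str) -> Callable:
    """Build a decorator that records label each time the view is called."""
    def decorator(view):
        def wrapper(args):
            calls.append(label)
            return view(args)
        return wrapper
    decorator.__name__ = label
    return decorator


outer = register_decorator(_make_tracer("outer"))
inner = register_decorator(_make_tracer("inner"))


class ToyEndpointPlugin:
    """Toy stand-in for korppluginlib.KorpEndpointPlugin (not the real class)."""

    def __init__(self):
        self.views = {}

    def route(self, rule: str, *, extra_decorators: Iterable[str] = (), **options):
        # options would be passed on to Flask; they are ignored in this toy
        def register(view):
            wrapped = view
            # Names are listed topmost first, so apply them bottom-up
            for name in reversed(list(extra_decorators)):
                wrapped = DECORATORS[name](wrapped)
            self.views[rule] = wrapped
            return view
        return register


plugin = ToyEndpointPlugin()


# Same effect as stacking @outer above @inner on the view function
@plugin.route("/test", extra_decorators=["outer", "inner"])
def echo_args(args):
    """Generator view function: takes the call parameters, yields the result."""
    yield {"args": args}


result = next(plugin.views["/test"]({"a": "1"}))
# calls == ["outer", "inner"]; result == {"args": {"a": "1"}}
```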
Non-JSON endpoints
Even though Korp endpoints should in general return JSON data, it may be desirable to implement endpoints returning another type of data, for example, if the endpoint generates a file for downloading. That can be accomplished by adding use_custom_headers to extra_decorators. An endpoint using use_custom_headers should yield a dict with the following keys recognized:
- "content": the actual content;
- "mimetype" (default: "text/html"): possible MIME type; and
- "headers": possible other headers as a list of pairs (header, value).
For example, such an endpoint could return, as an attachment, a plain-text file listing the arguments to the endpoint, named with the value of the argument filename (args.txt if not specified). Neither the endpoint argument incremental=true nor the decorator prevent_timeout has any practical effect on endpoints with use_custom_headers.
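A sketch of the attachment endpoint described above, assuming the yielded-dict protocol just listed. The function name args_as_text, the route in the comment and the exact Content-Disposition value are illustrative; in a plugin, the function would also be decorated with the route method of a KorpEndpointPlugin instance, shown only as a comment so that the code runs on its own.

```python
from typing import Dict, Iterator


# In a plugin module this would be registered roughly as (hypothetical route):
# @plugin.route("/args_as_text", extra_decorators=["use_custom_headers"])
def args_as_text(args: Dict[str, str]) -> Iterator[dict]:
    """Yield a plain-text attachment listing the endpoint arguments."""
    filename = args.get("filename", "args.txt")
    lines = [f"{key}: {value}" for key, value in args.items()]
    yield {
        "content": "\n".join(lines) + "\n",
        "mimetype": "text/plain",
        "headers": [
            ("Content-Disposition", f'attachment; filename="{filename}"'),
        ],
    }
```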
Defining additional endpoint decorators
By default, the endpoint decorator functions whose names can be listed in extra_decorators include only prevent_timeout and use_custom_headers, as the endpoints defined in this way are always decorated with main_handler as the topmost decorator. However, additional decorator functions can be defined by decorating them with korppluginlib.KorpEndpointPlugin.endpoint_decorator.
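A sketch of an additional endpoint decorator. strip_args is a hypothetical example; it assumes that the decorated generator view receives the argument dict as its first positional argument, as described for endpoint view functions. The registration with korppluginlib.KorpEndpointPlugin.endpoint_decorator is shown only as a comment so that the example runs without Korp.

```python
import functools
from typing import Callable, Iterator


# In a plugin module, the decorator would be registered with (assumed usage):
# @korppluginlib.KorpEndpointPlugin.endpoint_decorator
def strip_args(generator: Callable[..., Iterator]) -> Callable[..., Iterator]:
    """Hypothetical endpoint decorator: strip surrounding whitespace from
    string-valued arguments before calling the generator view function."""
    @functools.wraps(generator)
    def decorated(args=None, *pargs, **kwargs):
        args = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in (args or {}).items()
        }
        return generator(args, *pargs, **kwargs)
    return decorated


@strip_args
def echo(args):
    yield {"args": args}
```

Once registered, such a decorator could be named in a route, for example with extra_decorators=["strip_args"].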
Callback plugins
Callbacks to be called at specific plugin hook points in korp.py are defined within subclasses of korppluginlib.KorpCallbackPlugin as instance methods having the name of the hook point. The arguments and return values of a callback method are specific to each hook point.
In the first argument request, each callback method gets the actual Flask request object (not a proxy for the request) containing information on the request. For example, the endpoint name is available as request.endpoint.
korp.py contains two kinds of hook points: filter hook points and event hook points.
Filter hook points
For filter hook points, the value returned by a callback method is passed as the first non-request argument to the callback method defined by the next plugin, similar to function composition or method chaining. However, a callback for a filter hook point need not modify the value: if it returns None, either explicitly or implicitly, the return value is ignored and the argument is passed as is to the callback method in the next plugin.
At present, filter hook points and the signatures of their callback methods are the following:
- filter_args(self, request, args): Modifies the arguments dict args to any endpoint (view function) and returns the modified value.
- filter_result(self, request, result): Modifies the result dict result returned by any endpoint (view function) and returns the modified value. Note that when the arguments (query parameters) of the endpoint contain incremental=true, filter_result is called separately for each incremental part of the result.
- filter_cqp_input(self, request, cqp): Modifies the raw CQP input string cqp, typically consisting of multiple CQP commands, already encoded as bytes, to be passed to the CQP executable, and returns the modified value.
- filter_cqp_output(self, request, (output, error)): Modifies the raw output of the CQP executable, a pair consisting of the standard output and standard error encoded as bytes, and returns the modified values as a pair.
- filter_sql(self, request, sql): Modifies the SQL statement sql to be passed to the MySQL/MariaDB database server and returns the modified value.
- filter_protected_corpora(self, request, protected_corpora): Modifies (or replaces) the list protected_corpora of ids of protected corpora, the use of which requires authentication and authorization.
- filter_auth_postdata(self, request, postdata): Modifies (or replaces) the POST request parameters in postdata, to be passed to the authorization server (config.AUTH_SERVER) in the endpoint /authenticate.
- filter_auth_response(self, request, auth_response): Modifies the response auth_response returned by the authorization server in the endpoint /authenticate.
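The chaining rule for filter hook points, including the special treatment of None, can be sketched as follows. apply_filters and the three callbacks are hypothetical; plain functions stand in for the bound callback methods of plugin instances, listed in plugin order.

```python
from typing import Any, Callable, List


def apply_filters(callbacks: List[Callable[..., Any]], request: Any,
                  value: Any, *args: Any) -> Any:
    """Pass value through the filter callbacks in plugin order.

    Each callback gets (request, value, *args). A non-None return value
    replaces value for the next callback; None leaves value unchanged.
    """
    for callback in callbacks:
        result = callback(request, value, *args)
        if result is not None:
            value = result
    return value


def add_note(request, result):
    return {**result, "note": "added"}


def check_only(request, result):
    # Returns None implicitly: the value passes through unchanged
    if "hits" not in result:
        raise ValueError("unexpected result")


def drop_time(request, result):
    return {key: value for key, value in result.items() if key != "time"}


final = apply_filters([add_note, check_only, drop_time], None,
                      {"hits": 3, "time": 0.5})
# final == {"hits": 3, "note": "added"}
```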
Event hook points
Callback methods for event hook points do not return a value.
At present, event hook points and the signatures of their callback methods are the following:
- enter_handler(self, request, args, starttime): Called near the beginning of a view function for an endpoint. args is a dict of arguments to the endpoint and starttime is the current time as seconds since the epoch.
- exit_handler(self, request, endtime, elapsed_time, result_len): Called just before exiting a view function for an endpoint (before yielding a response). endtime is the current time as seconds since the epoch, elapsed_time is the time spent in the view function as seconds, and result_len is the length of the response content.
- error(self, request, error, exc): Called after an exception has occurred. error is the dict to be returned in JSON as ERROR and exc contains exception information.
Callback plugin example
A simple callback plugin may contain just one callback method, such as one to be called at the hook point filter_result.
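A sketch of such a plugin, assuming korppluginlib is importable; otherwise a minimal stand-in base class is defined so that the code runs on its own. The class name AddEndpointName, the added key endpoint and the restriction to the endpoint query (via applies_to, described in the notes on implementing a callback plugin) are illustrative choices.

```python
from types import SimpleNamespace

try:
    from korppluginlib import KorpCallbackPlugin
except ImportError:
    # Minimal stand-in so that this sketch runs outside a Korp installation
    class KorpCallbackPlugin:
        @classmethod
        def applies_to(cls, request):
            return True


class AddEndpointName(KorpCallbackPlugin):
    """Hypothetical plugin: record the endpoint name in each result."""

    @classmethod
    def applies_to(cls, request):
        # Only apply to requests to the endpoint "query"
        return request.endpoint == "query"

    def filter_result(self, request, result):
        # Return a modified copy; returning None would keep result as is
        return {**result, "endpoint": request.endpoint}


plugin = AddEndpointName()
request = SimpleNamespace(endpoint="query")  # stand-in for a Flask request
filtered = plugin.filter_result(request, {"hits": 1})
# filtered == {"hits": 1, "endpoint": "query"}
```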
Notes on implementing a callback plugin
Each plugin class is instantiated only once, so any state stored in self is shared by all invocations (requests). However, see the next subsection for an approach to keeping request-specific state across hook points.
A single plugin class can define only one callback method for each hook point, but a module may contain multiple classes defining callback methods for the same hook point.
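Because one instance serves all requests, per-request data has to be kept apart explicitly, as the subsection on request-specific state explains. A minimal sketch of that approach, with a hypothetical plugin RequestTimer whose data is keyed by id(request); in a real plugin the class would subclass korppluginlib.KorpCallbackPlugin, which is left out so that the example is self-contained.

```python
import time
from types import SimpleNamespace


class RequestTimer:
    """Hypothetical callback plugin (would subclass KorpCallbackPlugin)."""

    def __init__(self):
        # Shared by all requests, as the class is instantiated only once
        self._requests = {}
        self.finished = []

    def enter_handler(self, request, args, starttime):
        # First hook point: initialize space for this request's data
        self._requests[id(request)] = {"endpoint": request.endpoint,
                                       "starttime": starttime}

    def filter_result(self, request, result):
        # An intermediate hook point can read and update the same data
        data = self._requests.get(id(request))
        if data is not None:
            data["keys"] = sorted(result)
        # Returning None leaves the result unchanged

    def exit_handler(self, request, endtime, elapsed_time, result_len):
        # Last hook point: use the data, then delete it
        data = self._requests.pop(id(request), None)
        if data is not None:
            self.finished.append((data["endpoint"], data.get("keys"), result_len))

    def error(self, request, error, exc):
        # Also drop the data if the request fails (harmless if already gone)
        self._requests.pop(id(request), None)


timer = RequestTimer()
req = SimpleNamespace(endpoint="query")  # stand-in for a Flask request
timer.enter_handler(req, {"cqp": "[]"}, time.time())
timer.filter_result(req, {"hits": 1, "kwic": []})
timer.exit_handler(req, time.time(), 0.01, 42)
# timer.finished == [("query", ["hits", "kwic"], 42)]; timer._requests == {}
```

Keying by id(request) assumes that the request object stays alive until exit_handler, so that its id cannot be reused by another request in the meantime.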
If multiple plugins define a callback method for a hook point, they are called in the order in which the plugin modules are listed in config.PLUGINS. If a plugin module contains multiple classes defining a callback method for a hook point, they are called in the order in which they are defined in the module.
If the callback methods of a class should be applied only to certain kinds of requests, for example, to a certain endpoint, the class can override the class method applies_to(cls, request) to return True only for requests to which the plugin is applicable.
Keeping request-specific state
Request-specific data can be passed from one callback method to another within the same callback plugin class by using a dict attribute (or similar) indexed by request objects (or their ids). In general, the enter_handler callback method (called at the first hook point) should initialize a space for the data for a request, and exit_handler (called at the last hook point) should delete it.
Defining new hook points
New hook points can be added to plugins (as well as to korp.py) by invoking callbacks with the name of the hook point using the appropriate methods. For example, a logging plugin could implement a callback method log that could be called from other plugins, both callback and endpoint plugins.
Given the Flask request object (or the global request proxy) request, callbacks for the (event) hook point hook_point can be called with the method raise_event_for_request of KorpCallbackPluginCaller, passing *args and **kwargs as the positional and keyword arguments and discarding the return value, or, equivalently, by getting a caller object for a request and calling its instance method raise_event (typically when the same function or method contains several hook points). If request is omitted or None, the request object referred to by the global request proxy is used.
Callbacks for such additional hook points are defined in the same way as for those in korp.py; the signature corresponding to the above calls is hook_point(self, request, *args, **kwargs). All callback methods need to have request as the first positional argument (after self).
Three types of call methods are available in KorpCallbackPluginCaller:
- raise_event_for_request (and the instance method raise_event): Call the callback methods and discard their possible return values (for event hook points).
- filter_value_for_request (and filter_value): Call the callback methods, passing the return value of each as the first argument of the next callback method, and return the value returned by the last callback method (for filter hook points).
- get_values_for_request (and get_values): Call the callback methods, collect their return values into a list and finally return the list.
Only the first two are currently used in korp.py.
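A simplified model of the event and list-collecting call styles. ToyCaller is hypothetical and is not the implementation of KorpCallbackPluginCaller; it only illustrates dispatching a hook point by method name to the applicable plugins, with request passed as the first argument. The hook point log mirrors the logging example above; filter_value would chain values in the way described for filter hook points.

```python
from types import SimpleNamespace
from typing import Any, Callable, List


class ToyCaller:
    """Hypothetical, simplified model of two call styles of
    KorpCallbackPluginCaller (not its real implementation)."""

    def __init__(self, plugins: List[Any], request: Any):
        self.plugins = plugins
        self.request = request

    def _callbacks(self, hook_point: str) -> List[Callable[..., Any]]:
        # Bound methods named after the hook point, from applicable plugins,
        # in the order in which the plugins are listed
        return [
            getattr(plugin, hook_point)
            for plugin in self.plugins
            if hasattr(plugin, hook_point) and plugin.applies_to(self.request)
        ]

    def raise_event(self, hook_point: str, *args: Any, **kwargs: Any) -> None:
        # Event hook point: call every callback, discard return values
        for callback in self._callbacks(hook_point):
            callback(self.request, *args, **kwargs)

    def get_values(self, hook_point: str, *args: Any, **kwargs: Any) -> List[Any]:
        # Collect the return values of all callbacks into a list
        return [callback(self.request, *args, **kwargs)
                for callback in self._callbacks(hook_point)]


class LoggerPlugin:
    """Hypothetical plugin with a callback for a new hook point "log"."""

    def __init__(self):
        self.messages = []

    @classmethod
    def applies_to(cls, request):
        return True

    def log(self, request, message):
        self.messages.append((request.endpoint, message))

    def describe(self, request):
        return "logger"


class QueryOnlyPlugin:
    """Hypothetical plugin applicable only to the endpoint "query"."""

    @classmethod
    def applies_to(cls, request):
        return request.endpoint == "query"

    def describe(self, request):
        return "query-only"


logger = LoggerPlugin()
caller = ToyCaller([logger, QueryOnlyPlugin()], SimpleNamespace(endpoint="info"))
caller.raise_event("log", "hello")
values = caller.get_values("describe")
# logger.messages == [("info", "hello")]; values == ["logger"]
```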
Accessing main application module globals in plugins
The values of selected global variables, constants and functions in the main application module korp.py are available to plugin modules as korppluginlib.app_globals.name, where name is the name of the global. In this way, for example, a plugin can access the Korp MySQL database and the Memcached cache and use assert_key to assert the format of arguments.
Limitations and deficiencies
The current implementation has at least the following limitations and deficiencies, which might be subjects for future development, if needed. (Some more information on the issues is in the documentation.)
- … filter_args and filter_result.
- The order of loading plugins and calling their callbacks is determined solely by the order of the plugins in config.PLUGINS. The plugins themselves cannot specify that they should be loaded before or after another plugin, or that one callback of a plugin should be called before those of other plugins (such as filter_args) and another after those of others (such as filter_result).
- Plugin information in PLUGIN_INFO or an info module requires manual updating whenever the plugin is changed.
- Accessing helper functions in korp.py via korppluginlib.app_globals is somewhat cumbersome. It could be simplified by moving the helper functions to a separate library module that could be imported by plugins.
- … main_handler and prevent_timeout cannot decorate an instance method.
Influences and alternatives
Many Python plugin frameworks or libraries exist, but they did not appear suitable for Korp plugins as such. In particular, we wished to have both callback plugins and endpoint plugins.
Influences
Using a metaclass for registering callback plugins in korppluginlib was inspired by and partially adapted from Marty Alchin’s A Simple Plugin Framework. The terms used in conjunction with callback plugins were partially influenced by the terminology for WordPress plugins.
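The registration idea from Alchin’s framework can be summarized in a short sketch (not korppluginlib’s actual code): a metaclass gives the mount-point class a registry, and every subclass adds itself when it is defined, so plugin classes are recorded in definition order without explicit registration. CallbackPluginBase plays the role of KorpCallbackPlugin here, and the plugin class names are hypothetical.

```python
class PluginMount(type):
    """Metaclass in the style of Alchin's "A Simple Plugin Framework"."""

    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        if not hasattr(cls, "plugins"):
            # The mount point itself: create the registry
            cls.plugins = []
        else:
            # A plugin subclass: register it with the mount point
            cls.plugins.append(cls)


class CallbackPluginBase(metaclass=PluginMount):
    """Hypothetical mount point, playing the role of KorpCallbackPlugin."""


class LoggerPlugin(CallbackPluginBase):
    pass


class ResultFilterPlugin(CallbackPluginBase):
    pass

# CallbackPluginBase.plugins == [LoggerPlugin, ResultFilterPlugin]
```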
The Flask-Plugins Flask extension might have been a natural choice, as Korp is a Flask application, but it was not immediately obvious if it could have been used to implement new endpoints. Moreover, for callback (event) plugins, it would have had to be extended to support passing the result from one plugin callback as the input of another.
Using Flask Blueprints for endpoint plugins was hinted at by @MartinHammarstedt.
Other Python plugin frameworks and libraries