ome / design

OME Design proposals
http://ome.github.io/design/

Horizontal Scaling: Read-Only #71

Open chris-allan opened 7 years ago

chris-allan commented 7 years ago

History, state of play, and rationale

Read-only versions of OMERO are applicable to many use cases. A few are outlined below:

  1. Adding more READ capacity and load balancing an OMERO server cluster, either on the same machine or on separate machines.

  2. Allowing operations on a read-only PostgreSQL database snapshot or replica. Examples of this include, but are not limited to: full-text indexing, expensive query isolation, and immutability.

Overview

Phase I

The first major step is to enable a read-only version of the OMERO server. This requires read-only access to the filesystem and database. A crucial component of this work is to divorce the session store from the OMERO instance. The default session storage is currently the database, which imposes a read-write requirement on the session store. To avoid that requirement, pure in-memory sessions will initially be used when the server is started in read-only mode. As a consequence, sessions will not be portable between servers when started in read-only mode.

The result of this work would be an ability to specify an alternate hostname:port combination and get a connection to an OMERO server instance that would only allow read-only operations. Attempted write operations should throw an exception so that mistaken attempts to use write operations can be handled. Additionally, clients would be able to ask the server on login or a service on retrieval whether or not a read-only flag had been set to avoid such exceptions.
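A minimal sketch of how this behaviour might look from a client's point of view. The names `ReadOnlyError`, `Service`, `is_read_only` and `rename` are hypothetical and are not part of the OMERO API; the sketch only illustrates "writes throw, clients can ask first".

```python
class ReadOnlyError(Exception):
    """Hypothetical exception raised when a write is attempted in read-only mode."""


class Service:
    """Hypothetical stand-in for a server-side service; not an OMERO class."""

    def __init__(self, read_only):
        self._read_only = read_only
        self._store = {"Experimenter:0": "root"}

    def is_read_only(self):
        # Clients can ask for the flag up front instead of waiting for a failure.
        return self._read_only

    def get(self, key):
        # Reads are always permitted.
        return self._store[key]

    def save(self, key, value):
        # Writes fail loudly so that a mistaken attempt can be handled.
        if self._read_only:
            raise ReadOnlyError("write attempted on read-only server: %s" % key)
        self._store[key] = value


def rename(service, key, value):
    """Return True if the write was applied, False if the server is read-only."""
    if service.is_read_only():
        return False
    service.save(key, value)
    return True
```

A client that checks the flag first never sees the exception; one that does not can still catch `ReadOnlyError` and fall back to a read-write instance.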

This would facilitate load balancing and utilization of the read-only instance "by configuration".
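One way to picture routing "by configuration" is a small client-side router that sends writes to the read-write server and spreads reads across the read-only instances. This is only a sketch: `EndpointRouter` and the host names are hypothetical, and the proposal does not prescribe where such routing would live.

```python
import itertools


class EndpointRouter:
    """Hypothetical client-side router; host names and ports are illustrative only."""

    def __init__(self, read_write, read_only):
        self._read_write = read_write
        # Fall back to the read-write server if no read-only instances are configured.
        self._readers = itertools.cycle(read_only or [read_write])

    def endpoint_for(self, is_write):
        # Writes always go to the single read-write server;
        # reads are spread round-robin across the read-only instances.
        if is_write:
            return self._read_write
        return next(self._readers)


router = EndpointRouter(
    read_write=("omero-rw.example.org", 4064),
    read_only=[("omero-ro1.example.org", 4064), ("omero-ro2.example.org", 4064)],
)
```

Adding read capacity then amounts to appending another `hostname:port` pair to the read-only list.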

Phase II

Building on phase I, this phase would be the pursuit of cluster-wide session storage, allowing for the portability of sessions between instances regardless of the running mode, read-only or read-write, of the server.
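The difference between the two phases can be sketched with a session store that is either private to each server (phase I) or shared by all of them (phase II). `SharedSessionStore` and `Server` are hypothetical names, and a plain dictionary stands in for whatever cluster-wide backend is eventually chosen.

```python
import uuid


class SharedSessionStore:
    """Hypothetical session store; a plain dict stands in for the real backend."""

    def __init__(self):
        self._sessions = {}

    def create(self, user):
        session_id = str(uuid.uuid4())
        self._sessions[session_id] = user
        return session_id

    def lookup(self, session_id):
        return self._sessions.get(session_id)


class Server:
    """Hypothetical server instance that delegates session handling to a store."""

    def __init__(self, name, read_only, store):
        self.name = name
        self.read_only = read_only
        self._store = store

    def login(self, user):
        return self._store.create(user)

    def join(self, session_id):
        # A session is only portable if this server can see the store it lives in.
        user = self._store.lookup(session_id)
        if user is None:
            raise KeyError("unknown session: %s" % session_id)
        return user
```

When both servers are constructed with the same store, a session created on the read-write server can be joined on the read-only one; with separate stores, `join` raises `KeyError`, which mirrors the phase I limitation.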

Current work

Creating a read-only database user for testing

Assuming that you have created an OMERO database and set its owner to "omero", protected by password authentication, you can create a new user "omeroro" with a password of your choice:

$ createuser -P 'omeroro'
Enter password for new role:
Enter it again:

If you then connect to the OMERO database with this user, as allowed by pg_hba.conf, you will be able to list the tables (interrogate the schema) but not perform queries against the database:

$ psql -h localhost -U omeroro omero
psql (9.3.2)
Type "help" for help.

omero=> \dt
                             List of relations
 Schema |                       Name                       | Type  | Owner
--------+--------------------------------------------------+-------+-------
 public | _fs_deletelog                                    | table | omero
 public | _lock_ids                                        | table | omero
 public | acquisitionmode                                  | table | omero
 public | annotation                                       | table | omero
 public | annotationannotationlink                         | table | omero
 public | arc                                              | table | omero
 public | arctype                                          | table | omero
 public | binning                                          | table | omero
…
omero=> SELECT * FROM dbpatch;
ERROR:  permission denied for relation dbpatch

You can then, as a PostgreSQL superuser, GRANT the "omeroro" user the ability to run SELECT statements on the database:

$ psql omero
psql (9.3.2)
Type "help" for help.

omero=# \dn+
                       List of schemas
  Name  | Owner  | Access privileges |      Description
--------+--------+-------------------+------------------------
 public | callan | callan=UC/callan +| standard public schema
        |        | =UC/callan        |
(1 row)

omero=# GRANT SELECT ON ALL TABLES IN SCHEMA public TO omeroro;
GRANT
omero=# GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO omeroro;
GRANT

NOTE: This assumes no other tables are in the schema "public".

You are then able to execute SELECT but not UPDATE or INSERT statements against the database:

$ psql -h localhost -U omeroro omero
psql (9.3.2)
Type "help" for help.

omero=> SELECT * FROM dbpatch ;
 id | currentpatch | currentversion | permissions |          finished          |     message     | previouspatch | previousversion | external_id
----+--------------+----------------+-------------+----------------------------+-----------------+---------------+-----------------+-------------
  1 |            0 | OMERO5.0       |         -52 | 2015-04-03 08:58:18.790712 | Database ready. |             0 | OMERO5.0        |
(1 row)

omero=> BEGIN;
BEGIN
omero=> UPDATE dbpatch SET currentversion = 'foo' WHERE id = 1 ;
ERROR:  permission denied for relation dbpatch
omero=> INSERT INTO dbpatch VALUES (1);
ERROR:  permission denied for relation dbpatch
omero=> ROLLBACK;
ROLLBACK

This should also prevent any adverse execution of PostgreSQL functions.
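As a small aid for checking the result, the access-privilege strings that psql prints (for example via `\dp`) can be inspected for write privileges. The sketch below assumes the standard PostgreSQL privilege letters (`r` SELECT, `a` INSERT, `w` UPDATE, `d` DELETE, `D` TRUNCATE); the helper names are hypothetical and the sample entries in the usage note are illustrative rather than captured from a real database.

```python
# Privilege letters as used in PostgreSQL access-privilege (ACL) listings.
WRITE_PRIVILEGES = {"a": "INSERT", "w": "UPDATE", "d": "DELETE", "D": "TRUNCATE"}


def granted_privileges(acl_entry):
    """Split an entry such as 'omeroro=r/omero' into (role, privilege letters)."""
    grantee, _, rest = acl_entry.partition("=")
    letters, _, _grantor = rest.partition("/")
    # '*' marks a grant option and is not a privilege in its own right.
    return grantee, set(letters.replace("*", ""))


def is_read_only(acl_entry):
    """True if the entry grants SELECT and none of the write privileges."""
    _, letters = granted_privileges(acl_entry)
    return "r" in letters and not (letters & set(WRITE_PRIVILEGES))
```

For instance, `is_read_only("omeroro=r/omero")` is true, while an owner entry such as `"omero=arwdDxt/omero"` is not read-only.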

Enabling in-memory Node, Session, Event support (read-only)

Coupled with the aforementioned database user privileges and a build of OMERO 5.2.x that includes the development branch, a "read-only" server can be achieved with the following OMERO configuration:

bin/omero config set omero.cluster.node_provider ome.security.basic.BasicInMemoryNodeProvider
bin/omero config set omero.security.event_provider ome.security.basic.BasicInMemoryEventProvider
bin/omero config set omero.sessions.session_manager ome.services.sessions.InMemorySessionManagerImpl
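To confirm that all three settings are in place, the `key=value` lines printed by `bin/omero config get` can be compared against the expected values. This is a sketch only: `parse_config` and `missing_read_only_settings` are hypothetical helpers, not part of the OMERO CLI, and the text to check is assumed to have been captured separately.

```python
READ_ONLY_SETTINGS = {
    "omero.cluster.node_provider": "ome.security.basic.BasicInMemoryNodeProvider",
    "omero.security.event_provider": "ome.security.basic.BasicInMemoryEventProvider",
    "omero.sessions.session_manager": "ome.services.sessions.InMemorySessionManagerImpl",
}


def parse_config(text):
    """Parse 'key=value' lines, as printed by `omero config get`, into a dict."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            config[key] = value
    return config


def missing_read_only_settings(config):
    """Return the sorted keys whose values differ from the read-only settings above."""
    return sorted(k for k, v in READ_ONLY_SETTINGS.items() if config.get(k) != v)
```

An empty list from `missing_read_only_settings` means the three in-memory providers are configured; any returned key still points at its default, database-backed implementation or at some other value.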

OMERO 5.2.x

After several discussions with the greater OME consortium it was decided that development work take place atop the current stable dev_5_2 branch of OMERO. This will allow for easy integration and testing against the current stable release of OMERO by third parties and especially by the IDR subteam. It will also allow easy integration into the currently available version of OMERO Plus. All of these constituents are utilising 5.2.x as a basis for their work.

Development branch:

Implemented features:

Ongoing work:

OMERO 5.3.x

Decisions on targeting OMERO 5.3.x will be made at a later date.

Related reading

History

Fri 31 Mar 2017 17:13:35 BST: Updated documentation for read-only mode
Thu Feb 16 08:33:04 PST 2017: First version of a running server with IQuery possibility at a basic level
Wed Jan 25 06:02:10 PST 2017: Initial version
Thu Jan 26 05:01:30 PST 2017: Links to service routing and 5.2.x development plan

chris-allan commented 7 years ago

First push of glencoesoftware/openmicroscopy@read-only-phase1 which contains the first set of functional implementations.

callan@ubuntu:~/code/ome.git$ dist/bin/omero config get
omero.cluster.node_provider=ome.security.basic.BasicInMemoryNodeProvider
omero.db.name=omero52
omero.db.pass=omeroro
omero.db.user=omeroro
omero.security.event_provider=ome.security.basic.BasicInMemoryEventProvider
omero.sessions.session_manager=ome.services.sessions.InMemorySessionManagerImpl
callan@ubuntu:~/code/ome.git$ dist/bin/omero shell --login
Previous session expired for root on localhost:4064
Server: [localhost:4064]
Username: [root]
Password:
Created session 19076866-5ba3-43b8-8bfb-e6a5f260e3a4 (root@localhost:4064). Idle timeout: 10 min. Current group: system
Python 2.7.9 (default, Apr  2 2015, 15:33:21) 
Type "copyright", "credits" or "license" for more information.

IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: s = client.getSession()

In [2]: s.getQueryService().get('Experimenter', 0L).getOmeName().getValue()
Out[2]: 'root'

In [3]: 

/cc @dpwrussell, @joshmoore, @dsudar

dpwrussell commented 7 years ago

@chris-allan Great. I guess there isn't really anything for us to test in our domain at this point?

chris-allan commented 7 years ago

@dpwrussell: Not at the moment, no. Aside from the obvious brain dead simple IQuery usage, what do you think would be a useful set of read-only operations to test with from your perspective?

dpwrussell commented 7 years ago

@chris-allan Initially being able to do basic listings (e.g. images in dataset), then get the metadata for those objects, and then being able to get pixels for those objects. Basically exactly what you would expect if someone was running an analysis job.

chris-allan commented 7 years ago

Pull request from @joshmoore attempting to get the glencoesoftware/read-only-phase1 branch building against the IDR metadata52 integration branch in openmicroscopy/openmicroscopy#5213.

/cc @dpwrussell