bwalsh commented 7 years ago

This PR contains:

an implementation of swift object store, implemented using the existing keystone authentication service
example user/project setups
an implementation of the a swift pipeline observer to harvest file additions and notify downstream systems (dcc & bmeg )

A code review would be very helpful.

@kellrott @ksonmez @mayfielg @k1643 @adamstruck @buchanae @prismofeverything: can you review and comment

@AAMargolin @jacmarjorie :FYI

swift

Simple deployment of a "all in one" style OpenStack Swift server, uses Ubuntu packages as opposed to source.

The Swift implementation is tied to the Keystone sibling project for authentication and authorization. All data is stored in a docker volume container.

capabilities

As files are added to the object store by an authenticated user, they are published to a message queue where they can be consumed by different systems [dms, bmeg, other...]
When an authenticated user queries using the dcc portal or a client tool, they results they see are based on the authorizations assigned to their identity as provided by the institutional ldap server

data flows

(0) - the projects' meta data and files are loaded into dcc (aka 'release')
(1,2) - An authenticated user adds a file to the object store, and the euler plugin picks up meta data about the container, object(file) and user account and posts it to euler's api service for processing.
(3) - The euler api server consumes the event and composes a Resource message and publishes it to a kafka topic. A kafka component picks up the message and uses existing calls to the dcc-server(which depends on es,mongo) to retrieve context data and creates a file centric document . //todo
(4) - An authenticated user, using the either the portal or a command line tool, queries for resources or metadata. The euler api component filters and redacts those calls. //todo
(5) - The euler api component relies on keystone for identity and authorization decisions.

usage

Startup - nothing special, included in the docker-compose setup.
Defining users, projects, etc. This is is covered in the main readme in the euler/keystone repo.

swift server config setup

On keystone server, create a user, service roles and endpoints for swift see here

os user create --domain default --password-prompt swift
os role add --project service --user swift admin
os  project create --description "Service Project" service
os role add --project service --user swift admin
os role add --project service --user admin  admin
os service create --name swift   --description "OpenStack Object Storage" object-store
os endpoint create --region oshu object-store public http://swift:8080/v1/AUTH_%\(tenant_id\)s
os endpoint create --region ohsu object-store internal http://swift:8080/v1/AUTH_%\(tenant_id\)s
os endpoint create --region ohsu object-store admin http://swift:8080/v1

verify swift operation

On swift server, defining containers, uploading files,etc.

export OS_AUTH_URL="http://controller:35357/v3"
export OS_IDENTITY_API_VERSION="3"
export OS_PASSWORD=ADMIN_PASS
export OS_USERNAME="swift"
export OS_USER_DOMAIN_ID="default"
export OS_PROJECT_DOMAIN_ID="default"
export OS_PROJECT_NAME="service"
alias os=openstack

# # verify it came up
# swift stat
Account: AUTH_1e486beffa674390b406bb868fe8397f
Containers: 0
Objects: 0
  Bytes: 0
X-Put-Timestamp: 1481332328.94644
X-Timestamp: 1481332328.94644
X-Trans-Id: txcc9926ae4d9f457588038-00584b5667
Content-Type: text/plain; charset=utf-8
root@0724c2d89fe6:/# openstack container  list

# # create container and upload file
# os container create container1
# ls -l > /tmp/FILE1
# swift upload  container1  /tmp/FILE1  -H "content-type:text/plain" -H "X-Object-Meta-color:blue"
+------------+------------+----------------------------------+
| object     | container  | etag                             |
+------------+------------+----------------------------------+
| /tmp/FILE1 | container1 | 0c9018bbd7936cdcf0bf0726c1127261 |
+------------+------------+----------------------------------+

# os object  show  container1  tmp/FILE1
+----------------+-----------------------------------------+
| Field          | Value                                   |
+----------------+-----------------------------------------+
| account        | AUTH_1e486beffa674390b406bb868fe8397f   |
| container      | container1                              |
| content-length | 1071                                    |
| content-type   | text/plain                              |
| etag           | 9863d4db04bb6ba178dd84fdc9c54680        |
| last-modified  | Sat, 10 Dec 2016 05:29:06 GMT           |
| object         | tmp/FILE1                               |
| properties     | Color='blue', Mtime='1481347551.599294' |
+----------------+-----------------------------------------+

# # assign capabilities to other groups
# swift post container1 --read-acl "0369f74274b1499eb9257994b8b67087:*" --write-acl "0369f74274b1499eb9257994b8b67087:*"
# swift post container1 --read-acl "d53c2eea749a44a0931ee77fd4c5dcce:*" --write-acl "d53c2eea749a44a0931ee77fd4c5dcce:

# # experiment with other identities ...
# unset OS_USER_DOMAIN_ID
# export OS_USER_DOMAIN_NAME=testing
# export OS_PASSWORD=password

# export OS_PROJECT_NAME=baml
# export OS_USERNAME=baml_user
# # experiment ... os container create baml_container

# export OS_USERNAME=brca_user
# export OS_PROJECT_NAME=brca
# # experiment ... os container create brca_container

# export OS_USERNAME=ccc_user
# export OS_PROJECT_NAME=ccc

extension

This project includes an extension Swift, the OpenStack Object Storage project, so it performs extra action on files at upload time. We're going to build an DMS observer inside Swift. The goal is to observe the new file, harvest the attributes of the resource, the project and account and forward it via kafka to one or more downstream consumers.

To do our observation, we use Swift's pipeline architecture. Swift proxy uses, like many other OpenStack projects, paste to build his HTTP architecture.

Paste uses WSGI and provides an architecture based on a pipeline. The pipeline is composed of a succession of middleware, ending with one application. Each middleware has the chance to look at the request or at the response, can modify it, and then pass it to the following middleware. The latest component of the pipeline is the real application, and in this case, the Swift proxy server.

This container implements Swift, where we've added our observer to the default pipeline in the swift-proxy.conf configuration file:

See euler we've added to the pipeline in the swift-proxy.conf configuration file:

[pipeline:main]
pipeline = catch_errors gatekeeper healthcheck proxy-logging cache container_sync bulk ratelimit euler authtoken keystoneauth copy container-quotas account-quotas slo dlo versioned_writes proxy-logging proxy-server

Euler has a simple configuration that basically just tells swift where to load it. The Dockerfile copies the euler.py to the correct installation path.

[filter:euler]
paste.filter_factory = swift.common.middleware.euler:filter_factory
set log_level = DEBUG
set log_headers = True

You can read more about WSGI here, but this is all that should be necessary to harvest all the information from the object store.


    def __call__(self, env, start_response):
        """
        WSGI entry point.
        Wraps env in swob.Request object and passes it down.

        :param env: WSGI environment dictionary
        :param start_response: WSGI callable
        """

        if not env['REQUEST_METHOD'] in ('PUT', 'COPY', 'POST'):
            return self.app(env, start_response)

        response = None
        try:
            # complete the pipeline
            response = self.app(env, start_response)
            # we want to query the api after the file is stored
            # harvest container, account and object info
            container_info = get_container_info(
                env, self.app, swift_source='Euler')
            self.logger.debug("env: {}".format(env))
            self.logger.debug("container_info: {}".format(container_info))

            account_info = get_account_info(
                env, self.app, swift_source='Euler')
            self.logger.debug("account_info: {}".format(account_info))

            object_info = get_object_info(
                env, self.app)
            self.logger.debug("object_info: {}".format(object_info))
        except:  # catch *all* exceptions
            tb = traceback.format_exc()
            self.logger.debug("traceback: {}".format(tb))

        finally:
            # unaltered upstream response
            return response

reading

https://github.com/openstack/kolla https://github.com/ccollicutt/docker-swift-onlyone http://docs.openstack.org/mitaka/install-guide-ubuntu/swift-controller-install.html https://gist.github.com/briancline/8119051 http://docs.openstack.org/developer/openstack-ansible/developer-docs/quickstart-aio.html https://ask.openstack.org/en/question/97991/newton-not-a-valid-cloud-archive-name/

grmayfie commented 7 years ago

It looks like the docker-compose file requires docker engine >=1.12 and docker-compose >=1.9, in order to use the default environment variables specified. This should definitely be at least noted in the main README.md since it forces a docker upgrade. (I had to delete my current docker and completely reinstall the newest version to run it, and then separately upgrade docker-compose as well.)

I propose adding to README: Euler uses docker-compose file version 2.1, which requires docker engine 1.12 or greater and docker-compose 1.9 or greater. These may need separate updates to ensure both requirements are met. See links for upgrading information.

bwalsh commented 7 years ago

@mayfielg thank you, i've updated the README

grmayfie commented 7 years ago

I'm not following the Euler diagram in the README. Where is the API fitting into the other dcc-dms diagrams we've proposed/been discussing? I feel like it's trying to be the 'upload server.' Am I correct?

ghost commented 7 years ago

Regarding the diagrams, would you remove the marketing logos and replace them with some plain boxes representing components with component labels? It's not clear to me what the software components are. Kafka, DCC, BMEG, are each made of several processes, so it's hard to know what's being proposed when it comes to configuration and deployment when those multiple components are represented as logos.

ghost commented 7 years ago

ITG setup swift in the exastack CompBioSwift project. Should we rely on an ITG administered swift? Is using containerized swift for proof-of-concept purposes, or do you intend that to be for production?

ghost commented 7 years ago

OpenStack login is based on keystone. Could we use the existing keystone, or is there a reason for running our own keystone? Is running keystone in a docker-compose network a proof-of-concept, or for production? How services in different networks all access keystone? Could all our services access an LDAP proxy instead? What does keystone give us that we can't get from an LDAP proxy?

bwalsh commented 7 years ago

@k1643

Is running keystone in a docker-compose network a proof-of-concept, or for production?

proof-of-concept so we can control & experiment with project setup

What does keystone give us that we can't get from an LDAP proxy?

keystone links identity (ldap) -> authorization (project membership) -> to object store permissions
keystone implements a project/role/group datastore system of record
keystone manages multiple domains a mixture of ldap &/or local
keystone integrates with shibboleth, SAML, etc
keystone support OAuth1
keystone supports federation where multiple institutions can manage their own keystone instance

We would need to develop and maintain these features if we didn't use keystone. Using keystone lets us retire the search_authorization project

These and others should hopefully be apparent from reading docs/use_cases.md

bwalsh commented 7 years ago

@k1643

Should we rely on an ITG administered swift? Is using containerized swift for proof-of-concept purposes, or do you intend that to be for production?

proof of concept so we can experiment with swift listener.

bwalsh commented 7 years ago

@mayfielg

I feel like it's trying to be the 'upload server.' Am I correct?

The api provides two services to the portal user:

login via JWT tokens backed by swift
proxy to filter requests from portal-ui to portal-server

The api provides one service for the add-a-file use case:

/v0/files - the intent here is to compose the Resource message for kafka

All uploads are handled by swift, we've inserted a listener that communicates back to /v0/files

I've updated readme

bwalsh commented 7 years ago

@mayfielg @k1643 - I've updated readme and replied above. Please let me know if I've missed anything.

ohsu-comp-bio / euler

Swift #3

swift

capabilities

data flows

usage

swift server config setup

verify swift operation

extension

reading