ngageoint / elasticgeo

ElasticGeo provides a GeoTools data store that allows geospatial features from an Elasticsearch index to be published via OGC services using GeoServer.
GNU General Public License v3.0
169 stars 85 forks source link

Updates to Use X-Pack Security and Support Custom Field Names and Ordering #88

Closed johndeereguy closed 5 years ago

johndeereguy commented 5 years ago

These changes will allow users to create ElasticSearch datastores that can proxy user queries and take advantage of the X-Pack security features of ElasticSearch. There are also updates to allow for custom field names and ordering of the fields.

sjudeng commented 5 years ago

Thanks for the last round of updates. It's looking good. There are a number of comments that for some reason are "hidden" by GitHub. Can you check if you can unhide and address those?

hidden_comments
johndeereguy commented 5 years ago

Thanks for pointing out the hidden comments, I didn't notice them. I'll address those and the tests in the next couple days.

johndeereguy commented 5 years ago

Thanks again for your help getting the updates for this pull request cleaned up. The Unit test UsernamePasswordElasticDatastoreFactoryTest was created. We don't have XPack in this environment, so we won't be able to do a integration test for the UsernamePasswordElasticDatastoreFactory. Much of the code in this class is inherited and being tested in other areas of this repository. If you would like any additional unit tests to be included in UsernamePasswordElasticDatastoreFactoryTest, can you let me know?

coderloner commented 5 years ago

My company is interested in gaining the X-Pack Security functionality, too. Any idea how long it could be before this PR is approved and merged?

sjudeng commented 5 years ago

@coderloner It's close to being merged. It might be longer before another release is generated though so you'll need to build it locally if you'd like it sooner. If you're able to try it out please report back here if you have any issues or also to confirm it works for you. Independent testing is always appreciated.

sjudeng commented 5 years ago

@johndeereguy It doesn't look like UsernamePasswordElasticDatastoreFactory is exposed through GeoServer, is that right? How do you use it? Also for testing you can mock the client so you don't need the service running. I took a stab at this here: https://github.com/sjudeng/elasticgeo/commit/e87007700528048864a20184c14ade78cea767b3.

johndeereguy commented 5 years ago

In the version we are running on our system, I've updated elasticgeo/gt-elasticsearch/src/main/resources/META-INF/services/org.geotools.data.DataStoreFactorySpi. It contains the single line: mil.nga.giat.data.elasticsearch.UsernamePasswordElasticDataStoreFactory I didn't check that in with the request, thought that would force everyone to use it, and the default should be left to mil.nga.giat.data.elasticsearch.ElasticDataStoreFactory? Let me know what you think. This is what exposes UsernamePasswordElasticDatastoreFactory to GeoServer on our system.

sjudeng commented 5 years ago

Would you be against me combining the two factories into one eventually? We can merge this as is (once we get the test coverage up as noted above). But then I might work to combine so the capability is exposed more broadly. If I read it right the logic in UsernamePasswordElasticDatastoreFactory could just be conditioned based on whether user provides proxy and/or admin credentials.

johndeereguy commented 5 years ago

I think that would be a great idea. Our security requirements mandate that we protect data using XPack, so we didn't put the effort in to allow creating datastores both with and without the username/password. If you could help use merge them, allowing that option, that would be helpful. Thanks for posting the updates to UsernamePasswordElasticDatastoreFactoryTest.java, I'll review that during the next couple of days, hopefully that will get us to the appropriate test coverage.

johndeereguy commented 5 years ago

Thanks for providing the updates to more thoroughly test UsernamePasswordElasticDatastoreFactory (along with the other code cleanup). I looked them over and included them in the latest commit to this pull request and the test coverage is much improved. Let me know if you see other issues we need to address.

coderloner commented 5 years ago

I have been trying to use this code with an active ElasticSearch installation that has SSL enabled. Since your changes added no username or password field to the Elasticsearch store panel in GeoServer, how do you expect the user to provide this? Thanks!

johndeereguy commented 5 years ago

Hi Coderloner. In the version we are running on our system, I've updated elasticgeo/gt-elasticsearch/src/main/resources/META-INF/services/org.geotools.data.DataStoreFactorySpi. It contains the single line: mil.nga.giat.data.elasticsearch.UsernamePasswordElasticDataStoreFactory I didn't check that in with the request, thought that would force everyone to use it, and the default should be left to mil.nga.giat.data.elasticsearch.ElasticDataStoreFactory. If you make this update you'll see the username and password fields in your Elasticsearch store panel in GeoSever. Sjudeng, commented that he will work with us to merge the two factories into one eventually, so this change won't be necessary.

coderloner commented 5 years ago

@johndeereguy I'm having trouble getting this working and creating the elasticsearch store in Geoserver. You have fields for Admin user and Proxy user. Could you define what user goes in each field--i.e., does Admin refer to the Elasticsearch login?

johndeereguy commented 5 years ago

@sjudeng Forgive me, I don't have a lot of experience with git and merging. I attempted to squash my last 14 commits with this command: git rebase -i HEAD~14

In the text editor, I updated the pick to squash as follows: pick 7067372 Updates to add usernames and passwords to ElasticDataStores squash e7Sb06b Adding spring framework dependency squash ccGf6lO Updates to add usernames and passwords to ElasticDataStores squash df ac094 Modifying proxy test cases. squash fc4ef96 Update to GeoServer 2.14.0/GeoTools 20.0. Fix JTS imports. squash 9f35b3a Support multiple hosts squash 39b181c Update version to 2.14-SNAPSHOT squash be2ddfc [maven-release-plugin) prepare release 2.14.0-RC1 squash dafbO43 [maven-release-plugin] prepare for next development iteration squash f6fec36 #90 Resolved the following exception Unrecognized field “doccount_error_upper_bound” when using aggregations squash ca9bcca Resolve merge conflicts after updates from ngageoint/master. squash e025519 Format updates, replacing tabs with 4 spaces. squash 8393f8e Completing format updates, replacing tabs with 4 spaces. squash 13840f5 Completing format updates, replacing tabs with 4 spaces. squash 2c554a6 Addressing code review connents, reuse clients when they are the same, and not close the client in dispose. squash cca3670 Finish addressing code review comrents, added a test for UsernamePasswordElasticDatastoreFactory. squash 26d82d8 Clean up formatting, add tests, removed RestClient from arguments that were not needed for tests, updated names per code review coments. squash f42da19 Add unit test to compare order and normalize custom name. Also, updated datastore factory to close RestClient when connection errors are encountered. This will reslove lOExcep tiori: Too many open files squash 7a74d7b Updates from sjudeng to improve testing, includes mock clients, and general code cleanup.

I then proceeded to get merge errors, I tried to skip, but they continued for what seemed like every commit: error: could not apply 9f35b3a... Support multiple hosts

When you have resolved this problem, run "git rebase --continue". If you prefer to skip this patch, run "git rebase --skip" instead. To check out the original branch and stop rebasing, run "git rebase --abort".

Could not apply 9f35b3a... Support multiple hosts Auto-merging gt-elasticsearch/src/main/java/mil/nga/giat/data/elasticsearch/ElasticDataStoreFactory.java CONFLICT (content): Merge conflict in gt-elasticsearch/src/main/java/mil/nga/giat/data/elasticsearch/ElasticDataStoreFactory.java

error: could not apply cca3670... Finish addressing code review comments, added a test for UsernamePasswordElasticDatastoreFactory.

When you have resolved this problem, run "git rebase --continue". If you prefer to skip this patch, run "git rebase --skip" instead. To check out the original branch and stop rebasing, run "git rebase --abort".

Could not apply cca3670... Finish addressing code review comments, added a test for UsernamePasswordElasticDatastoreFactory. Auto-merging gt-elasticsearch/src/test/java/mil/nga/giat/data/elasticsearch/RestElasticClientTest.java CONFLICT (content): Merge conflict in gt-elasticsearch/src/test/java/mil/nga/giat/data/elasticsearch/RestElasticClientTest.java Auto-merging gt-elasticsearch/src/main/java/mil/nga/giat/data/elasticsearch/UsernamePasswordElasticDataStoreFactory.java CONFLICT (content): Merge conflict in gt-elasticsearch/src/main/java/mil/nga/giat/data/elasticsearch/UsernamePasswordElasticDataStoreFactory.java Auto-merging gt-elasticsearch/src/main/java/mil/nga/giat/data/elasticsearch/RestElasticClient.java CONFLICT (content): Merge conflict in gt-elasticsearch/src/main/java/mil/nga/giat/data/elasticsearch/RestElasticClient.java Auto-merging gs-web-elasticsearch/doc/index.rst

error: could not apply 2c554a6... Addressing code review comments, reuse clients when they are the same, and not close the client in dispose.

When you have resolved this problem, run "git rebase --continue". If you prefer to skip this patch, run "git rebase --skip" instead. To check out the original branch and stop rebasing, run "git rebase --abort".

Could not apply 2c554a6... Addressing code review comments, reuse clients when they are the same, and not close the client in dispose. Auto-merging gt-elasticsearch/src/test/java/mil/nga/giat/data/elasticsearch/RestElasticClientTest.java CONFLICT (content): Merge conflict in gt-elasticsearch/src/test/java/mil/nga/giat/data/elasticsearch/RestElasticClientTest.java Auto-merging gt-elasticsearch/src/main/java/mil/nga/giat/data/elasticsearch/UsernamePasswordElasticDataStoreFactory.java CONFLICT (content): Merge conflict in gt-elasticsearch/src/main/java/mil/nga/giat/data/elasticsearch/UsernamePasswordElasticDataStoreFactory.java Auto-merging gt-elasticsearch/pom.xml CONFLICT (content): Merge conflict in gt-elasticsearch/pom.xml

And the list goes on... Do you have any recommendations of a better way to perform the squash so I can avoid all the manual merging?

sjudeng commented 5 years ago

First make sure you've aborted any in-progress rebase and restore the branch to the original state. Also probably good to backup your branch if you didn't already.

git rebase --abort
git checkout securityUpdates
# change johndeereguy below to the remote name for your fork
git reset --hard johndeereguy/securityUpdates

Then one way to do it would be do a soft reset to master and then commit all updates.

# change origin below to the remote name for official
git fetch origin
git reset --soft origin/master
git commit

You can compare your remote and local branches at this point. There should be no difference.

git diff johndeereguy/securityUpdates..securityUpdates

Once you're satisfied force push.

git push johndeereguy securityUpdates --force
johndeereguy commented 5 years ago

@sjudeng Thanks for the detailed steps! They appeared to work as expected, and I think I've squashed the commits as you requested, let me know if you see any problems. Here's are the updated commands I ran based on your post: Here's my configuration:  

$ git remote -v
ngageoint       https://github.com/ngageoint/elasticgeo.git (fetch)
ngageoint       https://github.com/ngageoint/elasticgeo.git (push)
origin  https://github.com/johndeereguy/elasticgeo.git (fetch)
origin  https://github.com/johndeereguy/elasticgeo.git (push)  

Here are the commands I ran, based on your recommendation, updating with the above values:
$ git rebase --abort
$ git checkout securityUpdates
$ git reset --hard origin/securityUpdates
$ git fetch ngageoint
$ git reset --soft ngageoint/master
$ git commit
$ git diff origin/securityUpdates..securityUpdates --returned no results as expected.
$ git push origin securityUpdates --force

sjudeng commented 5 years ago

Looks good. I pushed this temporarily to a feature branch (https://github.com/ngageoint/elasticgeo/tree/basicAuth) where I'll work on combining the factories. Can you answer @coderloner's question above regarding the admin/proxy user credentials? I'll need to update documentation to include these parameters once the factories have been combined.

sjudeng commented 5 years ago

What's the purpose of the proxyClient versus adminClient? Is it to enable use of credentials with non-admin access for general user requests and limiting use of admin credentials for specific required operations like getting mappings and aliases?

johndeereguy commented 5 years ago

Yes, we need to have all requests to access ElasticSearch data through our GeoServer run via the ProxyClient user. Our ElasticSearch administrator has configured a "proxyClient" user with minimum privileges, including "run-as" role. The "proxyClient" user can't see any ElasticSearch data on it's own. All user requests for data are submitted to ElasticSearch with the submitting users ID in the request header. From there the ElasticSearch X-Pack functionality takes over, and returns only records which the submitting user has x-pack roles to access. We found as you mention, that certain calls wouldn't work without "admin" access. In those cases the ElasticGeo plugin should submit the request using the Admin account which our ElasticSearch admin has configured with addition privileges, so mapping and other calls will work properly.

johndeereguy commented 5 years ago

@coderloner Yes, the Admin User is an ElasticSearch native user that has full administrative rights to get index mappings, aliases, etc. The Proxy User is also a native ElasticSearch user and only has the "run-as" ability in our configuration. We have configured filters in our GeoServer that does some basic user authentication when a request is received, and adds the requesting user to the security context. The updates in this pull request has the ElasticGeo plugin pull that authenticated user from the security context and then add it to the request header. From there the ElasticSearch x-pack plugin uses that header value and only returns the ElasticSearch results that the requesting user is permitted to see based on the roles they are assigned in ElasticSearch. Hope this helps!

sjudeng commented 5 years ago

It looks like the use case for the run-as mechanism is to allow clients to execute requests on behalf of externally-authenticated users. Specifically here it looks like it'll use the authenticated GeoServer user. Is that right?

I can see the use case for having proxy user credentials separate from the admin user. I can also see the case for using run-as to execute requests on behalf of the authenticated GeoServer user. But I think they could be done either/or, not necessarily both at the same time. In particular if you're going to execute the request on behalf of the GeoServer user, why not just use the admin instead of the proxy user credentials?

I think it would make sense to add another configuration option ("Run as GeoServer user") to hopefully allow all the cases. Any objections? Without the opt-in the run-as logic might be unexpected.

Thanks for your responses on these questions. I've merged the two factories and am just working through integration testing and documentation updates.

johndeereguy commented 5 years ago

@sjudeng Yes, the filter we have added to our Geoserver instance does the user authentication piece, uses a form of SAML token identification, which contains information about the user which is used to assign the appropriate x-pack roles to the external user. We add this user to the security context of the Spring framework after the authentication is complete. At this point the ElasticGeo plugin has access to the authenticated user via the security context.

Initially we only had an adminClient in our configuration. However, we encountered a bug in the ElasticSearch x-pack software. In a situation where the user session had expired, ElasticSearch would still perform the query the "run-as" string wasn't being used after session expiration and the results returned were based on the admin user roles. We told ElasticSearch of the bug, but it wasn't going to be fixed in a time frame we needed, so we added the second proxy user option. If this ElasticSearch bug is encountered the user will not see any records, since the proxy user can't see anything, rather than seeing records they should have access to see.

Would it be possible to enforce the run as method we added when you merge factories, depending on what the GeoServer implementation requires? For our configuration, we don't want to allow a GeoServer adminstrator to create an ElasticSearch datastore without the proxy user and admin user configuration. This could open up a security hole in our situation. That is one reason why we removed the ElasticdataStoreFactory from the DataStoreFactorySpi file in our situation.

sjudeng commented 5 years ago

Ah interesting. That makes sense regarding including both admin and proxy user auth to handle the bug on the Elasticsearch side. But I don't think we should require all users to specify both of these. I think many may just need/want to provide a single set of user credentials. It may be even more uncommon for users to tie their GeoServer and Elasticsearch authentication together. It's a nice option to provide though.

Does it make sense to rename proxy user to "default user" as part of combining the factories? Might make it more clear what it is. Again though I don't think GeoServer or the plugin can block users with Elasticsearch admin credentials from configuring the datastore to use those credentials without also providing proxy user.

sjudeng commented 5 years ago

Here are the options I'm proposing in the new (combined) factory.

screen shot 2019-02-04 at 9 33 08 pm
johndeereguy commented 5 years ago

@sjudeng In our case, it is mandatory to have the admin user and the proxy user. I would expect others users that would like to take advantage of the ES x-pack functionality could also be held to the same security standards. That being said could you update combined factory so by default it would include the es-security-runas-user header with the non-admin requests and by default submit them under the default_user? Possibly update the checkbox to "disable_runas_user", and when checked all requests would be submitted under the admin_user? This would require the datastore creator to thoughtfully bypass the runas flow. If you feel this isn't a clean solution for the community, I understand. We could possibly extend the combined factory and make the tweaks on our side.

If you think the default_user name is better for community understanding, then I'm OK with that. I just always associated that user as the proxy user, thus the the name.

sjudeng commented 5 years ago

What if an environment variable (ES_XPACK_FORCE_RUNAS or similar) could be supported to control this? The default if unset would be the more accomodating optional user (admin and proxy) credentials with runas-geoserver-user disabled. But if the deployment environment does define the variable then both sets of credentials would be required and runas-geoserver-user would be force-enabled.

johndeereguy commented 5 years ago

@sjudeng I think setting the optional environment variable to force the use of both sets of credentials, and include the es-security-runas-user in the header, is a great idea. Could an adminstrator set the variable as a context-param in the GeoServer web.xml? Something like this:

ES_XPACK_FORCE_RUNAS true

Or would you recommend another way?

johndeereguy commented 5 years ago

Sorry, I guess I needed to insert my context-param example as a code snippet for it to appear correctly: `

ES_XPACK_FORCE_RUNAS
<param-value>true</param-value>

`

sjudeng commented 5 years ago

I had just been thinking that the user would provide it as an environment variable, for example for Tomcat export JAVA_OPTS="$JAVA_OPTS -DES_XPACK_FORCE_RUNAS". But I'll see if I can find examples of how this is done with some of GeoServer's other datastore plugins for consistency.

albertwgchu commented 5 years ago

I appreciate all the discussion on this feature and I wanted to add my scenario.

We have a web app where users which are entitled to different regional data sets. So for example user1 may be able to see map data in Canada and USA, where as user2 can only see map data in Canada.

I think ideally for us, we would setup user1 and user2 with different document level security in XPack. When users make wms/wfs requests those entitlements would be honoured.

Currently we are working around this, by proxying requests and passing dynamic CQL filters based on the user logged in our web application (which is independent of geoserver and XPack).

sjudeng commented 5 years ago

@albertwgchu So it sounds like the capability here would work for your case? You'd set up GeoServer to authenticate your users and then configure the plugin to execute your Elasticsearch queries for WMS/WFS requests on behalf of that authenticated user, right? Or do you see any gaps in the proposed capability/configuration?

albertwgchu commented 5 years ago

I was originally thinking of setting up users in XPack.

In the screenshots above it looks like the data source is getting extra parameters for the XPack user. I'm not clear on how multiple XPack users would be used for the same data source.

sjudeng commented 5 years ago

@johndeereguy The updates have been merged into master. Thanks for all your work and time on this. The combined factory is configurable as described above. One note, in combining the factories I didn't carry over the static connection manager that was in your factory. I don't think that's typically done at the datastore plugin level. But if there's an issue with this we can open another issue and work that separately.

At your convenience could you build the plugin off of master and try it out to make sure everything works in your environment? @coderloner, @albertwgchu (or anyone else in the community) If you're also able to try it out and report any issues (or successes), your feedback would be welcome.

Documentation: https://github.com/ngageoint/elasticgeo/blob/master/gs-web-elasticsearch/doc/index.rst#configuring-authentication

johndeereguy commented 5 years ago

@sjudeng Thanks for performing the merge into master. I'll build from the current master in the next couple of days and perform testing in our environment within the next week or so. I'll let you know how that goes, thanks again for all your help!

sjudeng commented 5 years ago

Great please do. I'd prefer to wait on a new release until getting some external confirmation that the feature is working for others since there's been so much refactoring.

sjudeng commented 5 years ago

The new release candidate 2.14.2-RC1 includes these updates and is available for review/testing.

albertwgchu commented 5 years ago

Thanks for cc'ing me on this!

On Fri, Mar 1, 2019, 6:31 AM sjudeng notifications@github.com wrote:

The new release candidate 2.14.2-RC1 https://github.com/ngageoint/elasticgeo/releases/tag/2.14.2-RC1 includes these updates and is available for review/testing.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ngageoint/elasticgeo/pull/88#issuecomment-468665276, or mute the thread https://github.com/notifications/unsubscribe-auth/ALaQeUF4-linXw29ds83V1mSsa0YOZyoks5vSSuTgaJpZM4YwnQl .

johndeereguy commented 5 years ago

@sjudeng Sorry it has taken me so long to get back to you. We are still running GS 2.12.1 and have custom code built with those dependencies that performs the user authentication, x-pack role creation, etc. There were some dependency issues I had to work through to get this version (along with GS 2.14.2) to work on our system. Today I was able to deploy elasticgeo-2.14.2.jar, which was built from 2.14.2-RC1, in our test environment. We ran a regression test and verified the admin and proxy user additions worked as expected. It will be another 6 to 8 weeks before we are scheduled to release this configuration to our larger test audience, but the first drop looks good. Thanks again for working with me to get these updates merged.

sjudeng commented 5 years ago

Great glad to hear it. If you find any issues with the feature in the future please open a new issue to report it. Thanks again for your work on this.

sjudeng commented 4 years ago

@johndeereguy Thank you for your contributions to ElasticGeo. I would like to donate the project as a community module to GeoServer and GeoTools. The OSGeo foundation that manages GeoServer requires all contributors to sign a contributor license agreement (CLA). If possible would you be willing to review the details on the page below and see if you're able to complete and send them the CLA?

(individual CLA) https://www.osgeo.org/resources/individual-contributor-license/ (or the corporate CLA) https://www.osgeo.org/resources/corporate-contributor-license/

Thanks again in any case.