Rethink and/or document how Connections are shared between Server and API Root

oasis-open / cti-taxii-client

OASIS TC Open Repository: TAXII 2 Client Library Written in Python

https://taxii2client.readthedocs.io/

BSD 3-Clause "New" or "Revised" License

111 stars 53 forks source link

Rethink and/or document how Connections are shared between Server and API Root #37

Closed gtback closed 4 years ago

gtback commented 6 years ago

This has caused a lot of confusion and problems. See my comment for more information.

emmanvg commented 6 years ago

@gtback, the approach you stated in https://github.com/oasis-open/cti-taxii-client/pull/36#issuecomment-391764789 makes sense to me.

mikekhusid commented 6 years ago

Kind of tangentially related but I realized that adding proxy support to the Server class in yesterday's pull request didn't carry over to the api_roots method.

In the short term, for those struggling with proxy issues such as myself, a solution is to add self._conn = conn at line 687 of init.py and conn = self._conn at line 733. If the proxy is specified correctly, this will work. Is this worth another pull request or is there a better long term solution?

chisholm commented 6 years ago

I am okay with removing verify from the endpoint constructors. But changing the behavior of Server depending on whether it was initialized with user/pass or a connection object seems rather... subtle. How about a special keyword arg to the Server constructor which controls connection reuse? This would directly address the issue, and decouple it from where the connection comes from. Maybe we should default it to true (reuse), since I guess that's how most people want it to work (but that wouldn't be backward-compatible, if that matters). E.g. you could pass username/password and have reuse, and... well I guess with the current design, it wouldn't be easy to not have reuse if a connection was passed (because it may not be straightforward to get the username/password out of the current connection object. But would be easy to add a couple properties).

chisholm commented 6 years ago

Hmm... on the other hand, if they used a custom connection object with the Server, there's currently no way to clone it (for settings), so that you can have independent connections to the ApiRoot's. I guess I was only thinking in terms of our _HTTPConnection class. Blearg...

I guess we could introduce a "cloneable" concept, e.g. having a "clone" method. If the connection is cloneable, call that method to clone it. Yuck. Or, if the connection is an _HTTPConnection, clone it (via a clone method we would write), otherwise don't (would be forced connection reuse). Ugh...

chisholm commented 6 years ago

Perhaps this calls for a "factory". That might be a dirty word to some, but it could be used to encapsulate connection reuse behavior. I've heard/seen at least three different reuse schemes so far:

Use the same connection for everything
Use a different connection per host (actually the suggestion was URL, but see below)
Use a different connection for different servers and api roots (regardless of their hosts), and collection/status uses their parent api root connections.

The third option is roughly how the taxii client works now (if ApiRoots are obtained from Server objects, and the latter are created via user/password instead of passing in a connection).

The second option might be roughly what is suggested/intended in PR #34 . That suggestion was essentially to reuse per URL (if I understood correctly), but that would need to be clarified. For example, every object has its own URL (of the form /<api-root>/collections/<id>/objects/<object-id>/, see TAXII 2.0 spec section 5.5), but surely we don't want a different connection for each one! I think we need coarser-grained reuse, e.g. per host.

And others are possible too. Am I over-thinking this? :)

gtback commented 6 years ago

HTTPConnection is just a wrapper around a requests.Session plus a few helper methods (valid_content_type could be made into a static method). A Session seems to handle multiple domains just fine, so I don't think that's a concern. It's mostly about different parameters (namely authentication credentials) that are used for different domains. Many of the more recent enhancements (proxies, verify) are likely to apply to all domains, and would not need to be changed either. The original implementation assumed that a lot of Server Discovery results would include API roots on other domains, and I don't think that is very common in practice. I would support automatically passing the same connection from Server to API Root and API Root to Collection, as long as there is some way to "override" or "customize" the connection afterwards.

One wrinkle is that an APIRoot object is created automatically from accessing the Server Discovery endpoint, but only its url property is set until other properties are accessed. This lazy-loading makes it strange to update the connection after the object has already been created, but it's really the only way given the current design.

I'll say (again, I think) that it should ideally be rare for applications to frequently traverse Server->API Root and API Root->Collection relationships. Once Collections have been discovered, they can be instantiated explicitly, and none of this connection-sharing really matters at all.

chisholm commented 6 years ago

Regarding handling multiple domains: yes, Session handles them, but you can only close all at once. You can't close the connection to one server and leave the rest open. One important aspect of the reasoning behind creating new "connections" was their independence. We picked what we thought was a reasonable design, but apparently people want to do it other ways. Requiring different credentials is another reason to use multiple connections. Credentials may typically vary on a per-host basis, but not necessarily.

Post-processing a connection after the endpoint class is created sounds like it would require either (a) subclassing the endpoint class, or (b) letting users pass in post-processing callbacks. (a) sounds like too much work, and (b) feels awkward. Perhaps a factory achieves a better separation of concerns, leaving the endpoint classes responsible only for communicating with the services, not dealing with funny customization hooks.

It may be ideally rare for users to traverse server->apiroot->collection, but it seems like the premise of this issue is that people are doing it and stumbling over it. Seems like the ideal and reality are different :)

I haven't got a complete proof-of-concept implementation yet, but I'm thinking that each endpoint class constructor could be changed to accept a factory, and maybe user/password if we still want to support that. The endpoint class would just call factory.get_connection() to get a connection, and pass in the URL and perhaps the endpoint type (server/apiroot/collection/etc). Whatever we think the criteria should be for choosing one. In the endpoint classes, I think usage should be pretty simple. No commitments though until I've done more experimentation :)

emmanvg commented 4 years ago

Currently, the Server class will propagate its configuration into its API Root objects as the current design. Feel free to reopen if this needs to be revisited.