oasis-open / cti-taxii-client

OASIS TC Open Repository: TAXII 2 Client Library Written in Python
https://taxii2client.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
107 stars 51 forks source link

Exception: AttributeError: 'dict' object has no attribute 'json' #113

Closed andrewbeard closed 1 year ago

andrewbeard commented 1 year ago

I have an application that uses the taxii2client module that hits an occasional exception we don't entirely understand but seems to be coming from the module code. I've traced the exception to it's source which seems wrong, but I'm not sure why it is the way it is and am looking for The Right Way (tm) to fix it.

The relevant part of the exception to the library

  File OUR_CODE
    for bundle in as_pages(collection.get_objects, per_request=self.paginate, **collection_params):
  File ".../python/lib64/python3.5/site-packages/taxii2client/v20/__init__.py", line 47, in as_pages
    yield _to_json(resp)
  File ".../python/lib64/python3.5/site-packages/taxii2client/common.py", line 124, in _to_json
    return resp.json()
AttributeError: 'dict' object has no attribute 'json'

The exception is getting thrown because _to_json (common.py:114) is expecting a requests.Response object, but gets a dict object instead as an arg and blows up.

I see the error that's actually causing the exception, and that's caused by func() call prior to the yield in as_pages. func in our case is collection.get_objects (v20/init.py:409). It returns the value of self._conn.get, which is implemented as _HTTPConnection.get (common.py:272). At the bottom of that function is this:

        if "Range" in merged_headers and self.version == "2.0":
            return resp
        else:
            return _to_json(resp)

If it matched that condition returns self.session.get which should be of type requests.Response and life is good. If it doesn't it seems like we return a dict, which causes an exception when as_pages calls _to_json() on the dict. The 64k question is why are we not always returning the response, and what does that check mean? It's looks fairly intentional, but has a very nasty side effect in our case.

I can fix this via the jackass method and check the type of response in _to_json, but that's a hack. On a related note this seems to work for TAXII 2.1 because the as_pages method is different and does not call _to_json, but triggers when getting data from small collections that only have data on a single page (which I'm sure is related to the bit about the Range header).

chisholm commented 1 year ago

My guess is that in the TAXII 2.0 case, the paging code needs the response headers (the Content-Range header specifically), so it can do paging. If _HTTPConnection.get() had always returned a dict, the paging information would be lost.

_HTTPConnection.get() returns the response object if the request included a Range header, probably because it is anticipating the presence of Content-Range in the response, and the need to page. The 2.0 get_objects() call does include a Range header with the request, if per_request is greater than zero. I see your invocation sets that to self.paginate. Is that always greater than zero?

This is all irrelevant in TAXII 2.1, since paging is done in a completely different way (not via request/response headers).

andrewbeard commented 1 year ago

Yes, when we're paging we always set a per_request greater than zero. I have a feeling something is going on where we don't know how much content to expect so we set the call up to paginate, but the collection we're getting objects from is pretty small and the remote server does something unexpected with the headers, throwing the assumption about header content off.

That explains why something is needed, at least. It seems like a terrible practice to have a low-level function like get() change its behavior based on assumptions about what other parts of the code are trying to do, though. This exception where the types get confused is case in point.

If I just wanted to clean it up I'd say always have get() return the response and have the caller explicitly convert to json. That would result in a LOT of changes peppered all over the code, though, including over the unit tests since they explicitly call get in a couple places. I'm going to try submitting a PR that wraps the get() return value in a new class that implements Mapping and json(). That should be a much smaller set of changes, maintain compatibility with the existing calls, and fix the type confusion (and the exception).

andrewbeard commented 1 year ago

Addressed in PR #114