sphinx-contrib / confluencebuilder

Confluence Markup Builder Plugin for Sphinx
BSD 2-Clause "Simplified" License

Content ID missing in API endpoint #615

Closed: konradbender closed this issue 2 years ago

konradbender commented 2 years ago

I am building documentation using sphinx's apidoc. When I run python3 -m sphinx -b confluence source build/confluence -E -a, I get the following error:

File "/home/ubuntu/.local/lib/python3.8/site-packages/sphinxcontrib/confluencebuilder/rest.py", line 204, in put
    raise ConfluenceBadApiError(rsp.status_code, errdata)
sphinxcontrib.confluencebuilder.exceptions.ConfluenceBadApiError: 
Unsupported Confluence API call

An unsupported Confluence API call has been made. See the following
details for more information:

REQ: PUT
RSP: 429
URL: https://corporate-intra-net.net/confluence/rest/api
API: content
DATA: <not-or-invalid-json>

(I can't show the data JSON, but I parsed it with the Python json encoder and that did work.)

Here is the problem: I looked at the Atlassian Server API reference and there is no PUT for the content endpoint, or at least none without an ID in the endpoint path.

So, I think that maybe the ID of the content is missing in the API endpoint? I can also share the data JSON, if that would be helpful.

Thank you!

jdknight commented 2 years ago

I believe the error message may not be a complete representation of the request being published. All of the PUT requests performed by this extension should have an associated identifier -- it is just that the error-formatting implementation does not reflect the identifier (or other possible fields) in the "API" portion of the message (let alone anywhere else). If you're curious, you could try enabling urllib3 logging in your conf.py to get a better picture of the requests being made (we really should make this a configuration option, since it gets referenced a lot lately):

import logging

# enable debug output on the root logger
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)

# requests exposes urllib3's logger under this name; enabling it shows each
# HTTP request/response made during a publish
rlog = logging.getLogger('requests.packages.urllib3')
rlog.setLevel(logging.DEBUG)

What concerns me is the 429 error code. What I suspect is happening is that the Confluence instance is rate limiting publish requests. This assumption is based solely on information I recently read in the "Rate limiting guide for Jira and Confluence". At this time, this extension does not properly impose any Confluence-recommended rate limiting -- which it should. Unfortunately, I also do not think there is an existing workaround to impose rate limiting on a publish request (aside from going into the implementation and injecting dummy sleeps).
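
As a point of reference, here is a rough sketch (not this extension's code) of the kind of Confluence-recommended handling I have in mind, using the Requests library and assuming the Retry-After header carries a number of seconds:

import time
import requests

def request_with_backoff(session, method, url, attempts=5, **kwargs):
    # retry on 429 responses, preferring the server-suggested wait time
    delay = 1
    rsp = None
    for _ in range(attempts):
        rsp = session.request(method, url, **kwargs)
        if rsp.status_code != 429:
            return rsp
        retry_after = rsp.headers.get('Retry-After')
        time.sleep(int(retry_after) if retry_after else delay)
        delay *= 2  # exponential backoff when no hint is provided
    return rsp

# usage: rsp = request_with_backoff(requests.Session(), 'GET', some_url)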

If I have overlooked another possible issue here, please let me know.

jdknight commented 2 years ago

If interested (and if your issue is indeed related to rate limiting), feel free to try the changes introduced in #617, which attempt to support rate limiting. I have only performed some limited testing, since I do not have access to a Data Center instance (which appears to be the only edition that provides rate limiting options from an administrator's panel).

This can be installed using the following command:

pip install git+https://github.com/sphinx-contrib/confluencebuilder.git@support-rate-limiting
konradbender commented 2 years ago

@jdknight Thank you so much for opening a PR trying to fix it. I have tested the branch and can tell you what is happening on my side when I am using it with a Confluence Server instance:

$ python3 -m sphinx -b confluence source build/confluence -E -a
Running Sphinx v4.4.0
(request to accept username from interactive session)
 Instance: https://atc.corporate.net/confluence/
 User: q524675
(request to accept password from interactive session)
 Password: 
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): atc.corporate.net:443
DEBUG:urllib3.connectionpool:https://atc.corporate.net:443 "GET /confluence/rest/api/space?spaceKey=UCAADO&limit=1 HTTP/1.1" 200 None
WARNING: rate-limit response detected; waiting 4 seconds...
DEBUG:urllib3.connectionpool:https://atc.corporate.net:443 "GET /confluence/rest/api/space?spaceKey=UCAADO&limit=1 HTTP/1.1" 200 None
WARNING: rate-limit response detected; waiting 8 seconds...
DEBUG:urllib3.connectionpool:Resetting dropped connection: atc.corporate.net
DEBUG:urllib3.connectionpool:https://atc.corporate.net:443 "GET /confluence/rest/api/space?spaceKey=UCAADO&limit=1 HTTP/1.1" 200 None
WARNING: rate-limit response detected; waiting 16 seconds...
DEBUG:urllib3.connectionpool:Resetting dropped connection: atc.corporate.net
DEBUG:urllib3.connectionpool:https://atc.corporate.net:443 "GET /confluence/rest/api/space?spaceKey=UCAADO&limit=1 HTTP/1.1" 200 None
WARNING: rate-limit response detected; waiting 31 seconds...
DEBUG:urllib3.connectionpool:Resetting dropped connection: atc.corporate.net
DEBUG:urllib3.connectionpool:https://atc.corporate.net:443 "GET /confluence/rest/api/space?spaceKey=UCAADO&limit=1 HTTP/1.1" 200 None
WARNING: rate-limit response detected; waiting 32 seconds...
DEBUG:urllib3.connectionpool:Resetting dropped connection: atc.corporate.net
DEBUG:urllib3.connectionpool:https://atc.corporate.net:443 "GET /confluence/rest/api/space?spaceKey=UCAADO&limit=1 HTTP/1.1" 200 None

sphinxcontrib.confluencebuilder error:

---
Request has been rate limited

The configured Confluence instance is reporting that too many requests
are being made and has instructed the client to limit the amount of
requests to make at this time.
---

I think that your solution is raising the ConfluenceRateLimited error too soon. Maybe the server instance includes the Retry-After header even though the request was OK? I see that the responses to the GET requests were 200s, so I don't think the retry is necessary. This is the code line I am referring to: https://github.com/sphinx-contrib/confluencebuilder/pull/617/files?file-filters%5B%5D=.py&show-viewed-files=true&w=1#diff-82991557e9c5cb1d7f66f271d188ea6f0eaf99c84d7ec17e8182f265b4806e90R209

Thanks again for your help and let me know if you need more information to fix the issue :)

jdknight commented 2 years ago

@konradbender, ah ok ~ I think I see where I made a mistake. When reading the example on rate limiting, I assumed the failure case was either a 429 error state or any response where Confluence reports Retry-After. Based on your recent observations, however, Confluence may report a Retry-After value on a successful response, and that value should simply be considered before the following request is made (which makes sense).
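
If that reading is right, the adjustment would look roughly like the following sketch (illustrative only, not the actual change in the branch): only a 429 is treated as a rate-limit failure, while a Retry-After value on a successful response is recorded as a delay to apply before the next request.

import time

next_delay = 0  # hint carried over from the previous response

def wait_before_request():
    global next_delay
    if next_delay:
        time.sleep(next_delay)
        next_delay = 0

def note_response(rsp):
    # assumes a Requests-style response; remember a Retry-After hint even
    # when the request itself succeeded
    global next_delay
    retry_after = rsp.headers.get('Retry-After')
    if rsp.status_code != 429 and retry_after:
        next_delay = int(retry_after)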

I will try to make some adjustments tomorrow, if you're willing to try again.

konradbender commented 2 years ago

@jdknight After your most recent commit, I tried again and it worked flawlessly! It seems like our server instance is "ok" with a frequency of two requests per minute, because the logs indicate that once rate limiting is detected, the wait time is around 30 seconds. Of course, this can lead to a couple of minutes (maybe even hours) of upload time. That is not a problem for me, but my question is whether anything might time out during the upload, e.g. TCP sessions?

konradbender commented 2 years ago

Anyways, thanks so much for your work. Is there any way I can get notified when this PR makes it into the version available on PyPI?

jdknight commented 2 years ago

It seems like our server instance is "ok" with a frequency of two requests per minute, because the logs indicate that once rate limiting is detected, the wait time is around 30 seconds.

Two requests per minute seems painfully slow. While it is possible that the instance is configured in that manner, I also wonder if there could be an issue with the introduced rate-limiting implementation. When a rate-limit event is detected, I believe the existing logic uses a proper delay until the next request; however, there are two limitations:

1) It does not consider any possible Confluence-reported rate limit suggestions until the first 429 error occurs.
2) The historic delay value is scaled down over time, which will eventually lead a user back into a 429 error state again.

For the latter, I think the scaling down may be fine/proper for systems which do not impose any rate-limit hints in the header (e.g. the Confluence server was busy, but may become less busy so that more requests can be pushed through -- a speculative example). But I wonder whether, if we always tracked and considered the last Retry-After header across all requests, a publish event could be completed faster.
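
Roughly what I have in mind (a sketch under those assumptions, not the branch's actual code): keep a decaying fallback delay for instances that provide no hints, but always prefer the most recently reported Retry-After value.

last_hint = None      # most recent Retry-After value observed, in seconds
fallback_delay = 0.0  # historic delay, scaled down after each clean request

def next_request_delay():
    global fallback_delay
    if last_hint is not None:
        return last_hint
    delay = fallback_delay
    fallback_delay /= 2  # decaying fallback; may drift back into 429 territory
    return delay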

One thing you could try (if interested) is setting the confluence_publish_delay option added in this branch. For example:

confluence_publish_delay = 0.3

This will impose a forced delay on all requests. In theory, if you can tailor it to a value which makes your publishing faster, it would show that we can improve the rate-limiting implementation. The value is in seconds -- and note, it applies to each request made to Confluence (space search, page update, page ID fetches, unwatch, etc.).
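
In other words, the option roughly amounts to the following (illustrative only; these names are made up and are not the extension's internals):

import time

def publish_request(session, method, url, delay=0.3, **kwargs):
    time.sleep(delay)  # forced pause before every REST call made while publishing
    return session.request(method, url, **kwargs)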

[Can] anything ... time out during the upload, e.g. TCP sessions?

There are three areas where this extension deals with timeouts:

1) The confluence_timeout option provides a way to configure a timeout on all requests made to the Confluence instance. By default, no value is set, so it falls back to whatever the Requests library uses -- which I believe is no timeout at all. This is per request, and this extension does not retry on a timeout event (a conf.py example follows this list).
2) This extension will only retry a publish request once (i.e. two attempts), after three seconds, when Confluence reports a critical error (such as an exception response).
3) The recent rate-limiting logic will only attempt a maximum of five tries, so if Confluence never accepts a request in this situation, the publish event would stop after ~1-2 minutes.
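
For example, a conf.py entry such as the following would cap each request (the value is assumed to be in seconds):

confluence_timeout = 30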

Is there any way I can get notified when this PR makes it into the version available on PyPI?

Typically, after a pull request is merged in, I will flag an issue as "pending next release" but not close it, and eventually close it when the changes make it into a PyPI release. Tracking events on this issue will be the best source of information on when the change becomes available in a release.

konradbender commented 2 years ago

Ok, I understand.

It turns out that the 30 second wait does not come from the server configuration, because if I set confluence_publish_delay to e.g. 5 seconds, the frequency is more like 3 or 4 requests per minute.

Here are some logs in case those are helpful:

WARNING: unable to build link to document due to missing title (in README): genindex
WARNING: unable to build link to document due to missing title (in README): py-modindex
WARNING: unable to build link to document due to missing title (in README): search
WARNING: unsupported code language for confluence: console
DEBUG:urllib3.connectionpool:Resetting dropped connection: corporate-network.net
DEBUG:urllib3.connectionpool:https://corporate-network.net:443 "GET /confluence/rest/api/content?type=page&spaceKey=<space key>&status=current HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Resetting dropped connection: corporate-network.net                                                                                                                                                             
DEBUG:urllib3.connectionpool:https://corporate-network.net:443 "GET /confluence/rest/api/content?type=page&spaceKey=<space key>&title=<title>&status=current&expand=version%2Cmetadata.labels HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Resetting dropped connection: corporate-network.net
DEBUG:urllib3.connectionpool:https://corporate-network.net:443 "PUT /confluence/rest/api/content/2024011278?status=current HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Resetting dropped connection: corporate-network.net
DEBUG:urllib3.connectionpool:https://corporate-network.net:443 "DELETE /confluence/rest/api/user/watch/content/2024011278 HTTP/1.1" 204 0
DEBUG:urllib3.connectionpool:Resetting dropped connection: corporate-network.net
DEBUG:urllib3.connectionpool:https://corporate-network.net:443 "GET /confluence/rest/api/content/2024011278?status=current&expand=ancestors HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Resetting dropped connection: corporate-network.net
DEBUG:urllib3.connectionpool:https://corporate-network.net:443 "GET /confluence/rest/api/content/search?cql=ancestor%3D2023786515&limit=1000 HTTP/1.1" 429 694
WARNING: rate-limit response detected; waiting 10 seconds...
DEBUG:urllib3.connectionpool:Resetting dropped connection: corporate-network.net
DEBUG:urllib3.connectionpool:https://corporate-network.net:443 "GET /confluence/rest/api/content/search?cql=ancestor%3D2023786515&limit=1000 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Resetting dropped connection: corporate-network.net
DEBUG:urllib3.connectionpool:https://corporate-network.net:443 "GET /confluence/rest/api/content/2024011300/child/attachment?limit=1000 HTTP/1.1" 429 694
WARNING: rate-limit response detected; waiting 17 seconds...
DEBUG:urllib3.connectionpool:Resetting dropped connection: corporate-network.net
DEBUG:urllib3.connectionpool:https://corporate-network.net:443 "GET /confluence/rest/api/content/2024011300/child/attachment?limit=1000 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Resetting dropped connection: corporate-network.net
DEBUG:urllib3.connectionpool:https://corporate-network.net:443 "GET /confluence/rest/api/content/2024011290/child/attachment?limit=1000 HTTP/1.1" 429 694
WARNING: rate-limit response detected; waiting 21 seconds...
DEBUG:urllib3.connectionpool:Resetting dropped connection: corporate-network.net
DEBUG:urllib3.connectionpool:https://corporate-network.net:443 "GET /confluence/rest/api/content/2024011290/child/attachment?limit=1000 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Resetting dropped connection: corporate-network.net
DEBUG:urllib3.connectionpool:https://corporate-network.net:443 "GET /confluence/rest/api/content/2024011302/child/attachment?limit=1000 HTTP/1.1" 429 694
WARNING: rate-limit response detected; waiting 19 seconds...
DEBUG:urllib3.connectionpool:Resetting dropped connection: corporate-network.net
DEBUG:urllib3.connectionpool:https://corporate-network.net:443 "GET /confluence/rest/api/content/2024011302/child/attachment?limit=1000 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Resetting dropped connection: corporate-network.net
DEBUG:urllib3.connectionpool:https://corporate-network.net:443 "GET /confluence/rest/api/content/2024011288/child/attachment?limit=1000 HTTP/1.1" 429 694
WARNING: rate-limit response detected; waiting 20 seconds...
DEBUG:urllib3.connectionpool:Resetting dropped connection: corporate-network.net
DEBUG:urllib3.connectionpool:https://corporate-network.net:443 "GET /confluence/rest/api/content/2024011288/child/attachment?limit=1000 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Resetting dropped connection: corporate-network.net
DEBUG:urllib3.connectionpool:https://corporate-network.net:443 "GET /confluence/rest/api/content/2024011281/child/attachment?limit=1000 HTTP/1.1" 429 694
WARNING: rate-limit response detected; waiting 20 seconds...

And it keeps going like that ...

jdknight commented 2 years ago

Thanks for the information.

I have made additional changes to the rate-limiting implementation, if you care to try them. They are currently on a new branch:

pip install git+https://github.com/sphinx-contrib/confluencebuilder.git@improve-rate-limiting

Make sure to unset/clear the confluence_publish_delay option.

The changes should help enforce rate-limiting hints as soon as Confluence (or a proxy) reports them. However, this change only reports rate-limiting delays from Confluence when verbose mode is enabled, so if you want more detailed output of what is happening, it may be ideal to enable verbosity on the Sphinx call (I believe -vv will do this?).
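
For example, the build command from earlier with verbosity enabled:

python3 -m sphinx -b confluence source build/confluence -E -a -vv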

In theory, if your Confluence instance is rate-limiting you, you will see (with verbose logging) that this extension will report messages such as the following:

rate-limit header detected; waiting <n> seconds...

This should be an indication that the administrator of the Confluence instance has configured rate limiting and there is not much you can do (or a proxy is rate limiting, but there is not much of a difference). In this case, you should not see 429 codes or a "rate-limit response detected" message. If you do see them, this extension may not be properly waiting the requested duration.

Instead, if while publishing you see 429 errors and never a "rate-limit header detected" message, but only:

rate-limit response detected; waiting <n> seconds...

This would make me believe the issue is not that Confluence is rate limiting, but something else (like a corporate proxy instance). One thing I find interesting in the logs is the repeated message:

DEBUG:urllib3.connectionpool:Resetting dropped connection: corporate-network.net
...

This makes me believe that a (reverse) proxy is being used (which is fine), but that keep-alive sessions are not supported. This should not be a major issue; it will only delay publishing a bit, since a new connection attempt has to be made for each request. But what this may hint at is either:

1) The proxy is not forwarding a Retry-After header from Confluence (if rate limiting is even enabled); and/or,
2) The proxy is having issues (overloaded) or is limiting these interactions.

But this is speculative. What I find interesting is that with a fixed confluence_publish_delay of five seconds (which is extremely large IMHO), I would suspect you should never be getting 429 errors from Confluence. This would mean either that a system administrator has severely limited the amount of API calls you can make, or that a proxy does not like the amount of traffic the host is generating (a publish event combined with other activity, like interacting with the instance through a browser).

konradbender commented 2 years ago

Hey there, so it turns out that it must be a problem with a proxy.

I have indeed never gotten a

rate-limit header detected; waiting <n> seconds...

message after using the branch improve-rate-limiting.

I think in that case, this issue can be closed because there is nothing you can do about it, right?

To be honest, we are only using this API as a bridge solution and will migrate to GitHub Pages in the next few months.

Thank you so much for your great support either way :)

jdknight commented 2 years ago

I think in that case, this issue can be closed because there is nothing you can do about it, right?

Unfortunately, that is most likely the case. I'd be curious to know if you ever stumble upon the reason why this may have been happening.

It would also be interesting to know if a publish event reacted differently with the keep-alive behaviour disabled. For example:

confluence_publish_headers = {
    # ask the server (or proxy) to close the connection after each request
    'Connection': 'close',
}

I would think you would get the same result, but just fishing for anything.

[We] are only using this API as a bridge solution and will migrate to GitHub Pages...

Nice. A static site generator will give you way more flexibility for displaying/interacting with your content. I hope the transition works out well for your team.

I would also be curious to know how you plan to integrate search-like functionality across your GitHub Pages ~ although, this is going off topic quite a bit, so feel free to ignore.

konradbender commented 2 years ago

Unfortunately, we're going to have to continue using the Confluence publishing method to distribute our documentation for some time. Recently, the workflow was broken by an error in logging; I opened a separate issue: #632.

jdknight commented 2 years ago

v1.8 is now available on PyPI -- marking as closed.