spark1security / n0s1

Secret Scanner for Slack, Jira, Confluence, Asana, Wrike, Linear and Zendesk
https://spark1.us/n0s1
GNU General Public License v3.0

n0s1 confluence scan fails due to socket timeout error #12

Closed NeoWilson closed 4 months ago

NeoWilson commented 5 months ago

My scan keeps failing due to a timeout error. Below is a snippet of the error message:

... File "/usr/lib64/python3.9/ssl.py", line 1100, in read return self._sslobj.read(len, buffer) socket.timeout: The read operation timed out ... requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='xxxx.xxxx.xxxx.xxxx', port=443): Read timed out. (read timeout=75)

I assume the page is simply too big for the scan to complete before the connection times out. Is there a way to extend the timeout duration? I noticed it does not time out if I skip parts of the scan (e.g. using the --skip-comments arg).

blupants commented 5 months ago

Thanks for reporting the issue @NeoWilson

Could you please provide more details on how to replicate the issue? Which platform were you scanning (e.g. asana_scan, wrike_scan, linear_scan, jira_scan or confluence_scan)? Could you also share the exact input arguments you used (excluding any private info such as access tokens)?

n0s1 relies on several other open-source SDKs to pull the data to be scanned from each specific platform. I like the idea of an optional input argument to extend the timeout, but we might also need to request that enhancement downstream in n0s1's dependencies.
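
For context, the Confluence client in atlassian-python-api (one of the SDKs n0s1 relies on) already accepts a timeout when it is constructed, so most of the work would be exposing it. A minimal sketch of the idea with placeholder URL and token (not the actual n0s1 wiring):

# Sketch: atlassian-python-api accepts an HTTP timeout at client
# construction time; n0s1 would pass a user-supplied value through
# instead of relying on the SDK default.
from atlassian import Confluence

confluence = Confluence(
    url="https://your-instance.example.com",  # placeholder server URL
    token="<API_TOKEN>",                      # placeholder access token
    timeout=120,                              # seconds
)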

NeoWilson commented 5 months ago

I am scanning a self-hosted Confluence server used by a large number of parties, resulting in an enormous number of spaces, pages and contents. I noticed the timeout consistently occurs in one particular Confluence space; it is avoided if I use the "--skip-comments" arg, though it then still occurs consistently in another, larger space. This led me to assume the timeouts are caused by the size of the spaces.

My command is as follows: n0s1 confluence_scan --server "" --api-key "" --skip-comments --report-file report.json --report-format gitlab

So I was wondering: is there a specific argument available in the n0s1 tool, or something I can change in n0s1's dependencies, to remove or extend the timeout duration?

blupants commented 5 months ago

Thank you for providing the details @NeoWilson .

I have created a branch bp-timeout that adds "--timeout" and "--limit" input parameters:

  --timeout [TIMEOUT]   HTTP request timeout in seconds
  --limit   [LIMIT]     The limit of the number of pages to return per HTTP request
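
For reference, these are plain CLI options; a rough sketch of how they could be declared (the actual code in the bp-timeout branch may differ), with defaults matching the SDK values mentioned further down:

# Sketch of the new option wiring; names match the help text above,
# defaults match the SDK defaults (75s timeout, 50 items per request).
import argparse

parser = argparse.ArgumentParser(prog="n0s1")
parser.add_argument("--timeout", type=int, default=75,
                    help="HTTP request timeout in seconds")
parser.add_argument("--limit", type=int, default=50,
                    help="The limit of the number of pages to return per HTTP request")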

Unfortunately, I don't have a huge Confluence instance to validate it. Could you please run it from source and let me know if it fixes your issue? If it does, I will release a new version 1.0.18 that includes the new input parameters. To run the code from my new branch with the new parameters, please try something like this:

git clone git@github.com:spark1security/n0s1.git --branch bp-timeout
cd n0s1/src/n0s1
python3 -m venv n0s1_python
source n0s1_python/bin/activate
python3 -m pip install -r ../../requirements.txt
python3 n0s1.py confluence_scan --server "" --api-key "<API_TOKEN>" --report-file report.json --report-format gitlab --timeout 120 --limit 10
deactivate

You may want to tweak both "--timeout" and "--limit", because it may be a good idea to return a lower number of items per HTTP response when dealing with massive Confluence content like yours.

The default timeout from the Confluence SDK is 75s. The default limit is 50.
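
To illustrate how the two knobs interact: a smaller "--limit" shrinks each response, so each individual request is more likely to finish within "--timeout". A self-contained sketch of the paginated fetch (the server URL, token and scan_for_secrets are placeholders, not n0s1's actual code):

from atlassian import Confluence

confluence = Confluence(url="https://your-instance.example.com",
                        token="<API_TOKEN>", timeout=120)

def scan_for_secrets(page):
    ...  # hypothetical stand-in for n0s1's secret detection step

# Page through a space; `limit` bounds the payload of each request.
start, limit = 0, 10
while True:
    pages = confluence.get_all_pages_from_space("SPACEKEY", start=start, limit=limit)
    if not pages:
        break
    for page in pages:
        scan_for_secrets(page)
    start += limit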

NeoWilson commented 5 months ago

Thanks for the new input parameters. I am currently testing them against my huge Confluence instance. I observed that even after setting the timeout to 30 minutes and the limit to 5, the scan still times out consistently on certain huge Confluence pages (though it did progress further as I continued to tweak the two parameters, increasing the timeout duration and lowering the limit per HTTP response).

Currently, I have 2 more queries:

  1. Is there a maximum limit to the timeout duration?
  2. Will there be an option to skip Confluence pages that time out? (I noticed that on a timeout my scan stops abruptly and cannot proceed to the subsequent, possibly smaller, Confluence pages listed after the exceptionally large page that caused the error. In addition, no scan report is produced for the pages scanned before the timeout. As a result, I constantly have to restart the scan from the beginning, which is very time-consuming.)

blupants commented 5 months ago

Thanks for the feedback. I have added better error handling to the "bp-timeout" branch.

I don't think extremely high timeouts would be a feasible approach even if the server/SDK supported them. I think the best approach is to handle the timeout exception locally and move on to the next page (in other words, your suggestion of skipping timed-out pages should be the default behavior). That is exactly what I have just added to the "bp-timeout" branch.
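
In essence it is the usual skip-and-continue pagination pattern; a minimal sketch of the idea (not the literal branch code; URL and token are placeholders):

import logging
import requests
from atlassian import Confluence

confluence = Confluence(url="https://your-instance.example.com",
                        token="<API_TOKEN>", timeout=60)

start, limit = 0, 5
while True:
    try:
        pages = confluence.get_all_pages_from_space("SPACEKEY", start=start, limit=limit)
    except requests.exceptions.ReadTimeout as e:
        # Warn and advance the pagination window, so one oversized
        # page range cannot abort the whole scan.
        logging.warning("Timeout at start=%s limit=%s: %s", start, limit, e)
        start += limit
        continue
    if not pages:
        break
    # ... scan the returned pages for secrets here ...
    start += limit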

Could you please retest the new implementation, starting with lower timeouts and limits? Whenever a timeout occurs, you should get an alert message and the pagination should move on. That way you should always get a report at the end of an n0s1 scan run.

By monitoring the resulting scan report and the alert messages from the locally handled exceptions, you should be able to increase your timeout and limit and find a balance between scan speed and accuracy.

Could you please test the latest implementation by doing a "git pull -r" and re-running the scans? It should look something like this:

cd n0s1
git checkout bp-timeout
git pull -r
cd src/n0s1
source n0s1_python/bin/activate
python3 n0s1.py confluence_scan --server "" --api-key "<API_TOKEN>" --report-file report.json --report-format gitlab --timeout 60 --limit 5
deactivate

NeoWilson commented 5 months ago

Thank you for your quick update. I will continue to test on my end.

Additionally, I would like to ask whether the pages skipped due to timeouts will be logged in the scan report, so that we can keep a record of which pages weren't scanned because of timeout errors.

blupants commented 5 months ago

That's a great suggestion, but unfortunately we can't fully implement it at n0s1's level.

We can't tell for sure on which exact page the timeout occurred. In the best case we would know the space and the page range where the timeout occurred (for instance, assuming --limit=30, we would know that a timeout occurred in space "MARKETING" between pages 60 and 90).

That is because n0s1 uses the function get_all_pages_from_space() from atlassian-python-api. If you look at the source code where n0s1 calls it, we would only be able to log the space (a.k.a. key), the start (a.k.a. pages_start) and the limit (a.k.a. pages_limit).
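
So, on an exception, the most that can be recorded is the call-site parameters; roughly this (example values taken from the MARKETING scenario above; URL and token are placeholders):

import logging
import requests
from atlassian import Confluence

confluence = Confluence(url="https://your-instance.example.com", token="<API_TOKEN>")
key, pages_start, pages_limit = "MARKETING", 60, 30  # example values

try:
    pages = confluence.get_all_pages_from_space(key, start=pages_start, limit=pages_limit)
except requests.exceptions.ReadTimeout:
    # The individual page is unknown; only the space and range are.
    logging.warning("Timeout in space %s between pages %s and %s",
                    key, pages_start, pages_start + pages_limit)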

Please take a look at this other discussion and let me know if the new parameters for filtering out spaces would be helpful for your use case as well.

NeoWilson commented 5 months ago

I took a look at the other discussion and, unfortunately, my Confluence server has too many projects, spaces and pages for filtering out specific projects, spaces or pages to be practical.

But from what I understand of the issue, if I set "--limit 1", does that mean the timeout could be traced to a specific space and page, and could that be logged in the scan report?

NeoWilson commented 5 months ago

Update: With the additional parameters (--timeout and --limit) as well as the auto-skip on timeout, I managed to complete my scan successfully in a few hours and acquire the scan report. I will continue to adjust the timeout and limit to avoid skipping too many pages and to acquire a more accurate and exhaustive scan report.

A suggestion to improve the tool: log in the scan report which Confluence space and page was skipped due to a timeout, as it would greatly help us narrow down which pages are still unscanned (assuming --limit 1).

blupants commented 5 months ago

Thanks for the suggestion @NeoWilson . I have added a log file (n0s1_debug.log) that is created when the "--debug" input argument is used.

Could you please run a test similar to this:

cd n0s1
git checkout bp-timeout
git pull -r
cd src/n0s1
source n0s1_python/bin/activate
python3 n0s1.py confluence_scan --server "" --api-key "<API_TOKEN>" --report-file report.json --report-format gitlab --timeout 5 --limit 1 --debug
deactivate

After the scan finishes, you can check the n0s1_debug.log file; it will contain messages like these for all pages skipped due to a timeout:

HTTPSConnectionPool(host='spark1us.atlassian.net', port=443): Read timed out. (read timeout=5)
HTTPSConnectionPool(host='spark1us.atlassian.net', port=443): Max retries exceeded with url: /wiki/rest/api/content?spaceKey=IS&start=2&limit=4&type=page (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x10dd67050>: Failed to resolve 'spark1us.atlassian.net' ([Errno 8] nodename nor servname provided, or not known)"))

In my case, I check the log messages and use the "url:" field to visit the wiki page(s) that were skipped. For instance: https://spark1us.atlassian.net/wiki/rest/api/content?spaceKey=IS&start=2&limit=4&type=page.
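
If the log grows large, the skipped URLs can also be pulled out mechanically; a small sketch that assumes the message format shown above:

import re

# Extract the "url:" fields from n0s1_debug.log so the skipped
# page ranges can be revisited later.
with open("n0s1_debug.log") as f:
    for line in f:
        match = re.search(r"url: (\S+)", line)
        if match:
            print(match.group(1))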

NeoWilson commented 5 months ago

Thanks for the update. I am alright with this arrangement. I would just like to confirm that timeout errors are indeed logged in the debug log, as I am not seeing them in my debug log despite expecting timeout errors for some pages, even though I was able to complete the scan successfully.

If the logging is confirmed to work properly, I have no further questions, and you may close this issue after adding the features we discussed here to a new release of the n0s1 tool.

blupants commented 5 months ago

Please be aware that only the Confluence scan currently supports the exception/timeout logging. Does the file n0s1_debug.log get generated at all? If so, can you search it for "HTTPSConnectionPool"? Do you get any results?

Could you please double-check that you ran git checkout bp-timeout and git pull -r before running n0s1? If you properly pulled my latest changes, you should have a new file "n0s1/src/n0s1/controllers/hollow_controller.py" in the folder where you cloned the source code.

The latest change I added to the bp-timeout branch should save all exceptions (such as "requests.exceptions.ReadTimeout: HTTPSConnectionPool") to n0s1_debug.log.
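
The underlying pattern is essentially the following (a sketch only; the actual hollow_controller.py code may differ):

import logging

logging.basicConfig(filename="n0s1_debug.log", level=logging.DEBUG,
                    format="%(asctime)s %(levelname)s %(message)s")

def log_exception(e):
    # Persist the exception type and message, e.g. a
    # requests.exceptions.ReadTimeout raised by a Confluence call.
    logging.error("%s: %s", type(e).__name__, e)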

NeoWilson commented 5 months ago

I confirmed that the n0s1_debug.log file is generated, and that hollow_controller.py is present in my folder. However, I am not seeing any exception errors (including timeouts) in the debug log file.

So I am unable to confirm whether the exceptions are being logged in the debug log file, or whether my scan simply no longer encounters timeout errors with my Confluence data (since I noticed the scan succeeds when I leave --timeout at the default and set --limit to 1).

blupants commented 5 months ago

Thank you for your feedback. During my local testing, one way I found to "force a timeout" and get an exception was to start the scan with a timeout of, say, 5 seconds. Then, while it was running, I would disable the WiFi connection on my machine for about 10 seconds and reconnect the network. By doing that I could see exceptions being logged to the n0s1_debug.log file after the scan finished.

Also, keep in mind that the log file keeps appending data across runs indefinitely. You may want to rename or delete it before each test.

You may want to try "forcing a timeout" to confirm whether or not the new source code is indeed logging exceptions.
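
If toggling the network is inconvenient, a deterministic alternative is to confirm the exception type with a deliberately tiny timeout; a sketch (the URL is a placeholder, any reachable HTTPS host works):

import requests

# Force a timeout on purpose: a sub-millisecond budget guarantees
# requests raises a Timeout, the same family the scanner must log.
try:
    requests.get("https://your-instance.example.com/wiki", timeout=0.001)
except requests.exceptions.Timeout as e:
    print("Got the expected timeout:", e)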

I will include everything we have done so far in n0s1's next release. If in the meantime you find any other issues or suggestions, please keep me posted.