Customizable receive timeout for SMB Connection

leonnis12 commented 3 months ago

This PR extends the configurability of the SMB Connection class and register_session methods to allow modifying the transport.recv timeout used for both keep-alive checks that keep Windows from closing a connection at ~16 minutes, and for checking the health of the connection.

In the usecase of my application, the SMB connection is used over an unstable network, which sometimes loses the connections between the machines. This triggers Connection timed out errors from the health check implemented in _process_message_thread, but only after 10+ minutes (the hardcoded timeout) where the application hangs. I would like to be able to configure this timeout to be smaller for use cases like this, to perform the health check more often, and be able to handle the timeout error in a reasonable ammount of time.

Current behaviour:

The connection is checked every 10 minutes using a SMB2_ECHO command.
If the connection is broken due to a network problem, the connection will be stuck until the timeout passes, the ECHO command is sent and the server does not respond to it.

Expected Behavior

The timeout should be configurable for the cases where network errors are common, and handling them often requires waiting 10+ minutes for each such problem.

This might also be a fix for https://github.com/jborean93/smbprotocol/issues/117 (mentions a similar problem), by allowing configuring a smaller timeout value, which does not keep the application stuck for long and allows the error to be handled in my application in a timely manner.

The PR extends the health checks added in https://github.com/jborean93/smbprotocol/pull/135 with the configurable health check interval value.

DragosFlorea commented 3 months ago

This would be very helpful for my use case as well

jborean93 commented 3 months ago

Thanks for the PR.

I am somewhat weary about exposing the retry mechanism timeout as a public API as I feel like the implementation is a bit of a bandaid over the real problem rather than a good solution. I know I've been dragging my heels on trying to rework the TCP transport side where my aim was to improve this situation but I'm on the fence about accepting this type of change.

The proper solution in my mind is to figure out how to detect when the socket has been "dropped" but potentially in your situation the client still thinks its connected and thus will wait for the full timeout.

DragosFlorea commented 3 months ago

I understand your concern... And yes the best solution is to know exactly when the connection has been dropped but a logic on the client side to cut off the connection after timeout is not bad either. I mean you can have this as backup if you cannot detect the dropped connection If you do not want to expose the timeout as a public api, which is a low level configuration that can mess up things, what do you say about a environment variable marked as experimental until you have a better solution?

jborean93 commented 3 months ago

I think I can justify an environment variable here, especially if it's helping in some situations. I would have to stress it wouldn't be covered under the public API and could change/be removed in the future.

The socket stuff can be complicated, if you know the reason why the socket was closed but never reported back to the smbprotocol then I'll love to see a reproducer for it. That'll help me test out new scenarios in the future and hopefully avoid the need for the timeout altogether.

leonnis12 commented 3 months ago

I agree that the proper solution would be to detect when the socket has been "dropped", but also that the environment variable "fix" should be enough to help in the situations where this happens, until a proper solution can be found at least.

I've updated the pull request to not expose the timeout through the public API, and instead be available to be overridden through an environment variable marked as experimental from the Connection class.

Regarding reproducing the issue, simulating a network that drops packets from time to time, or has random response time spikes should be enough to trigger this issue, for further testing in the future. The following is an example from the network where I encountered the issue constantly. pings

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.04%. Comparing base (2ce49ef) to head (8e16e2b). Report is 1 commits behind head on master.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #285 +/- ## ======================================= Coverage 99.04% 99.04% ======================================= Files 24 24 Lines 5115 5116 +1 ======================================= + Hits 5066 5067 +1 Misses 49 49 ``` | [Flag](https://app.codecov.io/gh/jborean93/smbprotocol/pull/285/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jordan+Borean) | Coverage Δ | | |---|---|---| | [macOS](https://app.codecov.io/gh/jborean93/smbprotocol/pull/285/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jordan+Borean) | `68.15% <100.00%> (+<0.01%)` | :arrow_up: | | [py3.10](https://app.codecov.io/gh/jborean93/smbprotocol/pull/285/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jordan+Borean) | `99.00% <100.00%> (+<0.01%)` | :arrow_up: | | [py3.11](https://app.codecov.io/gh/jborean93/smbprotocol/pull/285/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jordan+Borean) | `99.00% <100.00%> (+<0.01%)` | :arrow_up: | | [py3.12](https://app.codecov.io/gh/jborean93/smbprotocol/pull/285/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jordan+Borean) | `99.00% <100.00%> (+<0.01%)` | :arrow_up: | | [py3.8](https://app.codecov.io/gh/jborean93/smbprotocol/pull/285/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jordan+Borean) | `99.00% <100.00%> (+<0.01%)` | :arrow_up: | | [py3.9](https://app.codecov.io/gh/jborean93/smbprotocol/pull/285/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jordan+Borean) | `99.04% <100.00%> (+<0.01%)` | :arrow_up: | | [ubuntu](https://app.codecov.io/gh/jborean93/smbprotocol/pull/285/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jordan+Borean) | `96.81% <100.00%> (+<0.01%)` | :arrow_up: | | [windows](https://app.codecov.io/gh/jborean93/smbprotocol/pull/285/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jordan+Borean) | `98.96% <100.00%> (+<0.01%)` | :arrow_up: | | [x64](https://app.codecov.io/gh/jborean93/smbprotocol/pull/285/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jordan+Borean) | `99.04% <100.00%> (+<0.01%)` | :arrow_up: | | [x86](https://app.codecov.io/gh/jborean93/smbprotocol/pull/285/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jordan+Borean) | `98.96% <100.00%> (+<0.01%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jordan+Borean#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

jborean93 commented 3 months ago

I'll need to sort out the CI problems on macOS separately, I believe the image version now uses the arm64 builds so the x86_64 ones are failing without further work needed. That's not your problem to deal with so thanks for the changes and agreeing to the env var solution for now!

leonnis12 commented 3 months ago

Would it be possible to make a minor release which includes this fix, until the changes for 1.14 are ready? It would be really helpful.

jborean93 commented 3 months ago

v1.14.0 has been published on PyPI with this change https://pypi.org/project/smbprotocol/1.14.0/.

jborean93 / smbprotocol