jborean93 / smbprotocol

Python SMBv2 and v3 Client
MIT License

filedescriptor out of range in select() #219

Open DragosFlorea opened 1 year ago

DragosFlorea commented 1 year ago

Hi, recently I've been getting the following error quite often:

ERROR:grpc._server:Exception calling application: filedescriptor out of range in select()
2023-06-09T11:25:23.748090826Z Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/grpc/_server.py", line 494, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/src/grpc_servicers/smb/smb_operations_servicer.py", line 20, in Rename
    return self.operationsService.rename(request)
  File "/src/services/operations_service.py", line 137, in rename
    session = register_session(request.share.Server,
  File "/usr/local/lib/python3.8/site-packages/smbclient/_pool.py", line 408, in register_session
    connection.connect(timeout=connection_timeout)
  File "/usr/local/lib/python3.8/site-packages/smbprotocol/connection.py", line 880, in connect
    smb_response = self._send_smb2_negotiate(dialect, timeout, enc_algos, sign_algos)
  File "/usr/local/lib/python3.8/site-packages/smbprotocol/connection.py", line 1604, in _send_smb2_negotiate
    request = self.send(neg_req)
  File "/usr/local/lib/python3.8/site-packages/smbprotocol/connection.py", line 965, in send
    return self._send(
  File "/usr/local/lib/python3.8/site-packages/smbprotocol/connection.py", line 1291, in _send
    self._check_worker_running()
  File "/usr/local/lib/python3.8/site-packages/smbprotocol/connection.py", line 1174, in _check_worker_running
    raise self._t_exc
  File "/usr/local/lib/python3.8/site-packages/smbprotocol/connection.py", line 1304, in _process_message_thread
    b_msg = self.transport.recv(600)
  File "/usr/local/lib/python3.8/site-packages/smbprotocol/transport.py", line 118, in recv
    b_packet_size, timeout = self._recv(4, timeout)
  File "/usr/local/lib/python3.8/site-packages/smbprotocol/transport.py", line 140, in _recv
    read = select.select([self._sock], [], [], max(timeout, 1))[0]

I'm not sure what triggers this error, and I don't currently have a way to reproduce it. Do you have any idea what could cause it?

Thanks

adiroiban commented 1 year ago

Hi Dragoș

Can you share the code that you use and that is triggering the error?

It's important to have a complete, minimal, and functional example that someone else can use to try to reproduce the error.

What SMB server do you use?

Cheers

DragosFlorea commented 1 year ago

Hi, as an overview: I am making gRPC calls to a Python service that performs operations on a share (discovering files, rename/delete, upload/download).

(Maybe this helps.) I'm not sure if you are familiar with gRPC, but it has some request types where a connection is opened between two services and the data is streamed. I use these for upload and download, to stream the file content as a byte array.

# Imports used below; objects_operations_pb2 is the project's generated protobuf module (import not shown)
import ntpath

from smbclient import register_session, remove, reset_connection_cache
from smbprotocol.exceptions import SMBAuthenticationError, SMBOSError, SMBResponseException

    def rename(self, request) -> objects_operations_pb2.RenameResponse:
        # Each call uses its own connection cache, so its connections can be closed independently
        connection_cache = {}
        errorMessage = ''
        session = None
        try:
            session = register_session(request.share.Server,
                                       username=request.share.User,
                                       password=request.share.Password,
                                       connection_cache=connection_cache)
            self.logger.info("Session Id (rename) was opened: %s", session.session_id)
            oldFilePath = ntpath.join(request.share.SharePath, request.objectPath)
            newFilePath = self.__get_new_file_path(oldFilePath, request.newObjectName)
            try:
                file = self.__apply_rename(oldFilePath, newFilePath, connection_cache)
            except SMBOSError as ex:
                # If a file already exists at the target, rename raises; delete it and retry the rename
                if ex.strerror == 'File exists':
                    self.logger.warning('File %s already exists. The existing file will be overwritten.', newFilePath)
                    remove(newFilePath, connection_cache=connection_cache)
                    file = self.__apply_rename(oldFilePath, newFilePath, connection_cache)
                else:
                    raise
            return objects_operations_pb2.RenameResponse(isSuccessful=True, grpcObject=file)
        except SMBResponseException as ex:
            errorMessage = 'Error(rename): ' + ex.message
            self.logger.error(errorMessage, exc_info=True)
        except SMBOSError as ex:
            # Formatting (instead of +) avoids a TypeError if filename is None
            errorMessage = f'Error(rename): {ex.strerror} ({ex.filename})'
            self.logger.error(errorMessage, exc_info=True)
        except SMBAuthenticationError as ex:
            errorMessage = 'Error(rename): ' + str(ex.args[0])
            self.logger.error(errorMessage, exc_info=True)
        except BrokenPipeError:
            errorMessage = 'Error(rename): BrokenPipe error, smb connection failed'
            self.logger.error(errorMessage, exc_info=True)
        finally:
            # Runs on success and on error, so the cached connections are always closed
            self.closeConnection(session, connection_cache, 'rename')
        return objects_operations_pb2.RenameResponse(isSuccessful=False, errorMessage=errorMessage)

    def closeConnection(self, session, connection_cache, methodName) -> None:
        if session:
            self.logger.info(
                "Session Id ({0}) with connection status {1} is closing.({2})".format(session.session_id, session._connected, methodName))
        # Disconnects every connection/session stored in this cache and clears it
        reset_connection_cache(connection_cache=connection_cache)
        if session:
            self.logger.info(
                "Session Id ({0}) with connection status {1} was closed.({2})".format(session.session_id, session._connected, methodName))

In general, all the methods follow the same structure: a register_session call at the beginning, some logic in the middle, and a reset_connection_cache call at the end.

Unfortunately, I don't know the SMB server version. Also, I've been using this library for quite some time and hadn't seen this issue before, so I'm not sure whether something changed. I recently upgraded smbprotocol from roughly 1.5 to >=1.9.1.

Thanks

Update: the SMB server version is v2.

jborean93 commented 1 year ago

Are you potentially running a lot of operations, or doing things in parallel with multiple threads? From what I've read online, filedescriptor out of range in select() happens when the process has so many files and sockets open that the socket's fd number goes beyond 1024, which select() cannot handle: https://stackoverflow.com/questions/14250751/how-to-increase-filedescriptors-range-in-python-select.
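To make the 1024 limit concrete, here is a minimal Unix-only sketch (`select_vs_poll` is a hypothetical helper, not smbprotocol code). It assumes CPython on Linux, where `FD_SETSIZE` is 1024 and `select.select()` refuses any descriptor numbered at or above it. The sketch copies one end of a `socket.socketpair()` onto descriptor 1500 with `os.dup2`, temporarily raising the `RLIMIT_NOFILE` soft limit if that is needed and allowed, then tries both `select.select()` and `select.poll()` on it.

```python
import os
import resource
import select
import socket


def select_vs_poll(high_fd: int = 1500):
    """Return (select() error message, poll() events) for a socket numbered high_fd.

    Hypothetical demo helper. Returns None if the hard RLIMIT_NOFILE limit
    is too low to open a descriptor with that number.
    """
    needed = high_fd + 1
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    inf = resource.RLIM_INFINITY
    raised = False
    if soft != inf and soft < needed:
        if hard != inf and hard < needed:
            return None  # cannot create a descriptor that high on this host
        resource.setrlimit(resource.RLIMIT_NOFILE, (needed, hard))
        raised = True

    a, b = socket.socketpair()
    fd = None
    try:
        # Same socket as `a`, but now numbered above FD_SETSIZE (1024 on Linux)
        fd = os.dup2(a.fileno(), high_fd)
        b.sendall(b"x")  # make the high descriptor readable

        try:
            select.select([fd], [], [], 0)
            select_error = None
        except ValueError as exc:  # raised for fd >= FD_SETSIZE
            select_error = str(exc)

        # poll() takes a list of descriptors, so the fd number does not matter
        poller = select.poll()
        poller.register(fd, select.POLLIN)
        events = poller.poll(0)  # timeout is in milliseconds
        return select_error, events
    finally:
        if fd is not None:
            os.close(fd)
        a.close()
        b.close()
        if raised:
            resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

On Linux, the first element of the result is the same `filedescriptor out of range in select()` message seen in the traceback, while `poll()` reports the descriptor as readable.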

DragosFlorea commented 1 year ago

There may be multiple threads, 10 or maybe a few more. It is very strange, because we didn't encounter this until now, and from a code perspective the setup has been basically the same for years. So either the environment is the problem or something changed somewhere. Could limiting the number of threads be a workaround? Or could raising the ulimit (https://www.cyberciti.biz/faq/how-to-find-ulimit-for-user-on-linux/) help?
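For the ulimit question, it helps to separate two limits: the `RLIMIT_NOFILE` soft limit caps how many descriptors a process may open, while select() fails on any descriptor whose number is 1024 or higher. Because descriptor numbers stay below the soft limit, this error can only appear when that limit is already above 1024, so raising it further would not fix select(). A minimal sketch for inspecting a running process, assuming Linux's `/proc/self/fd` (falling back to `/dev/fd`); `fd_report` is a hypothetical helper:

```python
import os
import resource

FD_SETSIZE = 1024  # select()'s descriptor-number ceiling on Linux


def fd_report() -> dict:
    """Summarise this process's open file descriptors (hypothetical helper)."""
    path = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    # The listing also includes the descriptor os.listdir opened to read the directory
    fds = sorted(int(name) for name in os.listdir(path))
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    low_in_use = sum(1 for fd in fds if fd < FD_SETSIZE)
    return {
        "open": len(fds),
        "highest": fds[-1],
        # New descriptors get the lowest free number; once this reaches 0,
        # every new socket is numbered >= 1024 and select() rejects it
        "free_below_fd_setsize": FD_SETSIZE - low_in_use,
        "soft_limit": soft,
        "hard_limit": hard,
    }
```

Logging this report just before register_session is called would show whether the failures line up with `free_below_fd_setsize` dropping to 0.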

jborean93 commented 1 year ago

To test whether it’s a change on smbprotocol’s end, you can just revert to an older version that you know worked. AFAIK I haven’t touched the raw socket code for a while now, but I can’t guarantee that without having a closer look. You can certainly try increasing the ulimit, but it sounds like more of a fundamental problem with select, and the “fix” is to use poll on Linux, which is a pretty big change.

adiroiban commented 1 year ago

This looks like a ulimit case :)

As Jordan suggested, try the old code and see if you still get these errors.

Were there any OS updates on your side recently?

Check out http://www.sscce.org/ and see if you can create the simplest code possible to reproduce the error.

This is something that I use to troubleshoot transient errors. It should help you isolate the error.

And the simple code can also help to find the root issue and test the fix :)

good luck

DragosFlorea commented 9 months ago

Hi, I've managed to mitigate the issue with a session manager that keeps only a limited number of connections open. However, because I use gRPC, I still encounter the error occasionally: gRPC creates its own file descriptors, often more than 1024 of them (the select() limitation), so when I try to open a new SMB connection the error is triggered again. The gRPC library handles file descriptors in some other way, and I have not encountered the issue there.

Is there a chance that smbprotocol could stop using select()?
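The session manager mentioned above is not shown in the thread. A minimal sketch of the idea, assuming it is a simple cap on concurrent connections built on `threading.BoundedSemaphore` (the `ConnectionLimiter` class and `demo` function are hypothetical), could look like this. It bounds only how many descriptors this code opens; it cannot stop other libraries such as gRPC from pushing new socket numbers past 1024, which is why select() can still fail.

```python
import threading
import time
from contextlib import contextmanager


class ConnectionLimiter:
    """Cap how many SMB connections are open at once (hypothetical helper).

    Each worker would wrap its register_session(...) ... reset_connection_cache(...)
    block in `with limiter.slot():`, so at most `max_open` run concurrently.
    """

    def __init__(self, max_open: int) -> None:
        self._slots = threading.BoundedSemaphore(max_open)
        self._lock = threading.Lock()
        self.open_now = 0
        self.peak = 0  # highest concurrency observed, for diagnostics

    @contextmanager
    def slot(self):
        with self._slots:  # blocks until a slot is free
            with self._lock:
                self.open_now += 1
                self.peak = max(self.peak, self.open_now)
            try:
                yield
            finally:
                with self._lock:
                    self.open_now -= 1


def demo(workers: int = 20, max_open: int = 3) -> int:
    """Run `workers` threads through a limiter and return the peak concurrency."""
    limiter = ConnectionLimiter(max_open)

    def job() -> None:
        with limiter.slot():
            time.sleep(0.01)  # stands in for register_session + SMB work + reset

    threads = [threading.Thread(target=job) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return limiter.peak
```

With the defaults, `demo()` never observes more than 3 jobs inside a slot at the same time.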

jborean93 commented 9 months ago

The "fix" here is to avoid select on non-Windows and use poll/epoll/kqueue depending on the platform used. Unfortunately this isn't the simplest thing to do right now but I do plan on getting to it at some point with the socket refactor work I've got going in the background.
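A rough sketch of that direction, assuming nothing about how the planned socket refactor will actually look: `recv_exact` is a hypothetical helper that waits for readability with the standard library's `selectors.DefaultSelector`, which uses epoll, kqueue, or poll where the platform provides them (and falls back to select() elsewhere, including Windows), instead of calling `select.select()` directly.

```python
import selectors
import socket
import time


def recv_exact(sock: socket.socket, size: int, timeout: float) -> bytes:
    """Read exactly `size` bytes, waiting at most `timeout` seconds in total.

    Hypothetical helper. DefaultSelector picks epoll/kqueue/poll when available,
    so on Linux and macOS it is not limited by select()'s FD_SETSIZE.
    """
    deadline = time.monotonic() + timeout
    buf = bytearray()
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_READ)
        while len(buf) < size:
            remaining = deadline - time.monotonic()
            # sel.select() returns an empty list when the timeout expires
            if remaining <= 0 or not sel.select(remaining):
                raise TimeoutError(f"timed out after {len(buf)} of {size} bytes")
            chunk = sock.recv(size - len(buf))
            if not chunk:  # peer closed the connection
                raise ConnectionError("socket closed before all bytes arrived")
            buf += chunk
    return bytes(buf)
```

For example, reading a 4-byte length prefix, as the `self._recv(4, timeout)` call in the traceback does, would be `recv_exact(sock, 4, timeout)`.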