jborean93 / smbprotocol

Python SMBv2 and v3 Client
MIT License
318 stars 73 forks source link

RFE: Non-singleton ClientConfig? #148

Open grawity opened 2 years ago

grawity commented 2 years ago

When using smbclient/smbprotocol from a module, it would be good if configuration (such as setting the authentication mechanism) could be scoped to a "client" object instead of always making a global change. Such as:

client = smbclient.Client(auth_protocol="kerberos")
x = client.listdir(r"\\foo\bar\baz")

Currently all the modules are internal-use and all configuration is the same (Kerberos everywhere), but it nevertheless feels untidy to reconfigure the entire smbclient module every time.

(The existing module-level methods including ClientConfig() could still operate on a global singleton "client", perhaps similar to how python-requests has both a global requests.get() and a scoped requests.Session().get().)

jborean93 commented 2 years ago

I've talked about this recently with https://github.com/jborean93/smbprotocol/pull/141#issuecomment-939569733 which sums up my current thought on something like this. When implementing the ClientConfig it was designed as a way to configure the Python client with sane defaults which tries to replicate how Windows works as a client. The key things IMO were:

Making a non-singleton and exposing the functions on that config object can definitely be done but I've yet to be convinced that it brings a lot of benefits over what can be done today. From what I can see, these functions by themselves take in the same kwargs to control things like the username/password/etc and you could easily do something like:

import smbclient

creds = {"username": "foo", "password": "bar"}
smbclient.listdir(r"\\foo\bar\baz", **creds)

The current disadvantages of the current approach I can see

Now what I see the disadvantages of the proposed approach

In the end the only real compelling reason I can see is to make the connection cache management a bit easier and less error prone but I could also see that being done by making the connection cache it's own class and have it live outside of a "client". That's not to say my mind is set in stone, I'm happy to be convinced otherwise I just thought I would share why I'm currently opposed to this idea.

grawity commented 2 years ago

When implementing the ClientConfig it was designed as a way to configure the Python client with sane defaults which tries to replicate how Windows works as a client

It makes sense with "replicating how Windows works" as a goal, but I'd probably question whether the goal itself is worthwhile; the Windows SMB client is convenient for day-to-day use, but its global credentials cache is not particularly script-friendly even on Windows itself (e.g. needing temporary net use that needs to be manually torn down later, or even runas /netonly, whenever non-default credentials are wanted).

Compare to how python-gssapi now has gssapi.Credentials so that the process-wide KRB5CCNAME or KRB5_KTNAME wouldn't need to be touched (and even having those is already an improvement).

Note I'm not asking for each individual connection to be a separate object, but rather for something that encapsulates the entire state (all the things you mention – default credentials, connection cache, DFS, etc).

From what I can see, these functions by themselves take in the same kwargs to control things like the username/password/etc and you could easily do something like:

Hmm, true, although (besides not being pretty) I'm not sure how that interacts with connection caching? That is, if I open a file with some explicit credentials specified, would all further accesses to that \\host\share continue to implicitly use the same credentials?

(To be honest it wouldn't make any difference in my case, since the tools use environment credentials anyway, but it would be a bit unexpected in general. Indeed even the global ClientConfig isn't exactly causing problems for me in the current project, as there's still only one module that uses SMB – global configuration just rubs me the wrong way whenever I need to use it. For example, I feel the need to call ClientConfig() at the beginning of every function that invokes smbclient, "just in case" another module has configured it differently in between calls.)

And yes, it would be good to have auth_protocol= exposed through kwargs if this is the way to do things. (My goal is to make all the tools I'm writing run exclusively on Kerberos and never show exceptions that would point a future sysadmin towards configuring NTLM_USER_FILE or similar.)

  • It exposes yet another way to call these functions
    • Could lead to confusion over which method to use for new users to the library

To be honest I don't really buy the "confusion" argument – IMO, it can be equally confusing that smbclient doesn't behave like other Python "client" modules (e.g. the SMTP, FTP, MySQL, etc. clients);

I think python-requests has a good model, where module-level functions have no global state (not even connection caching/reuse), which usually makes a simple decision: for one-off calls you'd use a bare module function (with configuration via kwargs), but for repeated calls you'd use a Session method.

(Another similarity is that requests.Session still accepts full URLs, just like smbclient accepts UNC paths, but it encapsulates the state so that different callers can easily use different parameters.)

Could result in name conflicts if there are any overlaps with the os and shutil functions

Not entirely sure what you mean by this?

jborean93 commented 2 years ago

but its global credentials cache is not particularly script-friendly even on Windows itself (e.g. needing temporary net use that needs to be manually torn down later, or even runas /netonly, whenever non-default credentials are wanted).

I do agree in this sense, especially since you can only really store 1 credential for a host. This is part of why I set up the code to consider sessions (the part that handles authentication) unique per connection and username. This allows someone to rely on the normal caching mechanism with a default global but also explicitly say I want a brand new session with this username. What it does not allow you to do is open a brand new session for the same user without opening a new connection in a different cache though. I'm not sure whether this last bit is a valid scenario though, or at least I cannot think of why someone would want this.

Compare to how python-gssapi now has gssapi.Credentials so that the process-wide KRB5CCNAME or KRB5_KTNAME wouldn't need to be touched (and even having those is already an improvement).

You can do this today without any special handling. You can call kinit, or retrieve the TGT in any mechanism available to you and use that globally thoughout the process. This could be by setting those env vars when running smbprotocol and the underlying auth will factor those in when getting the session ticket for the target server. Windows is a bit more difficult because now you are down to the limitations of those scenarios and how Windows retrieves credentials for target servers but ultimately this IMO is out of scope for smbprotocol.

Hmm, true, although (besides not being pretty) I'm not sure how that interacts with connection caching? That is, if I open a file with some explicit credentials specified, would all further accesses to that \host\share continue to implicitly use the same credentials?

The way the caching works is that it caches 3 things

Consider the following code:

import smbclient

smbclient.listdir(r"\\server1\share1")
smbclient.listdir(r"\\server1\share2", username="user")
smbclient.listdir(r"\\server2\share")

At the end you will have the following

IMO this is pretty flexible as it gives you that global default but it also lets you define an explicit credential as required which goes above what Windows offers.

Not entirely sure what you mean by this?

Currently there are 3 main modules behind smbclient. Currently there's the _os ones which are exposed directly, shutil and path. If these functions were to be behind a client like object I would have to make sure there are no functions that are duplicated in name. In fact there is already a conflict, smbclient.copyfile](https://github.com/jborean93/smbprotocol/blob/b8e19b5574a34e487ddefd717bb522891bd663b6/smbclient/_os.py#L131-L194) and [smbclient.shutil.copyfile](https://github.com/jborean93/smbprotocol/blob/b8e19b5574a34e487ddefd717bb522891bd663b6/smbclient/shutil.py#L128-L210). Theoslike functions are designed to be pure SMB operations whereas theshutil` ones are designed to wrap them and provide a cross filesystem compatibility. If we decide to put things in a client class we would have to figure out how to expose these without name collisions.

I wouldn't consider this a blocker but something that needs to be considered.

global configuration just rubs me the wrong way whenever I need to use it. For example, I feel the need to call ClientConfig() at the beginning of every function that invokes smbclient, "just in case" another module has configured it differently in between calls.)

I will concede this point and I do think something probably needs to be done about this to make handling non-global caches/clients a bit easier than it is today. How that looks is the part I'm not 100% sure about. I can think of 3 ways so far how this could potentially be done:

Client class

This is what you've suggested where a user can initialise a client like object with the various config options and call methods on that client like so:

import smbclient

with smbclient.Client(...) as client:
    with client.open_file(...) as fd:
        ...

    client.listdir(...)
Advantages Disadvantages
No extra kwarg, the context is with the class Another way of using smbclient, I personally would try to deprecate the existing one if this eventually happens
Good opportunity to add type annotations to the functions/review existing API Need to deal with function name collisions

Client kwarg

This is similar to the above but the existing framework is kept and there's only an extra kwarg added.

import smbclient

with smbclient.Client(...) as client:
    with smbclient.open_file(..., client=client) as fd:
        ...

    smbclient.listdir(..., client=client)
Advantages Disadvantages
The current API doesn't really change, just an extra kwarg Another kwarg with some overlap of existing ones like username, password
No duplication of work to replicate each function with the client class
Can be extended with src_client, dst_client for functions that talk to 2 endpoints like shutil.copy

Client thread context

This is somewhat an extension of the global scope but exposes a way to create a thread local "config" that is used. There would be logic that checks whether the SMB call is being done under an smbclient.Client context and instead of using the global cache it will use the "activated" one.

import smbclient

with smbclient.Client(...) as client:
    with smbclient.open_file(...) as fd:
        ...

    smbclient.listdir(...)
Advantages Disadvantages
Less disruptive to callers, essentially the same API setup Can get confusing if you have multiple levels nested
No extra kwarg that needs to be placed on every call Trade global for a slightly less global setup
No real changes need to be made to every function, everything is implemented internally I don't know whether this can scale for asyncio if that ever gets offered in the future
It is at least thread safe

If I was to pick today I would be leaning more towards the kwarg option as it mostly fits in with the existing API but I'm not totally sold on it. If you have any other pros/cons/ideas I'm more than happy to add it to the current list.

dolfinus commented 2 years ago

Almost all other FS clients are imlemented as a client object, like boto3 (S3), paramiko (SFTP/SCP), ftplib (FTP, FTPS), hdfscli (HDFS), and even tar module from stansart library. IMHO, this is the only way to make a flexible and straightforward client which will not use any kind of global session

dimon222 commented 1 year ago

Any workaround available right now? I'm not interested to deal with global exposure of connection sessions on my multitenant environment

hb2638 commented 1 week ago

There is a way to achieve this with the current version of the code

  1. You create your own connection_cache
  2. Create a connection with a session and add the connect to the cache
  3. Pass the connection cache, port and username to all functions in the smbclient modules which causes it to use the session you created
import contextlib
import io
import logging
import ntpath
import typing
import uuid

import smbclient
import smbclient.shutil
import smbprotocol
import smbprotocol.connection
import smbprotocol.exceptions
import smbprotocol.file_info
import smbprotocol.open
import smbprotocol.session
import smbprotocol.tree
from smbprotocol.header import NtStatus

log = logging.getLogger(__name__)

class SmbClient(contextlib.AbstractContextManager):

    def __init__(
            self,
            *,
            client_guid: uuid.UUID=None,
            remote_server_name: str,
            port: int=445,
            user_name: str,
            password: str,
            share_name: str,
    ):
        if client_guid is None:
            client_guid = uuid.uuid4()
        self._conn = None
        self._session = None
        self._share_name = rf"\\{remote_server_name}\{share_name}"
        try:
            self._conn = smbprotocol.connection.Connection(client_guid, remote_server_name, port)
            self._conn.connect()

            self._session = smbprotocol.session.Session(self._conn, user_name, password, auth_protocol="ntlm")
            self._session.connect()

            self._conn_cache = {f"{remote_server_name.lower()}:{port}": self._conn}
        except Exception:
            self._disconnect(connection=self._conn, session=self._session)
            raise

    @staticmethod
    def _disconnect(
            *,
            connection: smbprotocol.connection.Connection|None=None,
            session: smbprotocol.session.Session|None=None,
    ) -> None:
        try:
            if session is not None:
                session.disconnect()
        finally:
            if connection is not None:
                connection.disconnect()

    def _to_full_qualified_path(self, path: str) -> str:
        if path.startswith("/") or path.startswith("\\"):
            path = path[1:]
        return rf"{self._share_name}\{path}"

    def is_dir(self, path: str) -> bool:
        fully_qualified_path = self._to_full_qualified_path(path)
        return smbclient.path.isdir(fully_qualified_path, connection_cache=self._conn_cache, port=self._conn.port, username=self._session.username)

    def mkdir(self, path: str, parents: bool=False, exist_ok: bool=False) -> None:
        log.debug(f"Making directory [{path}].")
        if exist_ok and self.path_exists(path) and self.is_dir(path):
            log.debug(f"Skipping... Directory [{path}] already exists.")
            return
        fully_qualified_path = self._to_full_qualified_path(path)
        try:
            smbclient.mkdir(fully_qualified_path, connection_cache=self._conn_cache, port=self._conn.port, username=self._session.username)
        except smbprotocol.exceptions.SMBOSError as e:
            match e.ntstatus:
                case NtStatus.STATUS_OBJECT_PATH_NOT_FOUND if parents:
                    parent = ntpath.normpath(path).split("\\")[:-1]
                    if not parent:
                        raise
                    parent = "\\".join(parent)
                    log.debug(f"Creating parent directory [{parent}].")
                    self.mkdir(parent, parents=True, exist_ok=True)
                    self.mkdir(path, parents=False, exist_ok=exist_ok)
                case NtStatus.STATUS_OBJECT_NAME_COLLISION if exist_ok and self.isdir(path):
                    log.debug(f"Directory [{path}] already exists.")
                case _:
                    raise

    def path_exists(self, path: str) -> bool:
        fully_qualified_path = self._to_full_qualified_path(path)
        return smbclient.path.exists(fully_qualified_path, connection_cache=self._conn_cache, port=self._conn.port, username=self._session.username)

    def open_file(self, path: str, mode: str="r") -> io.FileIO:
        log.debug(f"Opening [{path}].")
        fully_qualified_path = self._to_full_qualified_path(path)
        return smbclient.open_file(fully_qualified_path, mode, connection_cache=self._conn_cache, port=self._conn.port, username=self._session.username)

    def read_file_bytes(self, path: str) -> bytes:
        log.debug(f"Reading from [{path}].")
        with self.open_file(path, "rb") as f:
            return f.read()

    def write_file_bytes(self, path: str, data: bytes, parents: bool=False) -> None:
        log.debug(f"Writing to [{path}].")
        try:
            with self.open_file(path, "wb") as f:
                return f.write(data)
        except smbprotocol.exceptions.SMBOSError as e:
            match e.ntstatus:
                case NtStatus.STATUS_OBJECT_PATH_NOT_FOUND if parents:
                    parent = ntpath.normpath(path).split("\\")[:-1]
                    if not parent:
                        raise
                    parent = "\\".join(parent)
                    log.debug(f"Creating parent directory [{parent}].")
                    self.mkdir(parent, parents=True, exist_ok=True)
                    self.write_file_bytes(path, data)
                case _:
                    raise

    def rmtree(self, path: str, ignore_errors: bool=False, on_error: typing.Callable[[Exception], None]=None) -> None:
        log.debug(f"Removing directory [{path}].")
        fully_qualified_path = self._to_full_qualified_path(path)
        smbclient.shutil.rmtree(fully_qualified_path, ignore_errors=ignore_errors, onerror=on_error, connection_cache=self._conn_cache, port=self._conn.port, username=self._session.username)

    def unlink(self, path: str, missing_ok: bool=False) -> None:
        log.debug(f"Deleting [{path}].")
        fully_qualified_path = self._to_full_qualified_path(path)
        try:
            smbclient.unlink(fully_qualified_path, connection_cache=self._conn_cache, port=self._conn.port, username=self._session.username)
        except smbprotocol.exceptions.SMBOSError as e:
            match e.ntstatus:
                case NtStatus.STATUS_OBJECT_NAME_NOT_FOUND if missing_ok:
                    pass
                case _:
                    raise

    def list_directory_file_names(self, path: str) -> list[str]:
        fully_qualified_path = self._to_full_qualified_path(path)
        return smbclient.listdir(fully_qualified_path, connection_cache=self._conn_cache, port=self._conn.port, username=self._session.username)

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        self._disconnect(connection=self._conn, session=self._session)

Example of creating a file

import smb

conn = smb.SmbClient(
remote_server_name="localhost",
user_name="foo",
password="bar",
share_name="root",
)
conn.write_file_bytes(rf"foo\bar\foobar.bin", b"1234", parents=True)