Open heijligers opened 10 months ago
I would highly recommend you use the high-level API, specifically `smbclient.scandir`, to enumerate entries in a directory. There's not much that you really gain by using the low-level API here, as I've tried to make the high-level one as efficient as possible for the operations needed. Even things like opening a file or directory can be done with the high-level API, and the raw file open object can then be used for low-level operations that might not be exposed in the high-level API.
Ultimately I can't help you write your actual application. I can help if you have specific questions about `smbprotocol`, but that's about it. If you don't have a specific question or query, I'll close this issue tomorrow.
Thanks for your response. Does the high-level API support using a filter pattern? Getting the top-level folder listing of the share takes 30+ minutes, as it contains tens of thousands of folders.
Thank you
Yep, the `search_pattern` kwarg (https://github.com/jborean93/smbprotocol/blob/37512ee0648ad64f98755833382fea790d9b2df6/src/smbclient/_os.py#L526) supports the normal server-side filtering with `*` and `?` that the underlying SMB server supports.
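As a minimal sketch of the suggestion above, the helper below lists only the top-level folders that match a server-side pattern. The helper name `list_top_level` and the injectable `scandir` argument are illustrative assumptions, not part of the library; when no callable is passed, it falls back to `smbclient.scandir` with its `search_pattern` kwarg.

```python
from typing import Callable, Iterable, List, Optional


def list_top_level(
    share_path: str,
    pattern: str = "*",
    scandir: Optional[Callable[..., Iterable]] = None,
) -> List[str]:
    """Return the names of directories directly under share_path that match pattern.

    `scandir` is a hypothetical injection point for testing; by default the
    real smbclient.scandir is used, so filtering happens on the server.
    """
    if scandir is None:
        # Imported lazily so the helper can be exercised without a live share.
        import smbclient

        scandir = smbclient.scandir

    names = []
    for entry in scandir(share_path, search_pattern=pattern):
        if entry.is_dir():
            names.append(entry.name)
    return names
```

Against a real share this would be called after `smbclient.register_session(...)`, for example `list_top_level(r"\\nas.example.com\share", pattern="P100*")`, where the server and share names are placeholders.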
Awesome! thanks! I am quite proud that I actually managed to get my first version using the smbprotocol to work well enough for my purposes. In the future I'll rely on smbclient for sure!
One last question that you might easily be able to answer for me: is there a record of the username or owner who uploaded or created a file in the SMB protocol?
Thanks again!
The closest there is is the "Owner" of the file in the security descriptor. Unfortunately it's not reliable, as on Windows this could be the `Administrators` group or whatever is set in the user's group SIDs as the owner. Plus, getting that value will only give you the SID string in Python; you still need a separate process to translate that to an account name, which this library does not do.
Thanks! SID might actually be enough. I'm only interested in knowing which files were created by the same users, not necessarily the name of the user.
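If the raw SID string is enough, grouping files by owner needs no name translation at all. The sketch below assumes the (path, owner SID) pairs have already been collected by some other means; how the SID is read from the security descriptor is deliberately left out, and `group_by_owner` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_owner(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group file paths by their owner SID string.

    `records` is assumed to be an iterable of (path, owner_sid) pairs gathered
    elsewhere; the SID is treated as an opaque key and never resolved to a name.
    """
    groups = defaultdict(list)
    for path, sid in records:
        groups[sid].append(path)
    return dict(groups)
```

Keep in mind the caveat above: files owned by a group SID such as the Administrators group will land in one bucket regardless of who actually created them.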
I'm trying to use GPT-4 to implement a Python SMB crawler that has to connect over a VERY SLOW connection to a Synology NAS with MILLIONS of files. Luckily I only need a subset of the folders and of the file types. Can someone help me get a basic version up and running? Both using various GPT tools and trying to parse the low-level source code myself, I haven't managed to get the following software design and reference implementation to work:
Prototype 3:
- smbprotocol
- PyYAML
- loguru
- single-threaded
- tenacity (for retry logic)

Yaml.conf: `top_folder_filter: P100* file_copy_extention_filter:
Intended Pseudocode
- Initialize:
- Main Process:
- Recursive Folder Crawl (folder):
- Error Handling:
- Finalize:
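The crawl step outlined above could be sketched with the high-level API roughly as follows. This is only an illustration under stated assumptions: `crawl`, its parameters, and the injectable `scandir` argument are hypothetical names, the top-level pattern is applied server-side only at the root, and extension filtering is done client-side. It relies on directory entries exposing `name`, `path`, and `is_dir()`, as the entries returned by `smbclient.scandir` do.

```python
import os
from typing import Callable, Iterable, Iterator, Optional, Set


def crawl(
    root: str,
    extensions: Set[str],
    top_pattern: str = "*",
    scandir: Optional[Callable[..., Iterable]] = None,
) -> Iterator[str]:
    """Yield paths of files under root whose extension is in `extensions`.

    `extensions` holds lowercase suffixes including the dot, e.g. {".pdf"}.
    `top_pattern` is sent to the server only for the root listing, so
    non-matching top-level folders are never transferred or descended into.
    """
    if scandir is None:
        # Imported lazily so the traversal logic can be tested offline.
        import smbclient

        scandir = smbclient.scandir

    # Explicit stack instead of recursion, to stay safe on deep trees.
    stack = [(root, top_pattern)]
    while stack:
        folder, pattern = stack.pop()
        for entry in scandir(folder, search_pattern=pattern):
            if entry.is_dir():
                # Below the top level every entry is listed.
                stack.append((entry.path, "*"))
            elif os.path.splitext(entry.name)[1].lower() in extensions:
                yield entry.path
```

Because the function is a generator, matching files can be copied as they are discovered rather than after the whole tree has been listed, which matters on a slow link.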
import logging
import uuid

# Configure logging
logging.basicConfig(level=logging.INFO)

def main():

if __name__ == "__main__":
    main()
'
Attempt 2 (incomplete):
'
import sys
import uuid
from contextlib import contextmanager
from io import BytesIO

import yaml
from loguru import logger
from tenacity import retry, stop_after_attempt, wait_exponential

from smbprotocol.connection import Connection
from smbprotocol.open import (
    CreateDisposition,
    CreateOptions,
    DirectoryAccessMask,
    FileAttributes,
    FileInformationClass,
    FilePipePrinterAccessMask,
    ImpersonationLevel,
    Open,
    ShareAccess,
)
from smbprotocol.session import Session
from smbprotocol.tree import TreeConnect
def smb_b_open(tree, mode='r', share='r', username=None, password=None, encrypt=True):
    """
    Functions similar to the builtin open() method where it will create an open handle to a file over SMB.
    This can be used to read and/or write data to the file using the methods exposed by the Open() class in
    smbprotocol. Read and write operations only support bytes and not text strings.
class FileEntry(object):
# Define _listdir helper function for applying a filter pattern and recursion to listing the
# contents of a Samba share, specified by the tree variable
def _listdir(tree, path, pattern, recurse):
    full_path = tree.share_name
    if path != "":
        full_path += r"\%s" % path
def main1():
    # Load configuration

if __name__ == "__main__":
    main1()
'
Software Design Specification for a Remote Samba Share Crawler
Overview
The Remote Samba Share Crawler is designed to connect to a Samba share, crawl through its directories and files, and download specified files to a local directory. It supports various features like recursive crawling, threading, logging, and error handling.
Functional Requirements
Non-functional Requirements
Proposed Architecture
1. Classes and Modules
- `Crawler`: Main class handling connection, crawling, downloading, and state management.
- `FileEntry`: Class representing a file or directory in the Samba share.
- (`yaml` or `json`).
- (`logging` module or an alternative).
2. External Libraries
- (`smbprotocol`, `pysmb`, or an equivalent).
- (`PyYAML` or `json`).
- (`logging` module or an equivalent like `loguru`).
3. Configuration
4. Logging
5. Error Handling and Retry Logic
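One way to approach the retry requirement is a small decorator with exponential backoff. The prototype above lists tenacity for this; the sketch below deliberately uses only the standard library instead, so it is a stand-in and does not reproduce tenacity's API. The decorator name, its parameters, and the default choice of `OSError` as the retryable exception are all assumptions.

```python
import time
from functools import wraps


def retry(attempts=3, base_delay=1.0, exceptions=(OSError,), sleep=time.sleep):
    """Retry the wrapped call, doubling the delay after each failure.

    `sleep` is injectable (an assumption made for testability) so that
    backoff can be verified without actually waiting.
    """

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        # Out of attempts: surface the last error to the caller.
                        raise
                    # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
                    sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator
```

Wrapping only the individual network operations (a single directory listing or a single file copy) keeps a transient failure from restarting the whole crawl.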
6. Threading and Concurrency