noirello / bonsai

Simple Python 3 module for LDAP, using libldap2 and winldap C libraries.
MIT License

paged_search is limited to 1000 results #31

Closed jetersen closed 4 years ago

jetersen commented 5 years ago

Even with auto next page enabled, the results seem to be capped at 1000; acquire_next_page even returns None.

Python on Windows 10, running against AD. The search is expected to return 1020 results 😓

noirello commented 5 years ago

Thank you for reporting. It's never been tested with that many records. I'll look into it.

jetersen commented 5 years ago

I assume this is related to how I ended up doing it in C#: https://stackoverflow.com/questions/55208799/page-ldap-query-against-ad-in-net-core-using-novell-ldap

noirello commented 5 years ago

I've looked into it, and it looks to me more like a server-side configuration issue than a client-side bug.

There's a server configuration setting for the maximum number of entries in a result set, which cannot be bypassed even with paged search. I found this serverfault answer about the topic.

I tested with OpenLDAP: even though I inserted 2000 entries under an OU, I was only able to get 500 entries (with both simple search and paged_search), which is the default server-side sizelimit for OpenLDAP (while AD's default seems to be 1000). After increasing the server's limit, I got more entries. It should be mentioned that this limit can be controlled on a per-user level as well. I can query the whole result set with my dedicated admin user by default.
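To illustrate why paging alone does not help against such a cap, here is a small pure-Python simulation (no LDAP traffic involved). The function name `paged_results` and the convention that `server_sizelimit=0` means "no cap" are assumptions made for this sketch, not part of bonsai or OpenLDAP.

```python
from typing import Iterator, List


def paged_results(entries: List[str], page_size: int,
                  server_sizelimit: int) -> Iterator[List[str]]:
    """Yield pages of entries, honouring a cap on the whole result set.

    Hypothetical simulation: the cap applies to the total number of
    entries returned, not to the size of a single page.
    """
    # In this sketch, 0 means "no server-side cap".
    visible = entries[:server_sizelimit] if server_sizelimit > 0 else entries
    for start in range(0, len(visible), page_size):
        yield visible[start:start + page_size]


# 2000 made-up entries under one OU.
directory = [f"cn=user{i},ou=people,dc=example,dc=com" for i in range(2000)]

capped = [e for page in paged_results(directory, 100, 500) for e in page]
uncapped = [e for page in paged_results(directory, 100, 0) for e in page]

print(len(capped), len(uncapped))
```

Changing `page_size` in the capped call changes how many pages are produced, but not the total number of entries.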

As your C# example shows, you can use the virtual_list_search method to query from a specific point of the entire set, but that will also be restricted by the server's sizelimit.

Where the module fails is in notifying users that they received only a partial result. I'll see to it that the search raises SizeLimitError in these cases.

reach4bawer commented 5 years ago

The ldap3 module seems to return all the entries, while the same search query in bonsai was limited to 1500 results for me. The expected output was 53,000. I really like the work that you have done here, and the bonsai module seems much faster than the ldap3 module, granted that I didn't use async programming. I tried the following code; maybe you can help me replicate the same results. Bonsai code -

client.set_auto_page_acquire(True)
async with client.connect(is_async=True) as conn:
  res = await conn.search('CN=Test,OU=a,OU=b,DC=c,DC=d,DC=com',0, "(objectclass=group)", attrlist=['member'], sizelimit=0)
  return res

In the output I get the following -

'dn': , 'member': [], 'member;range=0-1499':[list with 1500 member]

While my ldap3 query looked like the following -

c = Connection(server, username, password, authentication='SIMPLE', auto_bind=True)

c.search(search_base = 'DC=a,DC=b,DC=com',
         search_filter = '(&(objectClass=group)(samaccountname=Test))',
         search_scope = SUBTREE,
         attributes = ['member'], size_limit=0)

for entry in c.response:
    for member in entry['attributes']['member']:
        print(member)

This gives me all the members in the output without any paged results, while the bonsai code returns an empty member list and a paged member list.

I have tried paged_search as well as virtual_list_search, but I get the same results. Maybe I am doing something wrong.

noirello commented 5 years ago

I'm not sure I follow you; your two code snippets are quite different. With bonsai you perform a base-scoped (scope=0) search at CN=Test,OU=a,OU=b,DC=c,DC=d,DC=com, while with ldap3 you do a subtree search from DC=a,DC=b,DC=com.

It seems you hit a different limit of Active Directory, one that caps the maximum number of values returned for a single attribute of an entry. See here. To overcome this obstacle, ldap3 has a nice feature called auto_range (which, if I understand it right, starts multiple queries in the background to get all the values).
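The 'member;range=0-1499' key in the output above is how AD labels a partial value list. As a rough sketch of what a range-following client has to do, the helpers below parse such a key and build the attribute name for the next chunk. The function names are hypothetical, and the follow-up form `member;range=1500-*` (with `*` marking the final chunk) reflects my understanding of AD's range retrieval rather than anything confirmed in this thread.

```python
import re
from typing import Optional, Tuple

# Matches keys such as "member;range=0-1499" or "member;range=3000-*".
_RANGE_RE = re.compile(
    r"^(?P<name>[^;]+);range=(?P<low>\d+)-(?P<high>\d+|\*)$", re.IGNORECASE
)


def parse_ranged_attr(key: str) -> Optional[Tuple[str, int, Optional[int]]]:
    """Return (attribute, low, high) for a ranged key; high is None for '*'."""
    match = _RANGE_RE.match(key)
    if match is None:
        return None  # plain attribute name, no range option
    high = None if match.group("high") == "*" else int(match.group("high"))
    return match.group("name"), int(match.group("low")), high


def next_range_attr(key: str) -> Optional[str]:
    """Build the attribute to request next, or None when nothing is left."""
    parsed = parse_ranged_attr(key)
    if parsed is None or parsed[2] is None:
        return None  # not ranged, or already the last chunk
    name, _low, high = parsed
    return f"{name};range={high + 1}-*"


print(next_range_attr("member;range=0-1499"))
```

A caller would keep passing the returned name in attrlist and appending the values until `next_range_attr` returns None.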

But my experience shows that ldap3 is also constrained by the original issue: the limited number of entries per search request.

reach4bawer commented 5 years ago

Thanks for the clarification. Is there a way to query the next set of results using bonsai? I tried the auto next page, but to no avail.

noirello commented 5 years ago

You should open a new issue for this and I'll look into it.

noirello commented 5 years ago

It looks like I was mistaken about Active Directory. It doesn't have a sizelimit option (like OpenLDAP); it only has a MaxPageSize option, which seems to affect only simple search, not paged_search. It doesn't even cap the page_size parameter (which is kind of weird, to be honest). I was able to query 2000 entries with a page_size larger than the MaxPageSize limit I had set.

So there were two bugs that needed to be fixed:

The latter was probably due to an incorrect condition check. Both of them were fixed on the dev branch. It would be great if you were able to test the latest state of the dev branch with your server configuration.

reach4bawer commented 5 years ago

I tried the dev branch, but the build process gave a lot of warnings about deprecated things. Using bonsai==1.2.0, I wrote a simple script to query an AD group that has 5000+ members -

The script is as follows -

import asyncio
import bonsai

domain_name = 'domain'
username = 'user'
password = 'password'

bonsai.set_connect_async(False)
server_name = 'ldaps://url.com'
CACERT_FILE = "//location on mac//Certificate_files//cacerts.pem"
client = bonsai.LDAPClient(server_name)
client.set_ca_cert(CACERT_FILE)
client.set_credentials("SIMPLE", user="CN="+username+", OU=ou, DC=domain,DC=name,DC=com", password=password)
client.set_auto_page_acquire(False)
base = "DC=domain,DC=name,DC=com"
compiled_search_filter='(&(objectCategory=person)(objectClass=user)(memberOf:1.2.840.113556.1.4.1941:=CN=group with 50k members,OU=ou, DC=domain,DC=name,DC=com))'

final_result = []
async with client.connect(is_async=True) as conn:
    result = await conn.paged_search(base=base, scope=2, filter_exp=compiled_search_filter, attrlist=['employeeID'],page_size=1000)
    for r in result:
        final_result.append(r)
    print(len(final_result))
    while result.acquire_next_page():
        print(len(final_result))
        for r in result:
            final_result.append(r)

I got the same error that was reported by someone here

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3319, in run_code
    last_expr = (await self._async_exec(code_obj, self.user_ns))
  File "", line 5, in async-def-wrapper
    password = 'password'
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/bonsai-1.2.0-py3.7-macosx-10.14-x86_64.egg/bonsai/__init__.py", line 3, in <module>
    from .ldapconnection import LDAPConnection
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/bonsai-1.2.0-py3.7-macosx-10.14-x86_64.egg/bonsai/ldapconnection.py", line 5, in <module>
    from ._bonsai import ldapconnection, ldapsearchiter
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
ImportError: dlopen(/usr/local/lib/python3.7/site-packages/bonsai-1.2.0-py3.7-macosx-10.14-x86_64.egg/bonsai/_bonsai.cpython-37m-darwin.so, 2): Symbol not found: _ldap_create_passwordpolicy_control
  Referenced from: /usr/local/lib/python3.7/site-packages/bonsai-1.2.0-py3.7-macosx-10.14-x86_64.egg/bonsai/_bonsai.cpython-37m-darwin.so
  Expected in: flat namespace
 in /usr/local/lib/python3.7/site-packages/bonsai-1.2.0-py3.7-macosx-10.14-x86_64.egg/bonsai/_bonsai.cpython-37m-darwin.so

Please let me know if I am doing something wrong or if I missed something.

noirello commented 5 years ago

You need to install a newer OpenLDAP version from brew. See the full description here.

The OpenLDAP library that is shipped with Mac OS by default is too old for the module.

reach4bawer commented 5 years ago

I did the installation, but no success. I still can't get the code on the dev branch to work. I have openldap 2.4.48 and Python 3.7.4, and I still get the following error -

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3320, in run_code
    last_expr = (await self._async_exec(code_obj, self.user_ns))
  File "", line 5, in async-def-wrapper
    password = 'password'
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/bonsai-1.2.0-py3.7-macosx-10.14-x86_64.egg/bonsai/__init__.py", line 3, in <module>
    from .ldapconnection import LDAPConnection
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/bonsai-1.2.0-py3.7-macosx-10.14-x86_64.egg/bonsai/ldapconnection.py", line 5, in <module>
    from ._bonsai import ldapconnection, ldapsearchiter
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
ImportError: dlopen(/usr/local/lib/python3.7/site-packages/bonsai-1.2.0-py3.7-macosx-10.14-x86_64.egg/bonsai/_bonsai.cpython-37m-darwin.so, 2): Symbol not found: _ldap_create_passwordpolicy_control
  Referenced from: /usr/local/lib/python3.7/site-packages/bonsai-1.2.0-py3.7-macosx-10.14-x86_64.egg/bonsai/_bonsai.cpython-37m-darwin.so
  Expected in: flat namespace
 in /usr/local/lib/python3.7/site-packages/bonsai-1.2.0-py3.7-macosx-10.14-x86_64.egg/bonsai/_bonsai.cpython-37m-darwin.so

Updating the comment with additional details.

noirello commented 5 years ago

I think you can use otool -L /usr/local/lib/python3.7/site-packages/bonsai-1.2.0-py3.7-macosx-10.14-x86_64.egg/bonsai/_bonsai.cpython-37m-darwin.so on Mac to find out which libraries are used.

Also, I put the fixed wheel file for Mac OS X here temporarily. It was built and tested on Travis CI.

reach4bawer commented 5 years ago

On running the otool -L /usr/local/lib/python3.7/site-packages/bonsai-1.2.0-py3.7-macosx-10.14-x86_64.egg/bonsai/_bonsai.cpython-37m-darwin.so

I got the following result-

otool -L /usr/local/lib/python3.7/site-packages/bonsai-1.2.0-py3.7-macosx-10.14-x86_64.egg/bonsai/_bonsai.cpython-37m-darwin.so
/usr/local/lib/python3.7/site-packages/bonsai-1.2.0-py3.7-macosx-10.14-x86_64.egg/bonsai/_bonsai.cpython-37m-darwin.so:
    /System/Library/Frameworks/LDAP.framework/Versions/A/LDAP (compatibility version 1.0.0, current version 2.4.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.250.1)

I updated bonsai with the wheel file that you gave, and it seems to work. I tried the code, but I am unable to get more than 1000 entries.

noirello commented 5 years ago

The otool output shows that the module uses the libldap shipped with the system instead of the one installed with brew (it should point somewhere like /usr/local/Cellar/openldap/2.4.48/lib/liblber-2.4.2.10.11.dylib). The following section of setup.cfg should instruct llvm to use the newer lib (you might need to clean the build dir before building):

[build_ext]
include_dirs=/usr/local/opt/openldap/include
library_dirs=/usr/local/opt/openldap/lib
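To check the linkage after a rebuild without reading the otool output by eye, a small parser along these lines may help. It is only a sketch: the helper names are hypothetical, the sample text is the otool output posted above, and the marker path is simply the system LDAP framework path seen in that output.

```python
from typing import List

# Path prefix of the LDAP framework shipped with macOS, as seen in the thread.
SYSTEM_LDAP_MARKER = "/System/Library/Frameworks/LDAP.framework"


def linked_libraries(otool_output: str) -> List[str]:
    """Return the library paths from `otool -L` output.

    The first line names the inspected file, so it is skipped.
    """
    libs = []
    for line in otool_output.splitlines()[1:]:
        line = line.strip()
        if line:
            # Entries look like "<path> (compatibility version ...)".
            libs.append(line.split(" (", 1)[0])
    return libs


def uses_system_ldap(otool_output: str) -> bool:
    """True if any linked library comes from the system LDAP framework."""
    return any(lib.startswith(SYSTEM_LDAP_MARKER)
               for lib in linked_libraries(otool_output))


sample = """_bonsai.cpython-37m-darwin.so:
    /System/Library/Frameworks/LDAP.framework/Versions/A/LDAP (compatibility version 1.0.0, current version 2.4.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.250.1)
"""

print(uses_system_ldap(sample))
```

If this still reports the system framework after rebuilding, the setup.cfg section was most likely not picked up, or a stale build directory was reused.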

Also, if you want to use paged_search without auto acquire then the code becomes a little bit cumbersome (especially with async):

final_result = []
async with client.connect(is_async=True) as conn:
    result = await conn.paged_search(base=base, scope=2, filter_exp=compiled_search_filter, attrlist=['employeeID'],page_size=1000)
    for r in result:
        final_result.append(r)
    print(len(final_result))
    msgid = result.acquire_next_page()
    while msgid is not None:
        result = await conn._evaluate(msgid)
        for r in result:
            final_result.append(r)
        msgid = result.acquire_next_page()
print(len(final_result))

The acquire_next_page method actually returns a message ID that needs to be polled with AIOLDAPConnection's _evaluate method. Alternatively, you can always use the default client.set_auto_page_acquire(True) and async for on the paged_search result.

(But your example just showed that AIOLDAPConnection doesn't have the necessary public method to process a message ID, only the "protected" _evaluate. That should also be fixed.)
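For comparison, the auto-acquire variant mentioned above could look roughly like the helper below. This is a sketch under assumptions: `collect_paged` is a hypothetical name, `conn` is expected to be an async connection whose client has set_auto_page_acquire(True), and the `_FakeConn` / `_FakeResult` classes are stand-ins so the snippet runs without an LDAP server; they are not bonsai classes.

```python
import asyncio
from typing import Any, List


async def collect_paged(conn: Any, base: str, scope: int,
                        filter_exp: str, page_size: int) -> List[Any]:
    """Gather every entry of a paged search into one list.

    Relies on auto page acquire: the result object is assumed to fetch
    the following pages by itself while `async for` iterates over it.
    """
    result = await conn.paged_search(base=base, scope=scope,
                                     filter_exp=filter_exp,
                                     page_size=page_size)
    entries = []
    async for entry in result:
        entries.append(entry)
    return entries


class _FakeResult:
    """Stand-in for a search result; yields canned entries."""

    def __init__(self, entries):
        self._entries = list(entries)

    def __aiter__(self):
        return self._iterate()

    async def _iterate(self):
        for entry in self._entries:
            yield entry


class _FakeConn:
    """Stand-in for an async connection, for demonstration only."""

    async def paged_search(self, base=None, scope=None,
                           filter_exp=None, page_size=1):
        return _FakeResult(range(2500))


collected = asyncio.run(
    collect_paged(_FakeConn(), "DC=domain,DC=name,DC=com", 2,
                  "(objectClass=user)", 1000))
print(len(collected))
```

With a real connection, the helper would be awaited inside the `async with client.connect(is_async=True) as conn:` block instead of being driven by `asyncio.run` on a fake.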

reach4bawer commented 5 years ago

I added these exact lines before building with the command python setup.py build -

[build_ext]
include_dirs=/usr/local/opt/openldap/include
library_dirs=/usr/local/opt/openldap/lib

and installed it using python setup.py install, but the code still didn't work. The wheel file seems to work. I am not sure what the difference might be. Do I need to re-clone the repo?

I just tried your version of the code, and it seems to work both with client.set_auto_page_acquire(True) and with client.set_auto_page_acquire(False). It seems to return the desired results. Do you think any changes are needed for the code to run on Linux?

noirello commented 5 years ago

Great, thank you for your confirmation. I don't think any changes are needed on Linux for the code.

Unfortunately, I'm out of ideas for the build. There is no change on the dev branch, no need to re-clone it.