r3nt0n / bopscrk

Generate smart and powerful wordlists
https://pypi.org/project/bopscrk
GNU General Public License v3.0

Tool not using multithreading #29

Closed vertexgamer closed 1 week ago

vertexgamer commented 7 months ago

Bopscrk seems to always only use 1 thread no matter what operation is made, even if in bopscrk.cfg 32 threads are specified. Is this normal behaviour?

RustyRaptor commented 5 months ago

TL;DR: it is normal behavior if your wordlist isn't gigantic. Although you specify the number of threads in the pool, the multiprocessing library ultimately decides how the work is split into tasks and sent to the pool. You can change this pretty easily, though.

https://github.com/r3nt0n/bopscrk/blob/5fdb5bba76ccbef1fb0c62b3528d79cc1972e9d3/bopscrk/modules/transforms.py#L144

It uses the pool.map function from multiprocessing to distribute the wordlist to the process pool in chunks of a given size. However, the call does not specify chunksize, so the library falls back to its default calculation:

if chunksize is None:
    chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
    if extra:
        chunksize += 1
if len(iterable) == 0:
    chunksize = 0

This is how the chunk size is determined. Here iterable is the wordlist, and I believe len(self._pool) is the number of worker processes in the pool, i.e. the thread count you specify.

So, for example, say you have a wordlist of size 60 and 32 threads. The calculation is divmod(60, 32 * 4), which gives 0 for chunksize and 60 for extra, and because extra is nonzero the chunksize is bumped to 1. The map function then has its own way of generating tasks from the chunksize, which I don't fully understand, but I think it's safe to say that it could result in just 1 process doing the work. Truth be told, that is often more efficient, since there is overhead in creating and managing processes.
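To make the arithmetic above easier to check, here is a small sketch that mirrors the quoted snippet as a standalone function. The names default_chunksize and n_tasks are made up for this illustration and are not part of multiprocessing or bopscrk; the task count assumes each chunk is submitted as one task.

```python
from math import ceil


def default_chunksize(n_items, n_workers):
    # Mirrors the snippet quoted above: divide the items into roughly
    # 4 chunks per worker, rounding the chunk size up.
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    if n_items == 0:
        chunksize = 0
    return chunksize


def n_tasks(n_items, n_workers):
    # Number of chunks the iterable is split into (one task per chunk).
    size = default_chunksize(n_items, n_workers)
    return ceil(n_items / size) if size else 0


# Wordlist sizes are arbitrary examples; 32 matches the thread count in the issue.
for n in (60, 1000, 100000):
    print(n, default_chunksize(n, 32), n_tasks(n, 32))
```

With 32 workers, a 60-word list ends up as 60 one-word tasks, so the per-task overhead can easily outweigh the actual work, while a much larger list gets split into bigger chunks.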

If you want to edit the source code to modify this behavior, you can. There are several variants of pool.map() (for example imap, imap_unordered and map_async) that work in different ways, and you can pass a chunksize and other parameters to them.
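As a rough sketch of what passing an explicit chunksize could look like: the transform function and the "one chunk per worker" policy below are assumptions made up for illustration, not what bopscrk or CPython actually does. ThreadPool is swapped in only because it shares map()'s signature with multiprocessing.Pool and runs without a __main__ guard; in bopscrk it would be the existing process pool.

```python
from math import ceil
from multiprocessing.pool import ThreadPool  # same map() signature as multiprocessing.Pool


def transform(word):
    # Hypothetical stand-in for one of bopscrk's per-word transforms.
    return word.upper()


def map_with_chunksize(pool, func, items, n_workers):
    # Illustrative policy (an assumption): hand each worker one chunk,
    # instead of relying on the default chunksize calculation.
    chunksize = max(1, ceil(len(items) / n_workers))
    return pool.map(func, items, chunksize=chunksize)


words = ["alpha", "bravo", "charlie", "delta", "echo"]
with ThreadPool(processes=2) as pool:
    result = map_with_chunksize(pool, transform, words, n_workers=2)
print(result)
```

Whether a larger or smaller chunksize helps depends on how expensive each transform is, so it is worth timing a few values on a real wordlist before changing the default.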

Hopefully this overview is helpful to hackers, and also to the developers if they want to change the behavior of the software. I'm also in the process of learning this stuff, so take it with a grain of salt.

Docs: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.map

Relevant source code: https://github.com/python/cpython/blob/3.12/Lib/multiprocessing/pool.py