Closed — jovyn closed this issue 6 years ago
@jovyn Thank you for reporting this issue. I think this is due to a design issue in the crawler.
Currently, the crawler starts new threads in the thread that finished. I'm going to change it so that new threads will be started from the main thread.
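The design change described above can be sketched as follows. This is a minimal illustration of the general pattern, not nyawc's actual code: workers report completion through a queue, and only the main thread ever calls `Thread.start()`, so a finishing worker never spawns its own successor. All names (`worker`, `crawl`) are hypothetical.

```python
import queue
import threading

def worker(task, done_queue):
    # ... perform the HTTP request / scan work for `task` here ...
    done_queue.put(task)  # report back instead of spawning a successor thread

def crawl(tasks, max_threads=8):
    """Start every thread from the main thread, never from a worker."""
    done_queue = queue.Queue()
    pending = list(tasks)
    in_flight = 0
    completed = []
    while pending or in_flight:
        # Top up the pool; only the main thread ever calls Thread.start().
        while pending and in_flight < max_threads:
            t = threading.Thread(target=worker, args=(pending.pop(), done_queue))
            t.start()
            in_flight += 1
        completed.append(done_queue.get())  # block until any worker finishes
        in_flight -= 1
    return completed
```

Because spawning is centralized, the number of live worker threads can never exceed `max_threads`, regardless of how quickly individual requests finish.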
@jovyn Do you know after how many requests this happens?
Referring to my Burp logs (I chained the scanner via Burp), I can see about 8492 requests. This may not be the exact number every time this happens. Meanwhile, let me run it against some other site and share the feedback with you.
@jovyn I think I fixed this issue on the develop branch; however, I can't verify it since I could not reproduce the issue after ~8500 requests. Could you test if it works for you?
Hey @tijme, I installed the new version from the develop branch and ran the scanner, but I am getting the errors below on larger scans (about 7000 requests):
[ERROR]
error : Memory allocation failed : growing buffer
error : Memory allocation failed : growing buffer
followed by:
Traceback (most recent call last):
File ".\extended.develop.py", line 177, in <module>
main()
File ".\extended.develop.py", line 84, in main
driver.start()
File "C:\Python27\lib\site-packages\acstis\Driver.py", line 136, in start
crawler.start_with(startpoint)
File "C:\Python27\lib\site-packages\nyawc\Crawler.py", line 95, in start_with
self.__crawler_start()
File "C:\Python27\lib\site-packages\nyawc\Crawler.py", line 166, in __crawler_start
self.__spawn_new_requests()
File "C:\Python27\lib\site-packages\nyawc\Crawler.py", line 111, in __spawn_new_requests
if self.__spawn_new_request():
File "C:\Python27\lib\site-packages\nyawc\Crawler.py", line 131, in __spawn_new_request
self.__request_start(first_in_line)
File "C:\Python27\lib\site-packages\nyawc\Crawler.py", line 230, in __request_start
thread.start()
File "C:\Python27\lib\threading.py", line 736, in start
_start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread
I did not explicitly specify the number of threads (the -mt option), and the default number of threads the scanner uses is 8. I think the scanner is trying to create too many threads, going beyond the default of 8.
Reference : https://johnsofteng.wordpress.com/2010/03/05/python-thread-error-cant-start-new-thread/
I tried specifying the number of threads using the -mt option, but I still get the same error.
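The `thread.error: can't start new thread` in the traceback is what Python raises when the OS refuses to create another thread, which typically means threads are being created faster than they are reclaimed. A common defensive pattern (a sketch of the general technique, not the scanner's actual code) is to gate every `Thread.start()` behind a `BoundedSemaphore`, so the process can never hold more than a fixed number of live workers:

```python
import threading

MAX_THREADS = 8
slots = threading.BoundedSemaphore(MAX_THREADS)  # hard cap on live workers
started = []

def run(task):
    try:
        pass  # ... do the actual request/scan work here ...
    finally:
        slots.release()  # free the slot even if the work raises

def spawn(task):
    slots.acquire()  # blocks here instead of exceeding the cap
    t = threading.Thread(target=run, args=(task,))
    t.start()
    started.append(t)

# Even with far more tasks than MAX_THREADS, at most 8 threads run at once.
for task in range(100):
    spawn(task)
for t in started:
    t.join()
```

With this gate in place, a burst of queued URLs degrades into waiting rather than into an OS-level thread creation failure.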
@jovyn Did you update the dependencies? If you run pip freeze, the nyawc dependency must be on version 1.7.9.
Yes I did. The nyawc dependency is on 1.7.9:
PS C:\Angular-CSTI-Scanner\Develop_Branch> pip freeze
acstis==3.0.2
alabaster==0.7.10
Babel==2.5.1
beautifulsoup4==4.6.0
certifi==2017.7.27.1
chardet==3.0.4
colorama==0.3.9
colorlog==2.10.0
docutils==0.14
idna==2.5
imagesize==0.7.1
Jinja2==2.9.6
lxml==4.0.0
MarkupSafe==1.0
nyawc==1.7.9
pockets==0.5.1
Pygments==2.2.0
pytz==2017.2
requests==2.18.1
requests-toolbelt==0.8.0
selenium==3.4.3
six==1.11.0
snowballstemmer==1.2.1
Sphinx==1.5.5
sphinx-better-theme==0.13
sphinxcontrib-napoleon==0.6.1
urllib3==1.21.1
Hi @jovyn, I tried to reproduce the issue again, but unfortunately I have not succeeded yet.
The thread count of the process is always 9 on my machine (8 workers + the main thread). Besides that, the memory usage always stays at ~80MB (even when scanning thousands of URLs).
Would it be possible for you to provide the information below?

- What OS (including version) do you use?
- What version of Python are you using?
- What version of ACSTIS are you using?
- How did you install ACSTIS?
- Are you still using a proxy?
- Could you send me the exact command (including arguments) you are executing (e.g. python extended.py -d http://example.com)?
- Would it be possible for you to share the URL you are scanning?
- How much memory is the process using?
- How many threads is the process using?

Hey @tijme,
Below are my responses:
What OS (including version) do you use? -- Windows 10 (64-bit)
What version of Python are you using? -- Python 2.7.13
What version of ACSTIS are you using? -- Version 3.0.2
How did you install ACSTIS? -- I downloaded the .zip from (https://github.com/tijme/angularjs-csti-scanner/tree/develop) and then did a pip install --upgrade --force-reinstall .\angularjs-csti-scanner-develop.zip
Are you still using a proxy? -- Yes. However, I tried acstis without the proxy settings as well.
Could you send me the exact command (including arguments) you are executing (e.g. python extended.py -d http://example.com)? -- python .\extended.py -c -d "https://www.example.com/" -tc "Burp_Cert.pem" -mt 12
I also tried acstis -c -d "https://www.example.com/" -tc "Burp_Cert.pem" -mt 12. I have also tried without the -mt option, as well as with fewer threads (8 or 9).
Would it be possible for you to share the URL you are scanning? -- Sorry @tijme, I won't be able to share the target URL.
How much memory is the process using? -- Not sure, will get back to you on this
How many threads is the process using? -- Not sure, will get back to you on this
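For the last two questions, a quick way to check from inside the scanner's own process is sketched below. The thread count comes from the standard library; memory reporting is hedged behind `psutil` (a third-party package, `pip install psutil`) since the stdlib has no portable way to read RSS on Windows. The `process_stats` name is hypothetical.

```python
import threading

def process_stats():
    """Report the current thread count, and memory use if psutil is installed."""
    stats = {"threads": threading.active_count()}
    try:
        import psutil  # third-party; optional
        stats["rss_mb"] = psutil.Process().memory_info().rss / (1024 * 1024)
    except ImportError:
        stats["rss_mb"] = None  # degrade gracefully without psutil
    return stats

print(process_stats())
```

Alternatively, on Windows the Task Manager's "Details" tab can show both the thread count and memory columns for the python.exe process without any code changes.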
@jovyn I was able to reproduce the issue and I found out it occurs on Windows only.
The scanner used a lot of memory per request since it cached the lxml tree of every response. When it reached ~8000 requests it used an average of 2GB of memory. 2GB is the limit for a 32-bit application running on the 64-bit Windows 10 OS (source), which is why it crashed on your machine.
By removing the lxml tree caching, you can now scan up to 60000 requests with 2GB of memory. I will continue to improve this in the future. If you want to scan more requests already, you could try installing the 64-bit version of Python.
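The principle behind the fix can be illustrated with a small sketch (using the stdlib `html.parser` as a stand-in for lxml, and hypothetical class names): keep only the raw response body per request, and rebuild any parsed structure on demand instead of caching a full tree for every response.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from anchor tags while parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

class Response:
    """Store only the raw body; derive parsed data on demand, cache no tree."""
    def __init__(self, body):
        self.body = body  # the raw HTML text is all that persists

    def links(self):
        # Re-parse whenever needed; the parser object is garbage-collected
        # afterwards, so memory stays flat as the request count grows.
        parser = LinkExtractor()
        parser.feed(self.body)
        return parser.links
```

The trade-off is extra CPU for repeated parsing, but the per-response memory footprint shrinks from a full DOM tree to just the body string, which matches the jump from ~8000 to ~60000 requests within the same 2GB budget.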
I just released this fix in version 3.0.5 of ACSTIS.
Thanks Tijme :)
On 1 Nov 2017 4:44 a.m., "Tijme Gommers" notifications@github.com wrote:
Closed #6 https://github.com/tijme/angularjs-csti-scanner/issues/6.
Here is an error I keep getting for larger websites; the scanner abruptly shuts down after spidering the URLs to a certain extent.