yogeshojha / rengine

reNgine is an automated reconnaissance framework for web applications with a focus on highly configurable streamlined recon process via Engines, recon data correlation and organization, continuous monitoring, backed by a database, and simple yet intuitive User Interface. reNgine makes it easy for penetration testers to gather reconnaissance with minimal configuration and with the help of reNgine's correlation, it just makes recon effortless.
https://yogeshojha.github.io/rengine/
GNU General Public License v3.0

bug: Scans pending caused by broken celery #1241

Closed metehan-arslan closed 4 months ago

metehan-arslan commented 5 months ago

Is there an existing issue for this?

Current Behavior

reNgine scans are stuck at pending and whois doesn't work. make logs shows celery restarting in a loop.

Thanks to Talanor from Discord, we were able to identify the issue: running pip as root breaks existing system dependencies.

Removing the following line fixed the loop: https://github.com/yogeshojha/rengine/blob/master/web/celery-entrypoint.sh#L81

Expected Behavior

Scans to work, loops in celery shouldn't happen.

celery-1       | Error: Invalid value for '-A' / '--app':
celery-1       | Unable to load celery application.
celery-1       | Module 'select' has no attribute 'epoll'
celery-1       | Usage: celery [OPTIONS] COMMAND [ARGS]...
celery-1       | Try 'celery --help' for help.
celery-1       | 
celery-1       | Error: Invalid value for '-A' / '--app':
celery-1       | Unable to load celery application.
celery-1       | Module 'select' has no attribute 'epoll'
celery-1       | Usage: celery [OPTIONS] COMMAND [ARGS]...
celery-1       | Try 'celery --help' for help.
celery-1       | 
celery-1       | Error: Invalid value for '-A' / '--app':
celery-1       | Unable to load celery application.
celery-1       | Module 'select' has no attribute 'epoll'
celery-1       | Usage: celery [OPTIONS] COMMAND [ARGS]...
celery-1       | Try 'celery --help' for help.
celery-1       | 
celery-1       | Error: Invalid value for '-A' / '--app':
celery-1       | Unable to load celery application.
celery-1       | Module 'select' has no attribute 'epoll'
celery-1       | Usage: celery [OPTIONS] COMMAND [ARGS]...
celery-1       | Try 'celery --help' for help.
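
The repeated error says that Python's `select` module has no `epoll` attribute inside the celery container. A minimal sketch for checking that directly, for example from a Python shell in the container, could look like the following. The helper name `has_epoll` is made up for illustration, and the check only reports the current state of the module, not what removed the attribute.

```python
import select


def has_epoll(module=select) -> bool:
    """Return True if the given module exposes an `epoll` attribute."""
    return hasattr(module, "epoll")


if __name__ == "__main__":
    # On Linux, a stock CPython normally provides select.epoll. A False
    # result there suggests something altered the module before this ran.
    print("select.epoll available:", has_epoll())
```

Running it once in a plain interpreter and once in the environment the celery workers use may help narrow down where the attribute goes missing.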

Steps To Reproduce

git clone https://github.com/yogeshojha/rengine.git
sudo ./install

Environment

- reNgine: 2.0.5
- OS: Raspberry Pi OS (bookworm), Fedora 40
- Python: 3.11.2
- Docker Engine: 26.1.1
- Docker Compose: 2.27.0
- Browser: Firefox, Chrome, Ungoogled Chromium

Anything else?

see also: https://github.com/yogeshojha/rengine/issues/1234

github-actions[bot] commented 5 months ago

👋 Hi @metehan-arslan, Issues is only for reporting a bug/feature request. Please read documentation before raising an issue https://rengine.wiki For very limited support, questions, and discussions, please join reNgine Discord channel: https://discord.gg/azv6fzhNCE Please include all the requested and relevant information when opening a bug report. Improper reports will be closed without any response.

Talanor commented 5 months ago

To add a bit more context here: A modification from the default Dockerfile was made: arch is arm64 (for OS and go)

Probably one of these library installs breaks celery: https://github.com/laramies/theHarvester/blob/master/requirements/base.txt

Generally:

shubhamvashist11 commented 5 months ago
_Removing the following line fixed the loop:
https://github.com/yogeshojha/rengine/blob/master/web/celery-entrypoint.sh#L81_

It didn't fix this for me. Any other workaround? @Talanor @metehan-arslan

Talanor commented 5 months ago
_Removing the following line fixed the loop:
https://github.com/yogeshojha/rengine/blob/master/web/celery-entrypoint.sh#L81_

It didn't fix this for me. Any other workaround? @Talanor @metehan-arslan

If you have initialized your container once, edit your ./web/celery-entrypoint.sh to keep only the celery workers launch lines at the end, which look something like:

echo "Starting Workers..."
echo "Starting Main Scan Worker with Concurrency: $MAX_CONCURRENCY,$MIN_CONCURRENCY"
watchmedo auto-restart --recursive --pattern="*.py" --directory="/usr/src/app/reNgine/" -- celery -A reNgine.tasks worker --loglevel=info --autoscale=$MAX_CONCURRENCY,$MIN_CONCURRENCY -Q main_scan_queue &
[...]
watchmedo auto-restart --recursive --pattern="*.py" --directory="/usr/src/app/reNgine/" -- celery -A reNgine.tasks worker --pool=gevent --concurrency=10 --loglevel=info -Q theHarvester_queue -n theHarvester_worker
exec "$@"

Then docker compose down and docker compose up the celery container. If the issue persists, it is due to something else.

If it works, some pip install breaks celery. Add back lines you deleted slowly and down/up the celery container until it breaks to find the culprit.
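
The "add lines back until it breaks" procedure above can be sketched as a small loop. This is only an illustration: `first_breaking_step` and the `is_healthy` callback are hypothetical names, and in a real run the callback would have to rewrite the entrypoint, restart the celery container, and inspect its logs, which is not shown here.

```python
from typing import Callable, List, Optional, Sequence


def first_breaking_step(
    steps: Sequence[str],
    is_healthy: Callable[[List[str]], bool],
) -> Optional[str]:
    """Re-enable steps one at a time and return the first one that breaks things.

    `is_healthy` receives the list of currently enabled steps and must return
    False once the setup is broken. Returns None if every step is fine.
    """
    enabled: List[str] = []
    for step in steps:
        enabled.append(step)
        if not is_healthy(enabled):
            return step
    return None


if __name__ == "__main__":
    # Placeholder step names; the fake check pretends "install-b" is the culprit.
    steps = ["install-a", "install-b", "install-c"]
    print(first_breaking_step(steps, lambda enabled: "install-b" not in enabled))
```

A linear walk matches the advice to add lines back slowly. With many install lines, halving the list each round would need fewer container restarts, at the cost of assuming the steps are independent.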

I'm working on a container with venvs & pipx, but in the meantime that'll get you running.

OffS3c commented 4 months ago

https://github.com/yogeshojha/rengine/issues/1248 This issue seems to be related. I was having the same output.

Talanor commented 4 months ago

#1248 This issue seems to be related. I was having the same output.

Unlikely, Infoga isn't cloned since it doesn't exist, so it can't make celery fail. More likely, infoga wasn't cloned, hence the error, AND you had a broken install due to this issue.

yogeshojha commented 4 months ago

Hi, this looks very familiar to me. Which branch did you clone, is it master or release/2.1.0?

I had this exact issue when I tried to use ollama with celery and it has known issues.

But on master this is very strange.

Talanor commented 4 months ago

This is on master. The Discord is full of people with clean installs hitting this bug.

Nandolorian commented 4 months ago

If you have initialized your container once, edit your ./web/celery-entrypoint.sh to keep only the celery workers launch lines at the end, which look something like:

I have the same behaviour on a fresh install of Ubuntu Server 24.04. I followed your advice and commented out the lines in ./web/celery-entrypoint.sh. I found that the error is triggered by theHarvester, at this line:

python3 -m pip install -r /usr/src/github/theHarvester/requirements/base.txt

In the file base.txt the library fastapi==0.111.0 is the culprit. I hope that this helps

yogeshojha commented 4 months ago

@Nandolorian did you downgrade or upgrade the fastapi version? When I was testing 2.1.0 I found out that asyncio was the culprit. Not sure why fastapi has issues with celery

Nandolorian commented 4 months ago

@yogeshojha I downgraded to 0.110.3 and the error didn't happen. I run some scans using the OSINT scan engine and theHarvester runs without any trouble.

yogeshojha commented 4 months ago

@Nandolorian Thank you! I am downgrading fastapi and let me try doing the installation!

yogeshojha commented 4 months ago

@Nandolorian I tried with the downgraded fastapi; sadly it doesn't work. Do you mind sharing all of your installed requirement versions with me?

You can do this by

docker exec -it rengine-celery-1 bash

and then

pip freeze
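
Once the `pip freeze` output has been captured, a few lines of Python are enough to check which version of a package is pinned, or whether it is present at all. This is a rough sketch: `parse_freeze` is a made-up helper, and it only understands plain `name==version` lines, skipping anything else (editable installs, URLs, comments).

```python
from typing import Dict


def parse_freeze(text: str) -> Dict[str, str]:
    """Map lower-cased package names to versions from `pip freeze` output."""
    pins: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Only handle simple "name==version" pins; ignore everything else.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


if __name__ == "__main__":
    sample = "celery==5.4.0\nfastapi==0.110.3\ngevent==24.2.1\n"
    pins = parse_freeze(sample)
    print("fastapi:", pins.get("fastapi"))
    print("httpcore present:", "httpcore" in pins)
```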

yogeshojha commented 4 months ago

Okay I think httpcore is the culprit here.

https://github.com/python-trio/trio/issues/2848

I had this exact same issue when using ollama-python, because it uses the httpcore library and our celery workers are gevent-based. As per my understanding, httpcore is a coroutine-based networking library that uses blocking I/O, which conflicts with gevent's cooperative multitasking model.

I guess finding which tool uses httpcore and removing them would solve this.
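
One way to find what pulls in httpcore is to ask the installed package metadata which distributions declare it as a requirement. The sketch below uses the standard `importlib.metadata` module; `direct_dependents` is a hypothetical helper, and it reports only direct dependents, so a chain such as tool -> library -> httpcore needs the call repeated on each result.

```python
import re
from importlib import metadata
from typing import Iterable, List, Optional


def _normalize(name: str) -> str:
    """Normalize a project name so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def _requirement_name(requirement: str) -> str:
    """Extract the project name from a requirement string like 'pkg>=1.0'."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    return _normalize(match.group(0)) if match else ""


def direct_dependents(package: str, distributions: Optional[Iterable] = None) -> List[str]:
    """List installed distributions that directly require `package`."""
    target = _normalize(package)
    dists = metadata.distributions() if distributions is None else distributions
    found = set()
    for dist in dists:
        # `requires` is None when a distribution declares no dependencies.
        for requirement in dist.requires or []:
            name = dist.metadata["Name"]
            if name and _requirement_name(requirement) == target:
                found.add(name)
    return sorted(found)


if __name__ == "__main__":
    print(direct_dependents("httpcore"))
```

Run inside the celery container, this would list whichever installed packages declare httpcore as a dependency in that environment.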

Talanor commented 4 months ago

Or, installing tools in venvs ;)

yogeshojha commented 4 months ago

@Talanor yeah, venv would be better, but either way, when any of the tools reNgine uses depends on httpcore, it won't be able to work with gevent and celery. We might have to run these tools outside celery or use another event pool. But I am open to hearing how you think venv will help us solve this?

Nandolorian commented 4 months ago

@Nandolorian I tried with the downgraded fastapi; sadly it doesn't work. Do you mind sharing all of your installed requirement versions with me?

You can do this by

docker exec -it rengine-celery-1 bash

and then

pip freeze

I see you have discovered that httpcore is the problem. Anyway, I am posting the list for your reference in case it is still useful.

Requirements

aiofiles==23.2.1
aiodns==3.2.0
aiohttp==3.9.5
aiomultiprocess==0.9.1
aiosignal==1.3.1
aiosqlite==0.20.0
amqp==5.2.0
annotated-types==0.6.0
anyio==4.3.0
appdirs==1.4.4
argcomplete==3.3.0
argh==0.26.2
asgiref==3.8.1
async-timeout==4.0.3
attrs==23.2.0
backoff==2.2.1
beautifulsoup4==4.12.3
billiard==4.2.0
blinker==1.4
Brotli==1.1.0
bs4==0.0.1
celery==5.4.0
censys==2.2.12
certifi==2024.2.2
cffi==1.16.0
chardet==5.0.0
charset-normalizer==2.1.1
click==8.1.7
click-didyoumean==0.3.1
click-plugins==1.1.1
click-repl==0.3.0
colorama==0.4.4
coreapi==2.3.3
coreschema==0.0.4
cron-descriptor==1.4.3
cryptography==3.4.8
cssselect2==0.7.0
dbus-python==1.2.18
decorator==5.1.1
Deprecated==1.2.14
discord-webhook==1.3.0
distro==1.7.0
Django==3.2.4
django-ace==1.0.11
django-celery-beat==2.6.0
django-login-required-middleware==0.6.1
django-mathfilters==1.0.0
django-role-permissions==3.2.0
django-timezone-field==6.1.0
djangorestframework==3.12.4
djangorestframework-datatables==0.6.0
dnspython==2.6.1
dotted-dict==1.1.3
drf-yasg==1.21.3
et-xmlfile==1.1.0
exceptiongroup==1.2.1
exrex==0.10.5
fastapi==0.110.3
filelock==3.14.0
fire==0.4.0
fonttools==4.51.0
frozenlist==1.4.1
future==0.18.2
fuzzywuzzy==0.18.0
gevent==24.2.1
greenlet==3.0.3
gunicorn==22.0.0
h11==0.14.0
h8mail==2.5.6
html5lib==1.1
httplib2==0.20.2
humanize==4.3.0
idna==3.3
importlib-metadata==4.6.4
importlib_resources==6.4.0
inflection==0.5.1
itypes==1.2.0
jeepney==0.7.1
Jinja2==3.1.4
keyring==23.5.0
kombu==5.3.7
launchpadlib==1.10.16
lazr.restfulclient==0.14.4
lazr.uri==1.0.6
Levenshtein==0.25.1
limits==3.11.0
loguru==0.6.0
lxml==5.2.1
Markdown==3.3.4
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
metafinder==1.2
more-itertools==8.10.0
multidict==6.0.5
netaddr==1.2.1
netlas==0.4.1
oauthlib==3.2.0
openai==0.28.0
openpyxl==3.1.2
orjson==3.9.0
outcome==1.3.0.post0
packaging==24.0
pikepdf==8.15.1
pillow==10.3.0
playwright==1.43.0
pluginbase==1.0.1
prettytable==3.10.0
prompt-toolkit==3.0.43
psycopg2==2.9.7
pycares==4.4.0
pycparser==2.22
pycvesearch==1.0
pydantic==2.7.1
pydantic_core==2.18.2
pydyf==0.10.0
pyee==11.1.0
Pygments==2.18.0
PyGObject==3.42.1
PyJWT==2.3.0
pyparsing==2.4.7
pyphen==0.15.0
PySocks==1.7.1
python-apt==2.4.0+ubuntu3
python-crontab==3.0.0
python-dateutil==2.9.0.post0
python-docx==1.1.2
python-Levenshtein==0.25.1
python-pptx==0.6.23
pytz==2024.1
PyVirtualDisplay==3.0
PyYAML==6.0.1
rapidfuzz==3.9.0
redis==5.0.3
requests==2.31.0
requests-file==2.0.0
retrying==1.3.4
rich==13.7.1
ruamel.yaml==0.18.6
ruamel.yaml.clib==0.2.8
scapy==2.4.3
SecretStorage==3.3.1
selenium==4.9.1
shodan==1.31.0
simplejson==3.19.2
six==1.16.0
slowapi==0.1.9
sniffio==1.3.1
sortedcontainers==2.4.0
soupsieve==2.3.2
SQLAlchemy==1.3.22
sqlparse==0.5.0
starlette==0.37.2
tenacity==8.0.1
termcolor==1.1.0
tinycss2==1.3.0
tinydb==4.8.0
tldextract==3.5.0
tqdm==4.64.0
treelib==1.6.1
trio==0.25.0
trio-websocket==0.11.1
typing_extensions==4.11.0
tzdata==2024.1
ujson==5.9.0
uritemplate==4.1.1
urllib3==1.26.9
uro==1.0.0
uvicorn==0.29.0
uvloop==0.19.0
validators==0.18.2
vine==5.1.0
wadllib==1.3.6
wafw00f==2.2.0
watchdog==4.0.0
wcwidth==0.2.13
weasyprint==53.3
webencodings==0.5.1
whatportis==0.8
win32-setctime==1.1.0
wrapt==1.16.0
wsproto==1.2.0
XlsxWriter==3.2.0
xmltodict==0.13.0
yarl==1.9.4
zipp==1.0.0
zope.event==5.0
zope.interface==6.3
zopfli==0.2.3

Talanor commented 4 months ago

@Talanor yeah, venv would be better, but either way, when any of the tools reNgine uses depends on httpcore, it won't be able to work with gevent and celery. We might have to run these tools outside celery or use another event pool. But I am open to hearing how you think venv will help us solve this?

Please see my PR #1250, which addresses the issue while staying on the current versions. The concept is that each tool lives in its own virtual environment, so you can have multiple httpcore (or whatever else) versions installed without conflicts.

yogeshojha commented 4 months ago

@Talanor your PR looks great, I liked the usage of poetry.

The problem is not conflicting versions of httpcore or having multiple versions in the same environment.

Our celery workers are gevent-based, and httpcore is a coroutine-based networking library that uses blocking I/O, which conflicts with gevent's cooperative multitasking model, as per my understanding.

Talanor commented 4 months ago

@Talanor your PR looks great, I liked the usage of poetry.

The problem is not conflicting versions of httpcore or having multiple versions in the same environment.

Our celery workers are gevent-based, and httpcore is a coroutine-based networking library that uses blocking I/O, which conflicts with gevent's cooperative multitasking model, as per my understanding.

I must be missing something. I don't see httpcore in their pip freeze list?

@Nandolorian can you confirm newest master works on a fresh install for you?

Talanor commented 4 months ago

Upon further investigation: The celery workers from my PR do not install httpcore (as seen in my venv):

talanor@pentest:~/containers/reNgine-CaRE$ docker run --entrypoint /bin/bash -it talanor/rengine-celery:v0.3 
rengine@909fc90322cc:~/rengine$ ls
rengine@909fc90322cc:~/rengine$ cd
rengine@909fc90322cc:~$ ls
nuclei-templates  poetry.lock  pyproject.toml  rengine  results  scan_results  tools  wordlists
rengine@909fc90322cc:~$ poetry -C . shell
Spawning shell within /home/rengine/.cache/pypoetry/virtualenvs/celery-rengine-HmEJnPQT-py3.10
rengine@909fc90322cc:~$ . /home/rengine/.cache/pypoetry/virtualenvs/celery-rengine-HmEJnPQT-py3.10/bin/activate
(celery-rengine-py3.10) rengine@909fc90322cc:~$ pip list
Package                          Version
-------------------------------- -----------
aiodns                           3.0.0
aiohttp                          3.9.5
aiosignal                        1.3.1
amqp                             5.2.0
appdirs                          1.4.4
argh                             0.26.2
asgiref                          3.8.1
async-timeout                    4.0.3
attrs                            23.2.0
beautifulsoup4                   4.9.3
billiard                         4.2.0
Brotli                           1.1.0
celery                           5.4.0
certifi                          2024.2.2
cffi                             1.16.0
charset-normalizer               3.3.2
click                            8.1.7
click-didyoumean                 0.3.1
click-plugins                    1.1.1
click-repl                       0.3.0
coreapi                          2.3.3
coreschema                       0.0.4
cron-descriptor                  1.4.3
cssselect2                       0.7.0
decorator                        5.1.1
Deprecated                       1.2.14
discord-webhook                  1.3.0
Django                           3.2.4
django-ace                       1.0.11
django-celery-beat               2.6.0
django-login-required-middleware 0.6.1
django-mathfilters               1.0.0
django-role-permissions          3.2.0
django-timezone-field            6.1.0
djangorestframework              3.12.4
djangorestframework-datatables   0.6.0
dotted-dict                      1.1.3
drf-yasg                         1.21.3
et-xmlfile                       1.1.0
filelock                         3.14.0
fonttools                        4.51.0
frozenlist                       1.4.1
gevent                           24.2.1
greenlet                         3.0.3
gunicorn                         22.0.0
html5lib                         1.1
humanize                         4.3.0
idna                             3.7
inflection                       0.5.1
itypes                           1.2.0
Jinja2                           3.1.4
kombu                            5.3.7
lxml                             5.2.1
Markdown                         3.3.4
MarkupSafe                       2.1.5
metafinder                       1.2
multidict                        6.0.5
netaddr                          0.8.0
netlas                           0.4.1
openai                           0.28.0
openpyxl                         3.1.2
orjson                           3.9.0
packaging                        24.0
pikepdf                          8.15.1
pillow                           10.3.0
pip                              24.0
prettytable                      2.1.0
prompt-toolkit                   3.0.43
psycopg2                         2.9.7
pycares                          4.4.0
pycparser                        2.22
pycvesearch                      1.0
pydyf                            0.10.0
Pygments                         2.18.0
pyphen                           0.15.0
PySocks                          1.7.1
python-crontab                   3.0.0
python-dateutil                  2.9.0.post0
python-docx                      1.1.2
python-pptx                      0.6.23
pytz                             2024.1
PyYAML                           6.0.1
redis                            5.0.3
requests                         2.31.0
requests-file                    2.0.0
ruamel.yaml                      0.18.6
ruamel.yaml.clib                 0.2.8
scapy                            2.4.3
setuptools                       69.5.1
simplejson                       3.17.2
six                              1.16.0
soupsieve                        2.5
sqlparse                         0.5.0
tinycss2                         1.3.0
tinydb                           4.4.0
tldextract                       3.5.0
tqdm                             4.66.4
typing_extensions                4.11.0
tzdata                           2024.1
uritemplate                      4.1.1
urllib3                          2.2.1
uro                              1.0.0
validators                       0.18.2
vine                             5.1.0
watchdog                         4.0.0
wcwidth                          0.2.13
weasyprint                       53.3
webencodings                     0.5.1
whatportis                       0.8.2
wrapt                            1.16.0
XlsxWriter                       3.2.0
xmltodict                        0.13.0
yarl                             1.9.4
zope.event                       5.0
zope.interface                   6.3
zopfli                           0.2.3

However, it is installed as a fastapi dependency from theHarvester. Installing theHarvester in its own venv does not install httpcore in the celery environment, and does not introduce a conflict.

Basically, if you can fix it by hot-removing a package via pip, it's something that can (and should) be solved with venvs.
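
To make the per-tool venv idea concrete, here is a minimal sketch using Python's standard `venv` module. It is not the approach from the PR, which uses poetry. The function names and the base directory are assumptions for illustration, and the `bin/python` path assumes a POSIX layout such as the one inside the Linux containers.

```python
import subprocess
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Interpreter path inside a venv, assuming the POSIX layout (bin/python)."""
    return Path(env_dir) / "bin" / "python"


def create_tool_venv(base_dir: Path, tool: str, with_pip: bool = True) -> Path:
    """Create a dedicated virtual environment for one tool under base_dir."""
    env_dir = Path(base_dir) / tool
    venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    return env_dir


def install_requirements(env_dir: Path, requirements: Path) -> None:
    """Install a requirements file with the venv's own pip, not the system one."""
    subprocess.run(
        [str(venv_python(env_dir)), "-m", "pip", "install", "-r", str(requirements)],
        check=True,
    )


# Example usage (hypothetical base directory; the requirements path is the one
# from the pip command quoted earlier in this thread):
#   env = create_tool_venv(Path("/opt/tool-venvs"), "theHarvester")
#   install_requirements(env, Path("/usr/src/github/theHarvester/requirements/base.txt"))
```

With this layout, whatever the tool's requirements pull in stays inside the tool's own environment and is not importable from the interpreter that runs the celery workers.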