rseichter / fangfrisch

Update and verify unofficial Clam Anti-Virus signatures
GNU General Public License v3.0

error in refreshing the rules #17

Closed mlodic closed 1 year ago

mlodic commented 1 year ago

Hello!

first of all, thank you for this work!

While trying to integrate this tool with IntelOwl, I found a problem that always happens when I launch fangfrisch refresh:

intelowl_malware_tools_analyzers | INFO: /var/lib/clamav/junk.ndb updated (6989248 bytes)
intelowl_malware_tools_analyzers | INFO: /var/lib/clamav/jurlbl.ndb updated (830565 bytes)
intelowl_malware_tools_analyzers | INFO: /var/lib/clamav/jurlbla.ndb updated (88784 bytes)
intelowl_malware_tools_analyzers | Traceback (most recent call last):
intelowl_malware_tools_analyzers |   File "/usr/local/bin/fangfrisch", line 8, in <module>
intelowl_malware_tools_analyzers |     sys.exit(main())
intelowl_malware_tools_analyzers |   File "/usr/local/lib/python3.8/site-packages/fangfrisch/__main__.py", line 64, in main
intelowl_malware_tools_analyzers |     ClamavRefresh(args).refresh_all()
intelowl_malware_tools_analyzers |   File "/usr/local/lib/python3.8/site-packages/fangfrisch/refresh.py", line 142, in refresh_all
intelowl_malware_tools_analyzers |     if self.refresh(ci):
intelowl_malware_tools_analyzers |   File "/usr/local/lib/python3.8/site-packages/fangfrisch/refresh.py", line 134, in refresh
intelowl_malware_tools_analyzers |     RefreshLog.update(ci, digest.data)
intelowl_malware_tools_analyzers |   File "/usr/local/lib/python3.8/site-packages/fangfrisch/db.py", line 190, in update
intelowl_malware_tools_analyzers |     entry: RefreshLog = _query_url(ci.url, session)
intelowl_malware_tools_analyzers |   File "/usr/local/lib/python3.8/site-packages/fangfrisch/db.py", line 239, in _query_url
intelowl_malware_tools_analyzers |     return session.query(RefreshLog).filter(RefreshLog.url == url).first()
intelowl_malware_tools_analyzers |   File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2752, in first
intelowl_malware_tools_analyzers |     return self.limit(1)._iter().first()  # type: ignore
intelowl_malware_tools_analyzers |   File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2855, in _iter
intelowl_malware_tools_analyzers |     result: Union[ScalarResult[_T], Result[_T]] = self.session.execute(
intelowl_malware_tools_analyzers |   File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2229, in execute
intelowl_malware_tools_analyzers |     return self._execute_internal(
intelowl_malware_tools_analyzers |   File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2114, in _execute_internal
intelowl_malware_tools_analyzers |     conn = self._connection_for_bind(bind)
intelowl_malware_tools_analyzers |   File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1981, in _connection_for_bind
intelowl_malware_tools_analyzers |     return trans._connection_for_bind(engine, execution_options)
intelowl_malware_tools_analyzers |   File "<string>", line 2, in _connection_for_bind
intelowl_malware_tools_analyzers |   File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/state_changes.py", line 137, in _go
intelowl_malware_tools_analyzers |     ret_value = fn(self, *arg, **kw)
intelowl_malware_tools_analyzers |   File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1108, in _connection_for_bind
intelowl_malware_tools_analyzers |     conn = bind.connect()
intelowl_malware_tools_analyzers |   File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3245, in connect
intelowl_malware_tools_analyzers |     return self._connection_cls(self)
intelowl_malware_tools_analyzers |   File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 145, in __init__
intelowl_malware_tools_analyzers |     self._dbapi_connection = engine.raw_connection()
intelowl_malware_tools_analyzers |   File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3269, in raw_connection
intelowl_malware_tools_analyzers |     return self.pool.connect()
intelowl_malware_tools_analyzers |   File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 455, in connect
intelowl_malware_tools_analyzers |     return _ConnectionFairy._checkout(self)
intelowl_malware_tools_analyzers |   File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 1270, in _checkout
intelowl_malware_tools_analyzers |     fairy = _ConnectionRecord.checkout(pool)
intelowl_malware_tools_analyzers |   File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 719, in checkout
intelowl_malware_tools_analyzers |     rec = pool._do_get()
intelowl_malware_tools_analyzers |   File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 157, in _do_get
intelowl_malware_tools_analyzers |     raise exc.TimeoutError(
intelowl_malware_tools_analyzers | sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/20/3o7r)

At some point during the download, SQLAlchemy breaks. I think that increasing the pool_size and max_overflow values (see doc) would fix the problem.

I can open a very small PR for this if you like.
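To illustrate the numbers in the error message: "size 5 overflow 10" means that at most 15 connections can be checked out at the same time. The following is a minimal toy model of such a bounded pool, not SQLAlchemy's implementation. `ToyPool` and `refresh_leaky` are hypothetical names, and the idea that one connection is held per refreshed URL is an assumption made only for the sake of the example.

```python
import queue


class ToyPool:
    """Toy stand-in for a bounded connection pool (not SQLAlchemy's code)."""

    def __init__(self, pool_size=5, max_overflow=10):
        # One token per connection that may be checked out at the same time.
        self._slots = queue.Queue()
        for _ in range(pool_size + max_overflow):
            self._slots.put(object())

    def checkout(self, timeout=0.05):
        # Wait up to `timeout` seconds for a free slot, then give up.
        try:
            return self._slots.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("pool limit reached, connection timed out") from None

    def checkin(self, conn):
        # Hand the connection back so that another caller can use it.
        self._slots.put(conn)


def refresh_leaky(pool, n_urls):
    """Check out one connection per URL and never return it (assumed behaviour)."""
    held = []
    for _ in range(n_urls):
        held.append(pool.checkout())
    return len(held)
```

With the default limits, `refresh_leaky(ToyPool(), 15)` succeeds, while a 16th checkout raises `TimeoutError`. Raising the limits in this model only moves the point at which the error occurs.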

rseichter commented 1 year ago

Thanks for the report. I will need more information to investigate this, though. You did not mention your platform, Python module versions used, and how to reproduce the issue, so I cannot do much at this point. 😉

mlodic commented 1 year ago

#18 solves the issue, if you would like to merge it. As I mentioned, it is just a matter of SQLAlchemy configuration.

rseichter commented 1 year ago

Did you read my previous comment? You did not provide enough information. I am not going to make any changes to Fangfrisch without even being able to reproduce an issue that has only ever been reported by you and nobody else. That should not come as a surprise. It could be an SQL driver issue, an SQLAlchemy problem, a problem with your computer, or whatever.

mlodic commented 1 year ago

I am just trying to help the community that is using this tool but it does not seem welcome. There is no need to be rude. Now I get why you are the single developer in this project. 🤷🏻‍♂️

Anyway, this is my setup:

I am installing fangfrisch inside an image built from this Dockerfile: https://github.com/intelowlproject/IntelOwl/blob/master/integrations/malware_tools_analyzers/Dockerfile. It is based on the python 3.8-slim image (Debian bullseye).

Then I have already told you how to reproduce the issue, which is just by running fangfrisch refresh. The configuration I used is this: https://github.com/intelowlproject/IntelOwl/blob/develop/integrations/malware_tools_analyzers/clamav/fangfrisch.conf

Then, if you don't want to merge the PR, there's no problem, no hard feelings. It is just that the next person with the same problem will probably ignore this tool instead of trying to solve it like I just did.

rseichter commented 1 year ago

Nothing about me asking for additional information was rude, you simply chose not to answer me and I had to ask once again. Before accusing others of being rude, you may want to revisit how one submits bug reports.

Also, as I mentioned, nobody but you has reported this issue so far, and I have not been able to reproduce it myself. Hence, I told you that I won't make changes. Fangfrisch has been working fine for me and others since 2020, so of course I am cautious and don't add code modifications willy-nilly to deal with a symptom I have not yet seen occur, without first trying to find the root cause of a possible, as of now unconfirmed, problem. That's just common sense.

gchamon commented 1 year ago

Happening to me too, using docker (image python:latest) and the configuration provided by @mlodic

I just ran a container using the base image and followed the installation instructions on the site.


@rseichter maybe we could reopen the issue?

gchamon commented 1 year ago

I can confirm that applying patch https://github.com/rseichter/fangfrisch/pull/18 fixes the issue. I will try to investigate the issue more. Maybe there is some sort of runtime configuration that also remediates this without having to meddle with code.

gchamon commented 1 year ago

No luck trying to work around the issue.

@rseichter a safer approach to this problem would be to expose pool size and max overflow as configurations for fangfrisch.conf, for instance under DEFAULT.sqlalchemy_pool_size and DEFAULT.sqlalchemy_max_overflow. In case those configurations are omitted, the engine is created using default settings. This way there is no possibility of regression for any user.
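The suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions: the option names `sqlalchemy_pool_size` and `sqlalchemy_max_overflow` come from the proposal and are not existing Fangfrisch settings, and `engine_kwargs` is a hypothetical helper. The sketch uses only the standard library and builds the keyword arguments that would be passed on to SQLAlchemy's `create_engine()`.

```python
import configparser

# Hypothetical option names from the proposal, mapped to the
# corresponding create_engine() keyword arguments.
POOL_OPTIONS = {
    "sqlalchemy_pool_size": "pool_size",
    "sqlalchemy_max_overflow": "max_overflow",
}


def engine_kwargs(conf_text):
    """Return pool-related keyword arguments found in the [DEFAULT] section."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    defaults = parser.defaults()
    kwargs = {}
    for option, kwarg in POOL_OPTIONS.items():
        # Omitted options are skipped, so the engine keeps its own defaults.
        if option in defaults:
            kwargs[kwarg] = int(defaults[option])
    return kwargs
```

A caller could then write something like `create_engine(db_url, **engine_kwargs(text))`. When neither option is present the dictionary is empty, so existing configurations behave exactly as before.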

rseichter commented 1 year ago

@gchamon Am I correct to assume that by „installation instructions on the site“ you mean the Fangfrisch online documentation? Also, could you please provide a Python module list, including the exact version of SQLalchemy, and attach your fangfrisch.conf here?

gchamon commented 1 year ago

Am I correct to assume that by „installation instructions on the site“ you mean the Fangfrisch online documentation?

Yes! By using a virtualenv, giving the correct permissions to the folders mentioned in the site etc... basically this: https://rseichter.github.io/fangfrisch/#_installation

Python module list, including the exact version of SQLalchemy

Generated with pip freeze:

certifi==2022.12.7
charset-normalizer==3.0.1
fangfrisch==1.5.0
greenlet==2.0.2
idna==3.4
requests==2.28.2
SQLAlchemy==2.0.3
typing_extensions==4.5.0
urllib3==1.26.14

fangfrisch.conf

https://github.com/gchamon/fangfrisch-queuepool-poc/blob/28c17e7cec5dd267e11381bc449dbc85d05023a2/fangfrisch.conf

Or using fangfrisch --conf fangfrisch.conf dumpconf:

[DEFAULT]
cleanup = automatic
enabled = false
integrity_check = sha256
log_level = INFO
log_method = console
max_size = 10MB
db_url = sqlite:////var/lib/fangfrisch/db.sqlite
local_directory = /var/lib/clamav
interval = 12h
on_update_timeout = 120

[malwarepatrol]
interval = 1d
integrity_check = disabled
product = 8
receipt = you_forgot_to_configure_receipt
prefix = https://lists.malwarepatrol.net/cgi/getfile?product=${product}&receipt=${receipt}&list=
url_clamav_basic = ${prefix}clamav_basic
filename_clamav_basic = malwarepatrol.db
enabled = no

[sanesecurity]
interval = 2h
prefix = http://ftp.swin.edu.au/sanesecurity/
!url_foxhole_all_cdb = ${prefix}foxhole_all.cdb
!url_foxhole_all_ndb = ${prefix}foxhole_all.ndb
!url_foxhole_mail = ${prefix}foxhole_mail.cdb
!url_scamnailer = ${prefix}scamnailer.ndb
!url_winnow_phish_complete = ${prefix}winnow_phish_complete.ndb
url_badmacro = ${prefix}badmacro.ndb
url_blurl = ${prefix}blurl.ndb
url_bofhland_cracked_url = ${prefix}bofhland_cracked_URL.ndb
url_bofhland_malware_attach = ${prefix}bofhland_malware_attach.hdb
url_bofhland_malware_url = ${prefix}bofhland_malware_URL.ndb
url_bofhland_phishing_url = ${prefix}bofhland_phishing_URL.ndb
url_foxhole_filename = ${prefix}foxhole_filename.cdb
url_foxhole_generic = ${prefix}foxhole_generic.cdb
url_foxhole_js_cdb = ${prefix}foxhole_js.cdb
url_foxhole_js_ndb = ${prefix}foxhole_js.ndb
url_hackingteam = ${prefix}hackingteam.hsb
url_junk = ${prefix}junk.ndb
url_jurlbl = ${prefix}jurlbl.ndb
url_jurlbla = ${prefix}jurlbla.ndb
url_lott = ${prefix}lott.ndb
url_malwareexpert_fp = ${prefix}malware.expert.fp
url_malwareexpert_hdb = ${prefix}malware.expert.hdb
url_malwareexpert_ldb = ${prefix}malware.expert.ldb
url_malwareexpert_ndb = ${prefix}malware.expert.ndb
url_malwarehash = ${prefix}malwarehash.hsb
url_phish = ${prefix}phish.ndb
url_phishtank = ${prefix}phishtank.ndb
url_porcupine = ${prefix}porcupine.ndb
url_rogue = ${prefix}rogue.hdb
url_scam = ${prefix}scam.ndb
url_shelter = ${prefix}shelter.ldb
url_spamattach = ${prefix}spamattach.hdb
url_spamimg = ${prefix}spamimg.hdb
url_spear = ${prefix}spear.ndb
url_spearl = ${prefix}spearl.ndb
url_winnow_attachments = ${prefix}winnow.attachments.hdb
url_winnow_bad_cw = ${prefix}winnow_bad_cw.hdb
url_winnow_extended_malware = ${prefix}winnow_extended_malware.hdb
url_winnow_extended_malware_links = ${prefix}winnow_extended_malware_links.ndb
url_winnow_malware = ${prefix}winnow_malware.hdb
url_winnow_malware_links = ${prefix}winnow_malware_links.ndb
url_winnow_phish_complete_url = ${prefix}winnow_phish_complete_url.ndb
url_winnow_spam_complete = ${prefix}winnow_spam_complete.ndb
enabled = yes

[securiteinfo]
customer_id = you_forgot_to_configure_customer_id
interval = 1h
max_size = 20MB
prefix = https://www.securiteinfo.com/get/signatures/${customer_id}/
!url_0hour = ${prefix}securiteinfo0hour.hdb
!url_old = ${prefix}securiteinfoold.hdb
!url_securiteinfo_mdb = ${prefix}securiteinfo.mdb
!url_spam_marketing = ${prefix}spam_marketing.ndb
url_android = ${prefix}securiteinfoandroid.hdb
url_ascii = ${prefix}securiteinfoascii.hdb
url_html = ${prefix}securiteinfohtml.hdb
url_javascript = ${prefix}javascript.ndb
url_pdf = ${prefix}securiteinfopdf.hdb
url_securiteinfo = ${prefix}securiteinfo.hdb
url_securiteinfo_ign2 = ${prefix}securiteinfo.ign2
enabled = no

[urlhaus]
interval = 10m
url_urlhaus = https://urlhaus.abuse.ch/downloads/urlhaus.ndb
enabled = yes
max_size = 2MB

I have also created a working proof-of-concept of the issue using docker (for reproducibility and isolation) which can be found here: https://github.com/gchamon/fangfrisch-queuepool-poc

Instructions in the README. The exact install instructions from the website were modified to fit the Dockerfile spec (for instance WORKDIR ... instead of cd ...).

rseichter commented 1 year ago

@gchamon Now that is information I can work with. Thank you for taking the time to collect the information and even provide an isolated Git project with which I can finally reproduce the problem. I forked the POC and will look into this issue.

gchamon commented 1 year ago

@rseichter 1.6.0 solves the issue!

EDIT: I also took a look at the dbconn branch, and there really was more work involved than just increasing the queue pool size. Nice job using with contexts to help with session management.

rseichter commented 1 year ago

Increasing the pool size would only have been like a mere coat of paint over an underlying flaw in handling database sessions. It took your information and additional testing on my end to realise that my previous code was too "optimistic" when it came to long-running DB connections. I hope the new way of handling sessions will be robust.
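The general idea of scoping each database session with a `with` block can be sketched as below. This is not the code from Fangfrisch 1.6.0; it is a minimal illustration that uses the standard library's `sqlite3` module instead of SQLAlchemy. The names `db_session`, `record_refreshes` and the `refreshlog` table layout are hypothetical.

```python
import sqlite3
from contextlib import contextmanager

open_connections = 0  # Counter used only to make the behaviour visible.


@contextmanager
def db_session(path=":memory:"):
    """Open a connection, commit on success, roll back on error, always close."""
    global open_connections
    conn = sqlite3.connect(path)
    open_connections += 1
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        # Runs on success and on failure, so no connection is left open.
        conn.close()
        open_connections -= 1


def record_refreshes(urls):
    """Use one short-lived session per URL instead of a long-lived one."""
    for url in urls:
        with db_session() as conn:
            # Each in-memory connection is a fresh database, hence the CREATE.
            conn.execute("CREATE TABLE IF NOT EXISTS refreshlog (url TEXT)")
            conn.execute("INSERT INTO refreshlog (url) VALUES (?)", (url,))
    return open_connections
```

Because every connection is released when its `with` block ends, the number of open connections stays bounded no matter how many URLs are processed, which addresses the cause rather than the pool limit.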

rseichter commented 1 year ago

Closing this issue as fixed with Fangfrisch release 1.6.0. Thanks again, @gchamon .

gchamon commented 1 year ago

@rseichter awesome! Nice job!