kevoreilly / CAPEv2

Malware Configuration And Payload Extraction
https://capesandbox.com/analysis/

sqlalchemy.exc.IntegrityError #2178

Open HUSMUS9999 opened 1 week ago

HUSMUS9999 commented 1 week ago

After a successful installation, this happens when I try to submit a binary for analysis:

[Two screenshots attached: 2024-06-21 01-27-34 and 2024-06-21 01-28-53]

IntegrityError

sqlalchemy.exc.IntegrityError: (psycopg2.errors.ForeignKeyViolation) insert or update on table "tasks_tags" violates foreign key constraint "tasks_tags_tag_id_fkey"
DETAIL: Key (tag_id)=(14) is not present in table "tags".

[SQL: INSERT INTO tasks_tags (task_id, tag_id) VALUES (%(task_id)s, %(tag_id)s)]
[parameters: {'task_id': 14, 'tag_id': 14}]
(Background on this error at: https://sqlalche.me/e/20/gkpj)

Traceback (most recent call last)

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context

self.dialect.do_execute(

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 924, in do_execute

cursor.execute(statement, parameters)

The above exception was the direct cause of the following exception:
File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/django/contrib/staticfiles/handlers.py", line 80, in __call__

return self.application(environ, start_response)

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/django/core/handlers/wsgi.py", line 124, in __call__

response = self.get_response(request)

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/django/core/handlers/base.py", line 140, in get_response

response = self._middleware_chain(request)

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/django/core/handlers/exception.py", line 57, in inner

response = response_for_exception(request, exc)

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/django/core/handlers/exception.py", line 140, in response_for_exception

response = handle_uncaught_exception(

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/django/core/handlers/exception.py", line 181, in handle_uncaught_exception

return debug.technical_500_response(request, *exc_info)

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/django_extensions/management/technical_response.py", line 40, in null_technical_500_response

raise exc_value.with_traceback(tb)

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/django/core/handlers/exception.py", line 55, in inner

response = get_response(request)

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/django/core/handlers/base.py", line 197, in _get_response

response = wrapped_callback(request, *callback_args, **callback_kwargs)

File "/opt/CAPEv2/web/submission/views.py", line 417, in index

task_id = db.add_static(file_path=path, priority=priority, tlp=tlp, options=options, user_id=request.user.id or 0)

File "/opt/CAPEv2/web/../lib/cuckoo/core/database.py", line 1605, in add_static

task_id = self.add(

File "/opt/CAPEv2/web/../lib/cuckoo/core/database.py", line 1264, in add

with self.session.begin_nested():

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/util.py", line 147, in __exit__

with util.safe_reraise():

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__

raise exc_value.with_traceback(exc_tb)

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/util.py", line 145, in __exit__

self.commit()

File "<string>", line 2, in commit
File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go

ret_value = fn(self, *arg, **kw)

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1302, in commit

self._prepare_impl()

File "<string>", line 2, in _prepare_impl
File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go

ret_value = fn(self, *arg, **kw)

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1277, in _prepare_impl

self.session.flush()

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4341, in flush

self._flush(objects)

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4476, in _flush

with util.safe_reraise():

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__

raise exc_value.with_traceback(exc_tb)

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4437, in _flush

flush_context.execute()

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/unitofwork.py", line 466, in execute

rec.execute(self)

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/unitofwork.py", line 591, in execute

self.dependency_processor.process_saves(uow, states)

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/dependency.py", line 1197, in process_saves

self._run_crud(

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/dependency.py", line 1260, in _run_crud

connection.execute(statement, secondary_insert)

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1418, in execute

return meth(

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 515, in _execute_on_connection

return connection._execute_clauseelement(

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1640, in _execute_clauseelement

ret = self._execute_context(

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1846, in _execute_context

return self._exec_single_context(

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1986, in _exec_single_context

self._handle_dbapi_exception(

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2353, in _handle_dbapi_exception

raise sqlalchemy_exception.with_traceback(exc_info[2]) from e

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context

self.dialect.do_execute(

File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 924, in do_execute

cursor.execute(statement, parameters)

sqlalchemy.exc.IntegrityError: (psycopg2.errors.ForeignKeyViolation) insert or update on table "tasks_tags" violates foreign key constraint "tasks_tags_tag_id_fkey"
DETAIL:  Key (tag_id)=(14) is not present in table "tags".

[SQL: INSERT INTO tasks_tags (task_id, tag_id) VALUES (%(task_id)s, %(tag_id)s)]
[parameters: {'task_id': 14, 'tag_id': 14}]
(Background on this error at: https://sqlalche.me/e/20/gkpj)

doomedraven commented 1 week ago

I don't know what is happening here, but I would suggest checking the machinery tags for a bad character or something similar.

leoiancu21 commented 1 week ago

I have the same problem (using Azure with scale set pool tags). @HUSMUS9999, could you please print the content of the following tables:

leoiancu21 commented 1 week ago

Following up on the previous comment, these are my DB tables related to this issue. I'll give some context here:

My az.conf:

[Sandbox-Cape-VMSS-1]

gallery_image_name = Sandbox-Cape-Image-Definition-v8
platform = windows
arch = x64
#tags = x64
pool_tag = x64
initial_pool_size = 1

2024-06-21 09:34:32,203 [modules.machinery.az] DEBUG: Sandbox-Cape-VMSS-1_0: Initializing...
2024-06-21 09:34:42,245 [modules.machinery.az] DEBUG: Machine Sandbox-Cape-VMSS-1_0 was created and available in 104s
2024-06-21 09:34:42,475 [lib.cuckoo.core.machinery_manager] DEBUG: SFSG : available machines : [<Machine(1,'CSS-Sandbox-Cape-VMSS-1_0')>]
2024-06-21 09:34:42,476 [lib.cuckoo.core.machinery_manager] INFO: Loaded 1 machine
2024-06-21 09:34:42,496 [lib.cuckoo.core.machinery_manager] INFO: max_vmstartup_count for BoundedSemaphore = 5

machines :

id | name | label | arch | ip | platform | interface | snapshot | locked | locked_changed_on | status | status_changed_on | resultserver_ip | resultserver_port | reserved
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
1 | Sandbox-Cape-VMSS-1_0 | Sandbox-Cape-VMSS-1_0 | x64 | 10.3.0.4 | windows | Sandbox-Cape-VMSS-Subnet-nic01 | /subscriptions//images/CSS-Sandbox-Cape-Image-Definition-v8 | f |  |  |  | 10.0.6.7 | 2042 | f

tags :

id | name
-- | --
1 | x64

machines_tags :

machine_id | tag_id
-- | --
1 | 1

tasks_tags :

task_id | tag_id
-- | --

tasks :

id target category cape timeout priority custom machine package route tags_tasks options platform memory enforce_timeout clock added_on started_on completed_on status dropped_files running_processes api_calls domains signatures_total signatures_alert files_written registry_keys_modified crash_issues anti_issues analysis_started_on analysis_finished_on processing_started_on processing_finished_on signatures_started_on signatures_finished_on reporting_started_on reporting_finished_on timedout sample_id machine_id shrike_url shrike_refer shrike_msg shrike_sid parent_id tlp user_id username

So tasks_tags and tasks are empty, which is expected given the error reported by OP.

At this point, when a new task is submitted, CAPE executes this insert:

where in my test the parameters were set to:

As you can see, even with a clean DB the insert tries to use two sets of task_id and tag_id:

tag_id__1 is incremented from 1 to 2. Even with two tasks submitted, this behaviour would likely result in a DB error, because the value being incremented is the tag_id and not the task_id.

At this point I don't know where in the code this is done, or why, so I'll have to keep digging until it makes sense.
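
For readers following along, the failure mode can be reproduced in miniature. The sketch below is not CAPE code: it uses Python's built-in sqlite3 instead of PostgreSQL, and the three-table schema is a simplified assumption modelled on the table names in this thread. It shows that a tasks_tags row pointing at a tag_id with no matching row in tags is rejected, which is what the ForeignKeyViolation reports.

```python
import sqlite3

# Simplified, assumed schema: only the columns needed to show the constraint.
SCHEMA = """
CREATE TABLE tags  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE tasks (id INTEGER PRIMARY KEY, target TEXT NOT NULL);
CREATE TABLE tasks_tags (
    task_id INTEGER NOT NULL REFERENCES tasks(id),
    tag_id  INTEGER NOT NULL REFERENCES tags(id)
);
"""


def try_link(conn, task_id, tag_id):
    """Insert one tasks_tags row; return True on success, False on an FK violation."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO tasks_tags (task_id, tag_id) VALUES (?, ?)",
                (task_id, tag_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO tags (id, name) VALUES (1, 'x64')")
    conn.execute("INSERT INTO tasks (id, target) VALUES (1, 'sample.bin')")
    conn.commit()
    existing = try_link(conn, task_id=1, tag_id=1)  # tag 1 exists
    missing = try_link(conn, task_id=1, tag_id=2)   # tag 2 was never inserted
    conn.close()
    return existing, missing


if __name__ == "__main__":
    print(demo())  # (True, False)
```

So the question is less about the insert itself and more about why a tag with id 2 is referenced before a matching row exists in tags.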

@doomedraven @cccs-mog @tbeadle @cccs-kevin Since you clearly have more experience with the new logic of the machinery module, could you please correct my hypothesis if I got something wrong, and explain the intended logic for adding a task to the tasks_tags relation table?

doomedraven commented 1 week ago

Why did you comment out #tags = x64? That might be the problem. I don't know, I don't have Azure.

leoiancu21 commented 1 week ago

I tried to use tags instead of pool_tag in a previous test, but it wasn't successful (my bad, I should have read the code). Anyway, its presence doesn't change anything: I ran some earlier tests without it and still got the same issue.

The configurations I tried are the following:

arch = x64
tags = x64
pool_tag = x64

arch = x64
tags = x64,x86
pool_tag = x64

arch = x64
#tags = x64
pool_tag = x64,x86

And the final one is the one you saw in my previous post. None of these configurations prevented the tag_id from being incremented as I pointed out before.

doomedraven commented 1 week ago

So your problem is with Azure, right?

leoiancu21 commented 1 week ago

No, the problem is not directly related to Azure. I found a way to bypass the task-tag relation in database.py by commenting out the section of the add() function that deals with the tags format:

         task.cape = cape
         task.tags_tasks = tags_tasks
         # Deal with tags format (i.e., foo,bar,baz)
-        if tags:
-            for tag in tags.split(","):
-                tag_name = tag.strip()
-                if tag_name and tag_name not in [tag.name for tag in task.tags]:
+        #if tags:
+        #    for tag in tags.split(","):
+        #        tag_name = tag.strip()
+        #        if tag_name and tag_name not in [tag.name for tag in task.tags]:
                     # "Task" object is being merged into a Session along the backref cascade path for relationship "Tag.tasks"; in SQLAlchemy 2.0, this reverse cascade will not take place.
                     # Set cascade_backrefs to False in either the relationship() or backref() function for the 2.0 behavior; or to set globally for the whole Session, set the future=True flag
                     # (Background on this error at: https://sqlalche.me/e/14/s9r1) (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
-                    task.tags.append(self._get_or_create(Tag, name=tag_name))
+        #            task.tags.append(self._get_or_create(Tag, name=tag_name))

         if clock:
             if isinstance(clock, str):

But of course this is not a solution to the problem. I'm trying to understand whether the tag id gets wrongly incremented during the for loop.

Could someone follow up with the actual expected output of this section?
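
As a reference point for that question, here is a minimal sketch of what the loop is expected to produce. It is a stand-in built on assumptions: sqlite3 replaces PostgreSQL and SQLAlchemy, `parse_tags`, `get_or_create_tag` and `link_tags` are hypothetical helpers that only mimic the role of `self._get_or_create(Tag, name=tag_name)`, and the schema is reduced to the columns named in this thread. Under those assumptions, every tag name gets a row in tags before tasks_tags references it.

```python
import sqlite3


def parse_tags(tags):
    """Split 'foo, bar,baz' into unique, stripped, non-empty names (order kept)."""
    names = []
    for raw in (tags or "").split(","):
        name = raw.strip()
        if name and name not in names:
            names.append(name)
    return names


def get_or_create_tag(conn, name):
    """Hypothetical helper: return the id of tag `name`, inserting it if missing."""
    row = conn.execute("SELECT id FROM tags WHERE name = ?", (name,)).fetchone()
    if row:
        return row[0]
    return conn.execute("INSERT INTO tags (name) VALUES (?)", (name,)).lastrowid


def link_tags(conn, task_id, tags):
    """Attach every tag in the comma-separated string to the task."""
    for name in parse_tags(tags):
        tag_id = get_or_create_tag(conn, name)
        conn.execute(
            "INSERT INTO tasks_tags (task_id, tag_id) VALUES (?, ?)", (task_id, tag_id)
        )
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the FK like PostgreSQL does
    conn.executescript(
        """
        CREATE TABLE tags  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE tasks (id INTEGER PRIMARY KEY, target TEXT NOT NULL);
        CREATE TABLE tasks_tags (
            task_id INTEGER NOT NULL REFERENCES tasks(id),
            tag_id  INTEGER NOT NULL REFERENCES tags(id)
        );
        INSERT INTO tags  (id, name)   VALUES (1, 'x64');  -- tag created for the machine
        INSERT INTO tasks (id, target) VALUES (1, 'sample.bin');
        """
    )
    link_tags(conn, task_id=1, tags="x64, x86")
    tags = conn.execute("SELECT id, name FROM tags ORDER BY id").fetchall()
    links = conn.execute(
        "SELECT task_id, tag_id FROM tasks_tags ORDER BY tag_id"
    ).fetchall()
    conn.close()
    return tags, links


if __name__ == "__main__":
    print(demo())  # ([(1, 'x64'), (2, 'x86')], [(1, 1), (1, 2)])
```

In this sketch a tag_id of 2 for the second tag is the expected result, not a wrongly incremented value; the insert only fails if the row for that second tag never reaches the tags table.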

cccs-kevin commented 1 week ago

This may or may not help, but here is my VMSS configuration in az.conf:

[vmss-dev-cape-win10x64]
gallery_image_name = win10x64-cape
platform = windows
arch = x64
pool_tag = win10x64

Setting pool_tag == arch is odd, and may be the issue
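
A quick way to compare the two settings across every scale set section is sketched below. It is only an illustration: `find_pool_tag_equal_arch` is a hypothetical helper, not part of CAPE, and it assumes az.conf can be read by Python's configparser and that only sections defining both `arch` and `pool_tag` matter.

```python
import configparser


def find_pool_tag_equal_arch(conf_text):
    """Return the names of sections whose pool_tag is identical to arch."""
    # interpolation=None so that a literal '%' in a value is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    flagged = []
    for section in parser.sections():
        arch = parser.get(section, "arch", fallback=None)
        pool_tag = parser.get(section, "pool_tag", fallback=None)
        if arch is not None and pool_tag is not None and arch.strip() == pool_tag.strip():
            flagged.append(section)
    return flagged


# The two sections quoted in this thread, trimmed to the relevant keys.
SAMPLE = """
[Sandbox-Cape-VMSS-1]
gallery_image_name = Sandbox-Cape-Image-Definition-v8
platform = windows
arch = x64
#tags = x64
pool_tag = x64

[vmss-dev-cape-win10x64]
gallery_image_name = win10x64-cape
platform = windows
arch = x64
pool_tag = win10x64
"""

if __name__ == "__main__":
    print(find_pool_tag_equal_arch(SAMPLE))  # ['Sandbox-Cape-VMSS-1']
```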

leoiancu21 commented 1 week ago

This may or may not help, but here is my VMSS configuration in az.conf:

[vmss-dev-cape-win10x64]
gallery_image_name = win10x64-cape
platform = windows
arch = x64
pool_tag = win10x64

Setting pool_tag == arch is odd, and may be the issue

I'll try it right away

leoiancu21 commented 1 week ago

Well, it didn't work as expected.

This is the dump of what the return self.add() frame holds when submitting a file:

cape | ''
-- | --
clock | '06-21-2024 12:50:31'
custom | ''
enforce_timeout | False
file_md5 | '2401c281f6798633b66b2a4a14937354'
file_type | ('Composite Document File V2 Document, Little Endian, Os: Windows, Version '  '6.3, MSI Installer, Code page: 1252, Title: Installation Database, Subject: '  'Skype Meetings App, Author: Microsoft Corporation, Keywords: Installer, '  'Comments: This installer database contains the logic and data required to '  'install Skype Meetings App., Template: Intel;0, Revision Number: '  '{C6C0F413-901C-42A8-A7F1-D03BD40F9B12}, Create Time/Date: Sat Aug  3 '  '05:00:26 2019, Last Saved Time/Date: Sat Aug  3 05:00:26 2019, Number of '  'Pages: 300, Number of Words: 10, Name of Creating Application: Windows '  'Installer XML Toolset (3.11.1.2318), Security: 2')
fileobj | <lib.cuckoo.common.objects.File object at 0x78d9d998c280>
machine | None
memory | False
obj | <lib.cuckoo.common.objects.File object at 0x78d9d998ccd0>
options | ''
package | 'msi'
parent_id | None
platform | 'windows'
priority | 2
route | 'internet'
sample | <Sample(1,'73fdfb85b80b81c87e78580dc5b46a73c73f7907f8e6cff0886dcb6493365255')>
sample_parent_id | None
self | <lib.cuckoo.core.database._Database object at 0x78d9fa166c80>
shrike_msg | None
shrike_refer | None
shrike_sid | None
shrike_url | None
source_url | False
static | False
tag | 'x86'
tag_name | 'x86'
tags | 'win10x64,x86'
tags_tasks | ''
task | <Task(1,'/tmp/cuckoo-tmp/upload_v3_dh2v4/SkypeMeetingsApp.msi')>
timeout | 200
tlp | None
user_id | 0
username | False

As you can see, it adds x86 following this logic in database.py:

        if isinstance(obj, (File, PCAP, Static)):
            fileobj = File(obj.file_path)
            file_type = fileobj.get_type()
            file_md5 = fileobj.get_md5()
            # check if hash is known already
            try:
                with self.session.begin_nested():
                    sample = Sample(
                        md5=file_md5,
                        crc32=fileobj.get_crc32(),
                        sha1=fileobj.get_sha1(),
                        sha256=fileobj.get_sha256(),
                        sha512=fileobj.get_sha512(),
                        file_size=fileobj.get_size(),
                        file_type=file_type,
                        ssdeep=fileobj.get_ssdeep(),
                        parent=sample_parent_id,
                        source_url=source_url,
                    )
                    self.session.add(sample)
            except IntegrityError:
                sample = self.session.query(Sample).filter_by(md5=file_md5).first()

            if DYNAMIC_ARCH_DETERMINATION:
                # Assign architecture to task to fetch correct VM type
                # This isn't 100% full proof
                if "PE32+" in file_type or "64-bit" in file_type or package.endswith("_x64"):
                    if tags:
                        tags += ",x64"
                    else:
                        tags = "x64"
                else:
                    if LINUX_ENABLED and platform == "linux":
                        linux_arch = _get_linux_vm_tag(file_type)
                        if linux_arch:
                            if tags:
                                tags += f",{linux_arch}"
                            else:
                                tags = linux_arch
                    else:
                        if tags:
                            tags += ",x86"
                        else:
                            tags = "x86"
            task = Task(obj.file_path)
            task.sample_id = sample.id

            if isinstance(obj, (PCAP, Static)):
                # since no VM will operate on this PCAP
                task.started_on = datetime.now()

        elif isinstance(obj, URL):
            task = Task(obj.url)
            tags = "x64"

        else:
            return None    
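
To make the effect of that branch easier to check in isolation, the sketch below restates only the tag-appending part as a standalone function. `append_arch_tag` is a hypothetical name, the Linux branch is reduced to `linux_enabled` and `linux_arch` arguments instead of reading LINUX_ENABLED and calling `_get_linux_vm_tag`, and the shortened `file_type` strings are illustrative.

```python
def append_arch_tag(tags, file_type, package, platform="windows",
                    linux_enabled=False, linux_arch=None):
    """Return `tags` with the architecture tag appended, mirroring the quoted branch.

    Hypothetical helper: `linux_arch` stands in for the result of
    _get_linux_vm_tag(file_type), which is not reproduced here.
    """
    if "PE32+" in file_type or "64-bit" in file_type or package.endswith("_x64"):
        arch = "x64"
    elif linux_enabled and platform == "linux":
        arch = linux_arch  # may be None, in which case nothing is appended
    else:
        arch = "x86"

    if not arch:
        return tags
    return f"{tags},{arch}" if tags else arch


if __name__ == "__main__":
    msi_type = "Composite Document File V2 Document, Little Endian, Os: Windows, MSI Installer"
    print(append_arch_tag("win10x64", msi_type, "msi"))  # win10x64,x86
    print(append_arch_tag("", "PE32+ executable (GUI) x86-64, for MS Windows", "exe"))  # x64
```

With pool_tag = win10x64 and an MSI sample, this yields win10x64,x86, which matches the tags value in the dump above.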

This results in the sqlalchemy.exc.IntegrityError that OP posted. This is the error log from the web server:

[21/Jun/2024 12:50:02] "GET /submit/ HTTP/1.1" 200 50025
/opt/CAPEv2/web/../lib/cuckoo/core/database.py:1264: SAWarning: Object of type <Task> not in session, add operation along 'Tag.tasks' won't proceed
  with self.session.begin_nested():
Internal Server Error: /submit/
Traceback (most recent call last):
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2108, in _exec_insertmany_context
    dialect.do_execute(
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 921, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.ForeignKeyViolation: insert or update on table "tasks_tags" violates foreign key constraint "tasks_tags_tag_id_fkey"
DETAIL:  Key (tag_id)=(2) is not present in table "tags".

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/opt/CAPEv2/web/submission/views.py", line 403, in index
    status, task_ids_tmp = download_file(**details)
  File "/opt/CAPEv2/web/../lib/cuckoo/common/web_utils.py", line 837, in download_file
    task_ids_new, extra_details = db.demux_sample_and_add_to_db(
  File "/opt/CAPEv2/web/../lib/cuckoo/core/database.py", line 1485, in demux_sample_and_add_to_db
    task_id = self.add_path(
  File "/opt/CAPEv2/web/../lib/cuckoo/core/database.py", line 1333, in add_path
    return self.add(
  File "/opt/CAPEv2/web/../lib/cuckoo/core/database.py", line 1264, in add
    with self.session.begin_nested():
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/util.py", line 146, in __exit__
    with util.safe_reraise():
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 147, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/util.py", line 144, in __exit__
    self.commit()
  File "<string>", line 2, in commit
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 136, in _go
    ret_value = fn(self, *arg, **kw)
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1218, in commit
    self._prepare_impl()
  File "<string>", line 2, in _prepare_impl
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 136, in _go
    ret_value = fn(self, *arg, **kw)
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1193, in _prepare_impl
    self.session.flush()
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4142, in flush
    self._flush(objects)
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4277, in _flush
    with util.safe_reraise():
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 147, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4238, in _flush
    flush_context.execute()
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/unitofwork.py", line 466, in execute
    rec.execute(self)
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/unitofwork.py", line 591, in execute
    self.dependency_processor.process_saves(uow, states)
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/dependency.py", line 1178, in process_saves
    self._run_crud(
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/dependency.py", line 1241, in _run_crud
    connection.execute(statement, secondary_insert)
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1412, in execute
    return meth(
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 483, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1635, in _execute_clauseelement
    ret = self._execute_context(
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1839, in _execute_context
    return self._exec_insertmany_context(
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2116, in _exec_insertmany_context
    self._handle_dbapi_exception(
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2339, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2108, in _exec_insertmany_context
    dialect.do_execute(
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 921, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.ForeignKeyViolation) insert or update on table "tasks_tags" violates foreign key constraint "tasks_tags_tag_id_fkey"
DETAIL:  Key (tag_id)=(2) is not present in table "tags".

[SQL: INSERT INTO tasks_tags (task_id, tag_id) VALUES (%(task_id__0)s, %(tag_id__0)s), (%(task_id__1)s, %(tag_id__1)s)]
[parameters: {'task_id__0': 1, 'tag_id__0': 1, 'task_id__1': 1, 'tag_id__1': 2}]
(Background on this error at: https://sqlalche.me/e/20/gkpj)
ERROR:django.request:Internal Server Error: /submit/
Traceback (most recent call last):
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2108, in _exec_insertmany_context
    dialect.do_execute(
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 921, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.ForeignKeyViolation: insert or update on table "tasks_tags" violates foreign key constraint "tasks_tags_tag_id_fkey"
DETAIL:  Key (tag_id)=(2) is not present in table "tags".

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/opt/CAPEv2/web/submission/views.py", line 403, in index
    status, task_ids_tmp = download_file(**details)
  File "/opt/CAPEv2/web/../lib/cuckoo/common/web_utils.py", line 837, in download_file
    task_ids_new, extra_details = db.demux_sample_and_add_to_db(
  File "/opt/CAPEv2/web/../lib/cuckoo/core/database.py", line 1485, in demux_sample_and_add_to_db
    task_id = self.add_path(
  File "/opt/CAPEv2/web/../lib/cuckoo/core/database.py", line 1333, in add_path
    return self.add(
  File "/opt/CAPEv2/web/../lib/cuckoo/core/database.py", line 1264, in add
    with self.session.begin_nested():
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/util.py", line 146, in __exit__
    with util.safe_reraise():
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 147, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/util.py", line 144, in __exit__
    self.commit()
  File "<string>", line 2, in commit
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 136, in _go
    ret_value = fn(self, *arg, **kw)
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1218, in commit
    self._prepare_impl()
  File "<string>", line 2, in _prepare_impl
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 136, in _go
    ret_value = fn(self, *arg, **kw)
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1193, in _prepare_impl
    self.session.flush()
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4142, in flush
    self._flush(objects)
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4277, in _flush
    with util.safe_reraise():
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 147, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4238, in _flush
    flush_context.execute()
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/unitofwork.py", line 466, in execute
    rec.execute(self)
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/unitofwork.py", line 591, in execute
    self.dependency_processor.process_saves(uow, states)
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/dependency.py", line 1178, in process_saves
    self._run_crud(
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/orm/dependency.py", line 1241, in _run_crud
    connection.execute(statement, secondary_insert)
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1412, in execute
    return meth(
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 483, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1635, in _execute_clauseelement
    ret = self._execute_context(
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1839, in _execute_context
    return self._exec_insertmany_context(
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2116, in _exec_insertmany_context
    self._handle_dbapi_exception(
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2339, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2108, in _exec_insertmany_context
    dialect.do_execute(
  File "/home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 921, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.ForeignKeyViolation) insert or update on table "tasks_tags" violates foreign key constraint "tasks_tags_tag_id_fkey"
DETAIL:  Key (tag_id)=(2) is not present in table "tags".

[SQL: INSERT INTO tasks_tags (task_id, tag_id) VALUES (%(task_id__0)s, %(tag_id__0)s), (%(task_id__1)s, %(tag_id__1)s)]
[parameters: {'task_id__0': 1, 'tag_id__0': 1, 'task_id__1': 1, 'tag_id__1': 2}]
(Background on this error at: https://sqlalche.me/e/20/gkpj)

HUSMUS9999 commented 1 week ago

I don't think the problem is related to the machinery: the same error happens even when I only run static analysis. It is also not related to cloud hosting, since I'm hosting locally.

doomedraven commented 1 week ago

If that happens locally, you need to provide the versions of the DB, SQLAlchemy, etc., as I have no issues.

leoiancu21 commented 1 week ago

If that happens locally, you need to provide the versions of the DB, SQLAlchemy, etc., as I have no issues.

Could you elaborate on how this could be related to the DB and SQLAlchemy versions? As I said in my previous reply, the error log points to a logic issue inside the code. Anyway, since I might be (and hope to be) wrong, these are the versions installed locally:

  • PostgreSQL 16.3
  • SQLAlchemy Version: 2.0.16

Since OP has the same issue, I think it's highly unlikely that it is related to the psql or SQLAlchemy version.

leoiancu21 commented 1 week ago

No, the problem is not directly related to Azure. I found a way to bypass the task-tag relation in database.py by commenting out the section of the add() function that deals with the tags format:

        task.cape = cape
        task.tags_tasks = tags_tasks
        # Deal with tags format (i.e., foo,bar,baz)
        #if tags:
        #    for tag in tags.split(","):
        #        tag_name = tag.strip()
        #        if tag_name and tag_name not in [tag.name for tag in task.tags]:
                    # "Task" object is being merged into a Session along the backref cascade path for relationship "Tag.tasks"; in SQLAlchemy 2.0, this reverse cascade will not take place.
                    # Set cascade_backrefs to False in either the relationship() or backref() function for the 2.0 behavior; or to set globally for the whole Session, set the future=True flag
                    # (Background on this error at: https://sqlalche.me/e/14/s9r1) (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
        #            task.tags.append(self._get_or_create(Tag, name=tag_name))

        if clock:
            if isinstance(clock, str):

Of course, this is not a solution to the problem. I'm trying to understand whether the tag id gets wrongly incremented during the for loop.

Could someone follow up with the actual expected output of this section?
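
As far as the loop itself goes, the expected behaviour can be sketched without a database. The `FakeTagStore` class below is a hypothetical in-memory stand-in for `self._get_or_create(Tag, name=...)`, not CAPE code: it hands out one id per distinct name, so an id should only increase when a name is seen for the first time.

```python
import itertools


class FakeTagStore:
    """Hypothetical in-memory stand-in for _get_or_create(Tag, name=...)."""

    def __init__(self):
        self._ids = {}
        self._counter = itertools.count(1)

    def get_or_create(self, name):
        # Reuse the id of a known name, otherwise allocate the next one.
        if name not in self._ids:
            self._ids[name] = next(self._counter)
        return self._ids[name]


def attach_tags(tags, store, existing=()):
    """Mimic the loop: split on commas, strip, skip blanks and duplicates."""
    attached = list(existing)
    for raw in tags.split(","):
        name = raw.strip()
        if name and name not in attached:
            store.get_or_create(name)
            attached.append(name)
    return attached


store = FakeTagStore()
first = attach_tags("foo, bar,,foo", store)   # blanks and repeats are dropped
second = attach_tags("bar,x86", store)        # "bar" keeps its earlier id
```

Under these assumptions, a tag id never changes once it has been handed out, so a missing id in `tags` would point at something removing the row afterwards rather than at the loop.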

For now, I removed the comments from the previous reply and commented out these lines instead:

            if DYNAMIC_ARCH_DETERMINATION:
                # Assign architecture to task to fetch correct VM type
                # This isn't 100% full proof
                if "PE32+" in file_type or "64-bit" in file_type or package.endswith("_x64"):
                    if tags:
                        tags += ",x64"
                    else:
                        tags = "x64"
                else:
                    if LINUX_ENABLED and platform == "linux":
                        linux_arch = _get_linux_vm_tag(file_type)
                        if linux_arch:
                            if tags:
                                tags += f",{linux_arch}"
                            else:
                                tags = linux_arch
                    #else:
                    #    if tags:
                    #        tags += ",x86"
                    #    else:
                    #        tags = "x86"              

Now the tasks_tags relation works correctly. Any idea how to fix that part of the code without removing the section like I did?
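
The repeated `if tags: ... else: ...` branches in that section all do the same thing, so one way to make it easier to reason about is a small helper. This is only a sketch: `append_tag` and `arch_tag` are hypothetical names, not functions in CAPE, the Linux branch is left out, and it does not address the foreign key error itself.

```python
def append_tag(tags, new_tag):
    """Append new_tag to a comma-separated tag string, tolerating None or ''."""
    return f"{tags},{new_tag}" if tags else new_tag


def arch_tag(file_type, package=""):
    """Pick x64 or x86 with the same checks as the quoted snippet (Linux branch omitted)."""
    if "PE32+" in file_type or "64-bit" in file_type or package.endswith("_x64"):
        return "x64"
    return "x86"


# A 32-bit PE with no user-supplied tags ends up with just the architecture tag.
tags = append_tag(None, arch_tag("PE32 executable (GUI) Intel 80386"))
# A 64-bit PE keeps the existing tag and gains the architecture tag.
tags_64 = append_tag("win10", arch_tag("PE32+ executable (console) x86-64"))
```

With the `else` branch commented out as above, the `x86` tag is simply never requested, which is why the error disappears for 32-bit samples.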

HUSMUS9999 commented 1 week ago

@leoiancu21 Thank you for helping, it worked for me.

leoiancu21 commented 1 week ago

@leoiancu21 Thank you for helping, it worked for me.

My pleasure. Does your analysis work completely now? I'm still stuck at the reporting stage, where the analysis hangs, but I don't know whether that is related to this issue.

HUSMUS9999 commented 1 week ago

Yes, exactly. [image]

Is this your case?

leoiancu21 commented 1 week ago

@HUSMUS9999 Not exactly, but it could be; I have to debug a bit to understand it. I don't think it is related, though. It would be better to open another issue for that specific problem, since the actual fix for this issue has not been found yet and I don't want to mix problems.

HUSMUS9999 commented 1 week ago

Alright, thanks @leoiancu21. Do I have to close this issue?

leoiancu21 commented 1 week ago

Unfortunately it's not fixed; we just used a workaround. I would leave it open so that whoever finds a way to solve the problem can reply with a commit and close it.

doomedraven commented 1 week ago

If that happens locally, you need to provide the versions of the DB, SQLAlchemy, etc., as I have no issues.

Could you elaborate on how this could be related to the DB and SQLAlchemy versions? As I said in my previous reply, the error log points to a logic issue inside the code. Anyway, since I might be (and hope to be) wrong, these are the versions installed locally:

  • PostgreSQL 16.3
  • SQLAlchemy Version: 2.0.16

Since OP has the same issue, I think it's highly unlikely that it is related to the psql or SQLAlchemy version.

Because I don't have any issues with tags and I don't have to modify anything at all, and I run a huge number of CAPE servers.

doomedraven commented 1 week ago

I have the same software versions, just checked.

@tbeadle I just saw this in the log; maybe you can help here a bit more (I have fly, so my mind is pretty off):

/opt/CAPEv2/web/../lib/cuckoo/core/database.py:1264: SAWarning: Object of type <Task> not in session, add operation along 'Tag.tasks' won't proceed
  with self.session.begin_nested():

tbeadle commented 1 day ago

@HUSMUS9999 Could you please put the following in custom/conf/cuckoo.conf:

[database]
log_statements = on

Then restart cape-web and try the submission again. I'm interested in the SQL statements issued. They should be in /var/log/django/access.log. The _get_or_create call should be issuing a SELECT to see if the x86 Tag exists and, if it doesn't, it should create it at the start of the begin_nested call when the Task object is added to the session, so that it has an ID that can be used to add it to the tasks_tags table.

So far, I have not been able to reproduce the problem.
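
For reference, the same kind of statement logging can be switched on in a standalone script through the standard `logging` module. This is a sketch that assumes the `log_statements` option surfaces SQLAlchemy's own engine logging; how CAPE wires the option internally is not shown in this thread. The logger name `sqlalchemy.engine` is the documented SQLAlchemy logger, and it matches the `sqlalchemy.engine.Engine` prefix that appears in the log lines posted here.

```python
import logging

# Send log records to stderr with a timestamp, similar to the lines in this thread.
logging.basicConfig(format="%(asctime)s %(levelname)s %(name)s %(message)s")

# At INFO, SQLAlchemy logs each SQL statement together with its parameters.
engine_logger = logging.getLogger("sqlalchemy.engine")
engine_logger.setLevel(logging.INFO)
```

Any engine created after this point should then report its SELECT, INSERT and DELETE statements in the order they are issued, which is the information needed here.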

t-mtsmt commented 1 day ago

I have the same problem.

I changed the following settings and then checked the cape-web logs.

[database]
log_statements = on

As you can see in the log below, the tag was deleted immediately after the new tag was inserted.

Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: 2024-06-28 02:43:47,248 INFO sqlalchemy.engine.Engine SELECT tags.id AS tags_id, tags.name AS tags_name
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: FROM tags
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: WHERE tags.name = %(name_1)s
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]:  LIMIT %(param_1)s
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: INFO:sqlalchemy.engine.Engine:SELECT tags.id AS tags_id, tags.name AS tags_name
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: FROM tags
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: WHERE tags.name = %(name_1)s
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]:  LIMIT %(param_1)s
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: 2024-06-28 02:43:47,250 INFO sqlalchemy.engine.Engine [generated in 0.00126s] {'name_1': 'x86', 'param_1': 1}
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: INFO:sqlalchemy.engine.Engine:[generated in 0.00126s] {'name_1': 'x86', 'param_1': 1}
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: 2024-06-28 02:43:47,259 INFO sqlalchemy.engine.Engine INSERT INTO tags (name) VALUES (%(name)s) RETURNING tags.id
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: INFO:sqlalchemy.engine.Engine:INSERT INTO tags (name) VALUES (%(name)s) RETURNING tags.id
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: 2024-06-28 02:43:47,260 INFO sqlalchemy.engine.Engine [generated in 0.00082s] {'name': 'x86'}
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: INFO:sqlalchemy.engine.Engine:[generated in 0.00082s] {'name': 'x86'}
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: /opt/CAPEv2/web/../lib/cuckoo/core/database.py:1264: SAWarning: Object of type <Task> not in session, add operation along 'Tag.tasks' won't proceed
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]:   with self.session.begin_nested():
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: 2024-06-28 02:43:47,265 INFO sqlalchemy.engine.Engine DELETE FROM tags WHERE NOT (EXISTS (SELECT 1
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: FROM tasks, tasks_tags
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: WHERE tags.id = tasks_tags.tag_id AND tasks.id = tasks_tags.task_id)) AND NOT (EXISTS (SELECT 1
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: FROM machines, machines_tags
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: WHERE tags.id = machines_tags.tag_id AND machines.id = machines_tags.machine_id))
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: INFO:sqlalchemy.engine.Engine:DELETE FROM tags WHERE NOT (EXISTS (SELECT 1
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: FROM tasks, tasks_tags
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: WHERE tags.id = tasks_tags.tag_id AND tasks.id = tasks_tags.task_id)) AND NOT (EXISTS (SELECT 1
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: FROM machines, machines_tags
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: WHERE tags.id = machines_tags.tag_id AND machines.id = machines_tags.machine_id))
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: 2024-06-28 02:43:47,267 INFO sqlalchemy.engine.Engine [cached since 0.02688s ago] {}
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: INFO:sqlalchemy.engine.Engine:[cached since 0.02688s ago] {}
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: 2024-06-28 02:43:47,270 INFO sqlalchemy.engine.Engine SAVEPOINT sa_savepoint_2
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: INFO:sqlalchemy.engine.Engine:SAVEPOINT sa_savepoint_2
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: 2024-06-28 02:43:47,270 INFO sqlalchemy.engine.Engine [no key 0.00078s] {}
Jun 28 02:43:47 ubuntu2204.localdomain python3[26908]: INFO:sqlalchemy.engine.Engine:[no key 0.00078s] {}

I also commented out the following process in CAPEv2/lib/cuckoo/core/database.py and it now works correctly.

        # There should be a better way to clean up orphans. This runs after every flush, which is crazy.
        # @event.listens_for(self.session, "after_flush")
        # def delete_tag_orphans(session, ctx):
        #     session.query(Tag).filter(~Tag.tasks.any()).filter(~Tag.machines.any()).delete(synchronize_session=False)

I suspect there is a problem with the above process for removing unused tags.
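
The sequence in the log can be reproduced outside CAPE. The sketch below uses the standard-library `sqlite3` module instead of PostgreSQL and SQLAlchemy, with a simplified schema whose table names are borrowed from the error message (the `machines` side of the cleanup is left out). It only illustrates the ordering problem and is not CAPE's actual code; `get_or_create_tag` and `delete_tag_orphans` here are hypothetical stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript(
    """
    CREATE TABLE tasks (id INTEGER PRIMARY KEY);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE tasks_tags (
        task_id INTEGER REFERENCES tasks(id),
        tag_id INTEGER REFERENCES tags(id)
    );
    """
)


def get_or_create_tag(name):
    """SELECT the tag id, inserting the row first if it does not exist."""
    row = conn.execute("SELECT id FROM tags WHERE name = ?", (name,)).fetchone()
    if row:
        return row[0]
    return conn.execute("INSERT INTO tags (name) VALUES (?)", (name,)).lastrowid


def delete_tag_orphans():
    """Same idea as the after_flush hook: drop tags that no task refers to."""
    conn.execute(
        "DELETE FROM tags WHERE NOT EXISTS "
        "(SELECT 1 FROM tasks_tags WHERE tasks_tags.tag_id = tags.id)"
    )


task_id = conn.execute("INSERT INTO tasks DEFAULT VALUES").lastrowid
tag_id = get_or_create_tag("x86")
delete_tag_orphans()  # runs before the link row exists, so the new tag is removed

try:
    conn.execute(
        "INSERT INTO tasks_tags (task_id, tag_id) VALUES (?, ?)", (task_id, tag_id)
    )
    outcome = "inserted"
except sqlite3.IntegrityError as exc:
    outcome = f"failed: {exc}"
```

In this sketch the insert into `tasks_tags` fails because the tag row is already gone. Running the cleanup only after the link row has been written lets the same insert succeed, which is consistent with the ordering seen in the log.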

tbeadle commented 14 hours ago

This appears to be due to a change in behavior between the version of SQLAlchemy that is required via pyproject.toml/poetry.lock or requirements.txt (1.4.50) and the version that you're running (2.0+). I was able to reproduce the error by running poetry run pip install SQLAlchemy==2.0.16 and then submitting a sample. I'm not sure how you installed CAPE, but if you use poetry install --sync, it should install all the dependencies with the versions locked in poetry.lock. Please try this and let me know if that solves the problem.
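
A small guard can make this kind of version drift visible early. The sketch below only compares version strings: the `1.4` series is taken from the pinned version mentioned above (1.4.50), while `is_supported` and `check_installed` are hypothetical helpers, not part of CAPE.

```python
from importlib import metadata


def is_supported(version, required_series=(1, 4)):
    """True if version (e.g. '1.4.50') belongs to the required major.minor series."""
    parts = version.split(".")
    try:
        major_minor = (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):
        return False
    return major_minor == required_series


def check_installed(dist="SQLAlchemy"):
    """Return (version, ok); version is None when the distribution is missing."""
    try:
        version = metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None, False
    return version, is_supported(version)


if __name__ == "__main__":
    version, ok = check_installed()
    print(f"SQLAlchemy {version or 'not installed'}: {'ok' if ok else 'unexpected version'}")
```

Run inside the CAPE virtualenv (for example with `poetry run python`), this would flag both 2.0.16 and 2.0.31 as outside the pinned series.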

t-mtsmt commented 1 hour ago

Thank you. This problem was solved by downgrading from SQLAlchemy 2.0.31 to 1.4.50.

It was caused by running the following command according to the documentation.
https://capev2.readthedocs.io/en/latest/installation/host/installation.html#optional-dependencies

sudo -u cape poetry run pip install -r extra/optional_dependencies.txt

The above command installed flask-sqlalchemy and upgraded SQLAlchemy 1.4.50 to 2.0.31.
https://github.com/kevoreilly/CAPEv2/blob/master/extra/optional_dependencies.txt#L7C1-L7C17

doomedraven commented 58 minutes ago

OK, I commented that out. Thanks, Tommy, for the help.