google / turbinia

Automation and Scaling of Digital Forensics Tools
Apache License 2.0

Child tasks/evidence objects lose parent request_id #313

Closed ericzinnikas closed 5 years ago

ericzinnikas commented 5 years ago

I haven't had a chance to dig into this further, but I'm noticing that request_id is not passed on to child evidence objects or tasks. I'm not sure if this is specific to Celery or affects PSQ as well.

For example (the following logs are from some debug statements on the server), sending a RawDisk evidence for processing we see the following PlasoTask / RawDisk objects created:

{'result': None, 'tmp_dir': None, 'name': 'PlasoTask', 'run_local': False, '_evidence_config': {}, 'base_output_dir': u'/evidence/output', 'state_key': u'TurbiniaTask:5216aea82a2b48b4bfb1601e84c16451', 'last_update': '2018-11-30 15:54:30.436344', 'stub': None, 'user': 'ericwz', 'request_id': u'731123b983434cb59514caf327cd3129', 'output_manager': {'_output_writers': None, 'is_setup': False}, 'id': '5216aea82a2b48b4bfb1601e84c16451', 'output_dir': None}

{u'mount_partition': 1, u'type': u'RawDisk', u'mount_path': None, u'tags': {}, u'processed_by': [], u'copyable': False, u'saved_path_type': None, u'name': u'test', u'source': u'example', u'saved_path': None, u'loopdevice_path': None, u'request_id': u'731123b983434cb59514caf327cd3129', u'local_path': u'/Users/ericwz/SCHARDT.dd', u'size': None, u'config': {}, u'cloud_only': False, u'description': None}

But then once that completes, the following PsortTask / PlasoFile objects now have request_id: None:

{'result': None, 'tmp_dir': None, 'name': 'PsortTask', 'run_local': False, '_evidence_config': {}, 'base_output_dir': u'/evidence/output', 'state_key': u'TurbiniaTask:92bef4ad649d48cd91b575692588c880', 'last_update': '2018-11-30 15:57:32.114775', 'stub': None, 'user': 'ericwz', 'request_id': None, 'output_manager': {'_output_writers': None, 'is_setup': False}, 'id': '92bef4ad649d48cd91b575692588c880', 'output_dir': None}

{u'plaso_version': None, u'description': None, u'tags': {}, u'type': u'PlasoFile', u'copyable': True, u'source': None, u'saved_path': None, u'saved_path_type': None, u'request_id': None, u'local_path': u'/evidence/output/1543622070-6306431d0f8a4a8d830af57e4b502652-PlasoTask/6306431d0f8a4a8d830af57e4b502652.plaso', u'config': {}, u'processed_by': [], u'cloud_only': False, u'name': u'PlasoFile'}

I see in workers/__init__.py line 139 we do copy over the request_id when closing a task, but apparently at this point the TurbiniaTaskResult object's request_id is already None as well. So I'm not sure where it is getting lost. If I have time later today I'll try to track it down & update here.
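
To make the expected behavior concrete, here is a minimal sketch of request_id propagation from input evidence to a result and on to child evidence. The `Evidence`, `TaskResult` and `run_task` names are hypothetical stand-ins written for illustration; this is not Turbinia's actual code, only the invariant the logs above suggest is being violated.

```python
class Evidence(object):
    """Hypothetical stand-in for an evidence object (not Turbinia's class)."""

    def __init__(self, name, request_id=None):
        self.name = name
        self.request_id = request_id


class TaskResult(object):
    """Hypothetical stand-in for a task result (not TurbiniaTaskResult)."""

    def __init__(self, request_id=None):
        self.request_id = request_id
        self.evidence = []

    def add_evidence(self, evidence):
        # Stamp child evidence with the parent's request_id if it has none.
        if evidence.request_id is None:
            evidence.request_id = self.request_id
        self.evidence.append(evidence)


def run_task(input_evidence):
    """Simulate a task that turns one evidence object into a child object."""
    # The result inherits the request_id of the evidence being processed.
    result = TaskResult(request_id=input_evidence.request_id)
    result.add_evidence(Evidence('PlasoFile'))
    return result


parent = Evidence('RawDisk', request_id='731123b983434cb59514caf327cd3129')
result = run_task(parent)
child = result.evidence[0]
```

If the result object already has request_id set to None at this point, as described above, the copy in `add_evidence` has nothing to propagate, which would point to the loss happening earlier in the pipeline.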

aarontp commented 5 years ago

Does this happen only when there are failures, or all the time? FWIW, I haven't seen this happen before when using PSQ, though I can't think of why that would affect it.

ericzinnikas commented 5 years ago

Happens all the time. I inserted some prints to debug, and even after the first job (plaso) finishes there's no request_id in the result object that is returned. Maybe I broke something with the serialization code I added. I'll test again from before that diff and report back.
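
One way to test the serialization theory is a round-trip check that reports which attributes do not survive. The sketch below uses hypothetical `Result` and `LossyResult` classes with assumed `serialize`/`deserialize` methods; it does not reflect Turbinia's real serialization code, and `LossyResult` only simulates the kind of bug suspected here.

```python
import json


class Result(object):
    """Hypothetical result object; not Turbinia's TurbiniaTaskResult."""

    def __init__(self, task_id=None, request_id=None):
        self.task_id = task_id
        self.request_id = request_id

    def serialize(self):
        # Copy every instance attribute so nothing is silently dropped.
        return json.dumps(self.__dict__)

    @classmethod
    def deserialize(cls, data):
        result = cls()
        result.__dict__.update(json.loads(data))
        return result


class LossyResult(Result):
    """Simulated bug: request_id is left out of the serialized form."""

    def serialize(self):
        return json.dumps({'task_id': self.task_id})


def lost_attributes(obj):
    """Return names of attributes whose values change across a round trip."""
    restored = type(obj).deserialize(obj.serialize())
    return sorted(
        key for key, value in obj.__dict__.items()
        if getattr(restored, key, None) != value)


ids = dict(
    task_id='5216aea82a2b48b4bfb1601e84c16451',
    request_id='731123b983434cb59514caf327cd3129')
lost_ok = lost_attributes(Result(**ids))
lost_buggy = lost_attributes(LossyResult(**ids))
```

Running a check like this against the real objects on either side of the serialization diff would show whether request_id is dropped there or somewhere else.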

aarontp commented 5 years ago

Does this still happen?

ericzinnikas commented 5 years ago

Working as expected now.