log2timeline / plaso

Super timeline all the things
https://plaso.readthedocs.io
Apache License 2.0
1.73k stars · 348 forks

Performance issues on AWS with EBS #3156

Closed — siftuser closed this issue 4 years ago

siftuser commented 4 years ago

Description of problem:

continuation from #3151 ...

As suggested, I updated plaso to the latest version. Initially I ran psteal on an E01 image of a Windows 10 disk (250 GB). When it showed 12-15% completion after 12 hours, I killed the process and exited, thinking I would instead run it against a mounted volume. This time plaso exited on its own midway, after running for ~60 hours. Any suggestions? Thanks

[screenshot: plaso status output, 2020-07-08 9:06 AM]

Command line and arguments:

psteal.py --source /mnt/windows_mount/ -o l2tcsv -w /data/plaso/l2t.txt

Plaso version:

plaso version 20200630

Operating system Plaso is running on:

ubuntu 18.04

Installation method:

sudo apt-get install python-plaso plaso-tools

joachimmetz commented 4 years ago

Also, to rule out obvious IO bottlenecks: are you both reading from and writing to a local disk? No network disks or removable media? And are you running Docker on a native OS, not inside a VM?

siftuser commented 4 years ago

@joachimmetz It's running on an AWS t2.xlarge. The OS, the E01, and the plaso output are on different volumes:

- OS: EBS root volume
- segmented E01: EBS vol1
- plaso & related logs: EBS vol2

joachimmetz commented 4 years ago

Ack, thanks. Reading over this thread again, you did mention: AWS t2.xlarge (4 vCPU, 16 GB RAM)

I'm not an AWS expert but if I have to believe: https://medium.com/awesome-cloud/aws-difference-between-ebs-and-instance-store-f030c4407387#:~:text=TL%3BDR,level%20storage%20for%20your%20instance.

An EBS volume is a network-attached drive, which results in slower performance, but the data is persistent, meaning even if you reboot the instance the data will be there.

An instance store provides temporary block-level storage for your instance. This storage is located on disks that are physically attached to the host computer.

Then EBS could be one of the bottlenecks. One thing to try is to use local storage, at least for the output data; using network storage for writes typically adds a lot of latency.
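(For readers comparing volumes: a quick way to make the write-latency difference concrete is a rough throughput check against each mount point. A minimal sketch, not plaso-specific; the directory paths in the comment are placeholders for your actual mounts:)

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, size_mb=64):
    """Write size_mb of zeroes to a temp file in `directory`, fsync,
    and return the observed throughput in MB/s."""
    chunk = b"\0" * (1024 * 1024)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.monotonic()
        with os.fdopen(fd, "wb") as out:
            for _ in range(size_mb):
                out.write(chunk)
            out.flush()
            os.fsync(out.fileno())  # force the data onto the device
        return size_mb / (time.monotonic() - start)
    finally:
        os.remove(path)


# Compare, e.g., the EBS-backed output volume against local scratch space
# (hypothetical paths):
#   write_throughput_mb_s("/data/plaso")  vs.  write_throughput_mb_s("/tmp")
```

A large gap between the two numbers would support moving at least the output files onto local storage.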

joachimmetz commented 4 years ago

The workers being killed and respawned are another bottleneck.

Using Docker and virtualization will add overhead as well, but let's use Docker for now to rule out the worker deaths being caused by an issue in the local installation.

siftuser commented 4 years ago

Thanks @joachimmetz. The challenge with instance-store-backed instances is the maximum 10 GiB volume size.

Is there a way to get stats to prove an IO bottleneck, to justify asking for IO-optimized instances? Thanks

joachimmetz commented 4 years ago

Is there a way to get stats to prove an IO bottleneck, to justify asking for IO-optimized instances? Thanks

Besides using AWS "project" and system-level IO stats, the Plaso storage profiler (--profilers=storage) might be able to tell you something about read and write times. Note that this functionality is still a WIP, so it might not give a full overview of reads and writes yet.
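(On the system-level side, Linux exposes cumulative per-device IO counters in /proc/diskstats, the same source tools like iostat read. A minimal sketch that samples the counters twice and prints per-device throughput; the field positions follow the kernel's documented iostats layout:)

```python
import time


def parse_diskstats_line(line):
    """Return (device, sectors_read, sectors_written) for one
    /proc/diskstats line; a sector is 512 bytes."""
    fields = line.split()
    # field 2 is the device name, field 5 sectors read,
    # field 9 sectors written (0-indexed, per the kernel docs).
    return fields[2], int(fields[5]), int(fields[9])


def sample_io(interval=5.0):
    """Print per-device read/write throughput over `interval` seconds."""
    def snapshot():
        with open("/proc/diskstats") as stats:
            return {
                dev: (r, w)
                for dev, r, w in (parse_diskstats_line(l) for l in stats)}

    before = snapshot()
    time.sleep(interval)
    after = snapshot()
    for dev, (r1, w1) in sorted(after.items()):
        r0, w0 = before.get(dev, (r1, w1))
        print("%-12s read %9.1f KiB/s  write %9.1f KiB/s" % (
            dev,
            (r1 - r0) * 512 / 1024.0 / interval,
            (w1 - w0) * 512 / 1024.0 / interval))
```

Running `sample_io()` while plaso is processing should show whether the E01 source volume or the output volume is the one saturating.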

siftuser commented 4 years ago

@joachimmetz the last attempt did not work (after about 8 days, it had completed < 70%). I created a new instance using a t2.2xlarge, and glad to say it finally completed processing the entire disk in about 30 hours. I used these profilers: memory, parsers, storage; let me know if you are interested in the logs.

Thank you for your help all this while
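(For anyone triaging similar logs: the profiler samples can be summarized with a short script. A minimal sketch that assumes each sample file is a plain CSV with a value column of bytes; this column layout is hypothetical, so check the files plaso actually wrote, which may be gzip-compressed and use different columns:)

```python
import csv
import io


def peak_memory_bytes(csv_text, value_column=1):
    """Return the largest value found in `value_column` of a profiler
    sample CSV.  Assumes one sample per row; adjust the column index
    to match the actual file layout (hypothetical here)."""
    peak = 0
    for row in csv.reader(io.StringIO(csv_text)):
        try:
            peak = max(peak, int(float(row[value_column])))
        except (IndexError, ValueError):
            continue  # skip header or malformed rows
    return peak


# Example with hypothetical (timestamp, bytes) samples:
# peak_memory_bytes("1.0,1048576\n2.0,2097152\n")  -> 2097152
```

A steadily growing peak across samples for one worker would be consistent with the memory growth that triggers the worker kills discussed above.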

joachimmetz commented 4 years ago

Good to hear that it completed. Note that these are not normal processing times.

the last effort was not working (after about 8 days, completed < 70%).

did you see worker deaths?

If you can pass me the profiler parser memory files that might point me to what was/is eating up the worker memory.

siftuser commented 4 years ago

did you see worker deaths?

yes, there were :-(

If you can pass me the profiler parser memory files that might point me to what was/is eating up the worker memory.

surely, will do. Thanks

joachimmetz commented 4 years ago

@siftuser any update on the profiler files? Otherwise I'll close this issue, as it cannot be fixed without the necessary information.

joachimmetz commented 4 years ago

Closing issue. Feel free to reopen if needed.