Also, to rule out obvious IO bottlenecks: are you reading from and writing to a local disk? No network disk or removable media? And are you running Docker on a native OS, not inside a VM?
@joachimmetz It's running on an AWS t2.xlarge. The OS, e01 and plaso output are on different volumes:
OS - EBS root volume
segmented e01 - EBS vol1
plaso & related logs - EBS vol2
Ack, thx, reading over this thread again you did mention: aws t2.xlarge (4 vcpu, 16gb ram)
I'm not an AWS expert, but if this article is to be believed: https://medium.com/awesome-cloud/aws-difference-between-ebs-and-instance-store-f030c4407387#:~:text=TL%3BDR,level%20storage%20for%20your%20instance.
An EBS volume is a network-attached drive, which results in slower performance, but the data is persistent, meaning that even if you reboot the instance the data will still be there.
An instance store provides temporary block-level storage for your instance. This storage is located on disks that are physically attached to the host computer.
Then EBS could be one of the bottlenecks. One thing to try is to use local storage, at least for the output data. Using network storage for writes typically adds a lot of latency.
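For example, a minimal sketch of mounting an instance store volume and writing the Plaso output there; the device name /dev/nvme1n1 and the /local mount point are assumptions, check lsblk for the actual instance store device on your instance:

    # find the instance store device (name varies per instance type)
    lsblk
    # format and mount it (assumed device name and mount point)
    sudo mkfs.ext4 /dev/nvme1n1
    sudo mkdir -p /local
    sudo mount /dev/nvme1n1 /local
    # point the Plaso output at the local disk instead of an EBS volume
    psteal.py --source /mnt/windows_mount/ -o l2tcsv -w /local/l2t.txt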
The workers being killed and respawned is another bottleneck.
Using Docker and virtualization will add overhead as well, but let's keep using Docker for now, to rule out the worker deaths being caused by an issue in a local installation.
Thanks @joachimmetz - the challenge with instance store-backed instances is the maximum 10 GiB volume size.
Is there a way to get stats to prove IO bottlenecks, so I can ask for IO-optimized instances? Thanks
Besides using AWS-side and system-level IO stats, maybe the Plaso storage profiler (--profilers=storage) might be able to tell something about read and write times. Note that this functionality is still WIP, so it might not give a full overview of reads and writes yet.
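As a minimal sketch, system-level IO stats can be collected with iostat from the sysstat package while Plaso is running; the 5-second interval is an arbitrary choice. Consistently high device utilization and wait times on the EBS volumes would support the IO bottleneck theory:

    # install the sysstat tools (provides iostat)
    sudo apt-get install sysstat
    # extended per-device statistics in MB, refreshed every 5 seconds
    iostat -xm 5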
@joachimmetz the last effort was not working (after about 8 days it had completed < 70%). I created a new instance using a t2.2xlarge. Glad it finally completed processing the entire disk in about 30 hours. I used these profilers: memory,parsers,storage (see the example invocation below), in case you are interested in the logs.
Thank you for your help all this while
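For reference, a minimal sketch of a run with the profilers mentioned above enabled; the e01 path, the output paths and the --profiling_directory option are assumptions, and if psteal.py does not accept the profiling options in this version they can be passed to log2timeline.py instead:

    # enable the memory, parsers and storage profilers (assumed paths)
    psteal.py --source image.e01 -o l2tcsv -w /data/plaso/l2t.txt \
        --profilers memory,parsers,storage --profiling_directory /data/plaso/profiling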
Good to hear that it completed. Note that these are not normal processing times.
the last effort was not working (after about 8 days it had completed < 70%).
did you see worker deaths?
If you can pass me the parser and memory profiler files, that might point me to what was/is eating up the worker memory.
did you see worker deaths?
yes, there were :-(
If you can pass me the parser and memory profiler files, that might point me to what was/is eating up the worker memory.
surely, will do. Thanks
@siftuser any update on the profiler files? Otherwise I'll close this issue, as it cannot be fixed without the necessary information.
Closing issue. Feel free to reopen if needed
Description of problem:
continuation from #3151 ...
As suggested, I updated Plaso to the latest version. Initially I ran psteal on an e01 image of a Windows 10 disk (250 GB) ... when it was showing 12-15% completion after 12 hours, I killed the process & exited. I thought I would instead run it on a mounted volume. This time Plaso exited on its own midway, after running for ~60 hours. Any suggestion? Thanks
Command line and arguments:
psteal.py --source /mnt/windows_mount/ -o l2tcsv -w /data/plaso/l2t.txt
Plaso version:
plaso version 20200630
Operating system Plaso is running on:
ubuntu 18.04
Installation method:
sudo apt-get install python-plaso plaso-tools