vsespb / mt-aws-glacier

Perl Multithreaded Multipart sync to Amazon Glacier
http://mt-aws.com/
GNU General Public License v3.0

Job getting killed #135

Closed CaptSpify closed 6 years ago

CaptSpify commented 6 years ago

Whenever I try to retrieve a job, it ends up getting killed. I'm guessing this is because it's a big download, but I'm not sure.

The command I'm running: /usr/bin/mtglacier download-inventory --config myconfigfile --vault myvaultname --new-journal /my/dir/journalfilename.log

The results I'm getting:

PID 16124 Started worker
PID 16124 Fetched job list for inventory retrieval
PID 16124 HTTP Unexpected end of data. Will retry (626 seconds spent for request)
PID 16124 Downloaded inventory in JSON format
Killed

Let me know if there's any other information needed

vsespb commented 6 years ago

hi. OS version, perl version, device, free memory, approx. inventory size? also check syslog for the oom-killer (out-of-memory messages and killed processes).

Message "Killed" goes from your Shell, not mt-aws-glacier (see https://www.thecodingforums.com/threads/killed-message-on-linux-running-from-bash.895178/#post-4802463 ) - someone killed the process with kill -9. It must be oom-killer.

CaptSpify commented 6 years ago

OS: Debian 8, perl: v5.20.2, free memory: 50G, Size: 2892174934222595

It does look like oom killer. My question then: Why is it using so much memory?

vsespb commented 6 years ago

Size: 2892174934222595

what does this number mean and where did you get it?

CaptSpify commented 6 years ago

I got it from: /usr/bin/mtglacier --config myfile.cfg list-vaults

In the return, there is a field that says: Archives: 50596957, Size: 2892174934222595

Are you looking for a different size? I'm guessing that is in PetaBytes?

vsespb commented 6 years ago

Archives: 50596957, Size: 2892174934222595

if those figures look correct to you (and match what the Amazon Console shows), then ok. otherwise, if it's much higher than it should be, we need to investigate why.
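btw, the Size reported by list-vaults comes from Amazon's SizeInBytes field, so it is in bytes - roughly 2.9 PB here:

perl -e 'printf "%.2f PB\n", 2892174934222595 / 1e15'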

anyway, you can try adding the following debug code after line 47 (https://github.com/vsespb/mt-aws-glacier/blob/master/lib/App/MtAws/Glacier/ListVaults.pm#L47):

# dump the raw ListVaults response to a file for inspection
open my $fh, ">", "/tmp/mtvaults.json" or die "open: $!";
print $fh $self->{rawdata};
close $fh;
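then re-run list-vaults so the dump gets written (for example, the same command you used before):

/usr/bin/mtglacier --config myfile.cfg list-vaults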

and then, from the command line, try to parse this JSON with a perl one-liner:

perl -MFile::Slurp -MJSON::XS -e 'print JSON::XS->new->allow_nonref->decode(read_file("/tmp/mtvaults.json"))'

and see if it crashes because of memory. also see if other JSON tools survive, and check the JSON file size in bytes.
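if you have GNU time (usually at /usr/bin/time on Debian), you can also watch the peak memory of the parse - look at the "Maximum resident set size" line - and check the file size:

/usr/bin/time -v perl -MFile::Slurp -MJSON::XS -e 'JSON::XS->new->allow_nonref->decode(read_file("/tmp/mtvaults.json"))'
ls -l /tmp/mtvaults.json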

I think Amazon introduced the CSV format maybe for exactly such cases. see the --request-inventory-format option for retrieve-inventory. maybe it'll take less RAM, maybe not.
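something like this (an untested sketch - check the README for the exact syntax of --request-inventory-format):

/usr/bin/mtglacier retrieve-inventory --config myconfigfile --vault myvaultname --request-inventory-format csv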

CaptSpify commented 6 years ago

It looks like it dies before it writes the file. Either way, I was able to get it to download by stopping pretty much all other processes on that server. I wonder if one of them was fighting for RAM.

I'll try retrieve-inventory later, but I see in the README that mtglacier has issues with CSV files. Is that something you think would impact this?

vsespb commented 6 years ago

I'll try retrieve-inventory later, but I see in the README that mtglacier has issues with CSV files. Is that something you think would impact this?

there are (or were) bugs in edge cases. "normal" users are not affected, i think. also, CSV parsing is slower (but maybe uses less RAM, maybe).

also, you have 50M files and 50G of RAM, so that's about 1000 bytes per record. one journal entry takes (minimum!) 240 bytes on disk (archive id, checksums, timestamps) plus the filename size - and that's just on disk; internal in-memory structures can take more. it's quite possible that you won't be able to work with such a big journal on that machine (different commands will require different amounts of RAM with this journal).
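rough math, using the minimum 240 bytes per record:

perl -e 'printf "%.1f GB on disk, minimum\n", 50596957 * 240 / 2**30'
perl -e 'printf "%.0f bytes of RAM per record\n", 50 * 2**30 / 50596957'

that's about 11.3 GB on disk at the bare minimum, and only ~1060 bytes of RAM budget per record before you hit the ceiling.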

vsespb commented 6 years ago

also: if you successfully run download-inventory (i.e. get a mtglacier journal for the inventory), you can still do some things without enough memory. From the docs:

It's a text file. You can parse it with grep, awk, cut, tail, etc., to extract information in case you need to perform some advanced stuff that mtglacier can't do (NOTE: make sure you know what you're doing).
Each text line in the file represents one record

i.e. you can, for example, split it into 1000 small journals (with tools like head or tail) and do some simple operations - like delete all files or restore all files. Obviously you can't do complex logic like "sync only modified files", as that would require comparing local files with the ones recorded in the journal.
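for example, with split (a sketch - journal-part-* is just an example prefix; purge-vault is the "delete all files" command, see the README):

split -l 50000 /my/dir/journalfilename.log /my/dir/journal-part-
/usr/bin/mtglacier purge-vault --config myconfigfile --vault myvaultname --journal /my/dir/journal-part-aa

this deletes the files recorded in the first 50000-line piece; repeat for each piece.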