krmaxwell / maltrieve

A tool to retrieve malware directly from the source for security researchers.
GNU General Public License v3.0

Still not completing properly? #58

Closed krmaxwell closed 9 years ago

krmaxwell commented 10 years ago
kmaxwell@leibniz:~/src/maltrieve$ date
Thu Aug 28 11:33:45 CDT 2014
kmaxwell@leibniz:~/src/maltrieve$ tail -f maltrieve.log
2014-08-28 09:56:42 139828685420352 "GET /dmjqxshzxk HTTP/1.1" 200 281
2014-08-28 09:56:42 139828685420352 "GET /yinjingdaxiao HTTP/1.1" 200 281
2014-08-28 09:56:42 139828685420352 "GET /chzgjqvod HTTP/1.1" 200 281
2014-08-28 09:56:42 139828685420352 "GET /xxrttk HTTP/1.1" 200 281
2014-08-28 09:56:43 139828685420352 "GET /hhchrxyx HTTP/1.1" 200 281
2014-08-28 10:06:26 139828685420352 "GET /ddqs HTTP/1.1" 200 281
2014-08-28 10:06:28 139828685420352 "GET /rbavanyxat HTTP/1.1" 200 281
2014-08-28 10:06:29 139828685420352 "GET /rbdmmntpw HTTP/1.1" 200 281
2014-08-28 10:06:32 139828685420352 "GET /kbllqjllswdyhmfdy HTTP/1.1" 200 281
2014-08-28 10:06:36 139828685420352 "GET /yzhsqqvod HTTP/1.1" 200 281
^C
kmaxwell@leibniz:~/src/maltrieve$ 
krmaxwell commented 9 years ago

Still reproed.

krmaxwell commented 9 years ago

I think #120 fixed this. Pending a few days of uninterrupted functioning before I have enough confidence to close the issue, though.

krmaxwell commented 9 years ago

NOPE!

rubinatorz commented 9 years ago

It seems that the latest pending download is extremely slow... when using wget to get that file manually, it's extremely slow too. Maybe the specific site slows down the whole process?

mlawsonis commented 9 years ago

Any workarounds if running it as a cron job?

krmaxwell commented 9 years ago

There's a tentative fix in the dev branch now. Feel free to try it if you're brave!

rubinatorz commented 9 years ago

I tried your fix and the first time I ran it, it looked like it solved the problem:

$ ./maltrieve.py
Processing source URLs
Completed source processing
Downloading samples, check log for details
Completed downloads

But when I ran it a 2nd and 3rd time, it seemed to hang again...

On the 4th run it appears to hang several times, but it finally completes after 3 hours. That's strange, because 3 hours seems very long for just 43 executables. Looking at the timestamps in the log, I see 4 periods with minutes(!) of inactivity:


2015-04-01 11:29:01 -1219635456 "GET /soft/UploadFile/201501/win7/2015011205.exe HTTP/1.1" 200 2269518
2015-04-01 11:35:01 -1219635456 http://hiroba.dqx.jp.sd.sqbzb.com/ hashes to 62505f9257668a74605bfed1e35cc10e
...
2015-04-01 11:35:21 -1219635456 "GET / HTTP/1.1" 200 None
2015-04-01 12:39:30 -1219635456 http://dx23.downyouxi.com/wodishijie1.4.2zhongwenban.exe hashes to 05c64c0870a8c8a62bd09ddf41ddf0a2
....
2015-04-01 12:44:23 -1219635456 "GET /shuimiaoflashtzscq.exe HTTP/1.1" 200 1982464
2015-04-01 13:28:41 -1219635456 http://jszshb.com/ hashes to 1b9e8f2caa3d6a4df4c117adb5470c9f
...
2015-04-01 13:31:50 -1219635456 "GET /category/ HTTP/1.1" 500 None
2015-04-01 13:56:11 -1219635456 http://creative.ad132m.com/ad132m/scripts/direct/direct.html?
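
Periods of inactivity like the ones in the excerpt above can be found mechanically rather than by eye. The following is only a minimal sketch, assuming every log line of interest starts with a `YYYY-MM-DD HH:MM:SS` timestamp as in the excerpt; `find_gaps` and the 5-minute threshold are hypothetical names and values, not part of maltrieve.

```python
from datetime import datetime, timedelta


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Yield (previous, current, gap) for consecutive timestamped lines
    that are more than `threshold` apart.

    Assumption: each relevant line begins with 'YYYY-MM-DD HH:MM:SS'.
    Lines that do not parse (e.g. '...') are skipped.
    """
    prev = None
    for line in lines:
        try:
            ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped log line
        if prev is not None and ts - prev > threshold:
            yield prev, ts, ts - prev
        prev = ts
```

Passing an open log file (for example `find_gaps(open("maltrieve.log"))`) and printing each result would list every stall longer than the threshold, which makes it easier to see which request preceded the stall.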
krmaxwell commented 9 years ago

It looks like one host sends a small amount of data at least every 60 seconds, presumably often enough that the request never times out, so the overall download speed for that sample is very slow. I will add an enhancement to track download time per sample to the backlog, but it won't be in this release (v0.7).
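
One way to track download time per sample is to put a wall-clock limit on the whole body in addition to any per-read timeout. This is only a rough sketch of that idea, not what the dev branch does: `read_with_deadline`, `DownloadTooSlow`, and the `max_seconds` value are hypothetical, and `chunks` is assumed to be any iterable of byte strings (for example the result of `iter_content()` on a streamed `requests` response).

```python
import time


class DownloadTooSlow(Exception):
    """Raised when a single sample exceeds its total time budget."""


def read_with_deadline(chunks, max_seconds, clock=time.monotonic):
    """Collect byte chunks, aborting once the total elapsed time
    exceeds max_seconds.

    The deadline is only checked when a chunk arrives, so a per-read
    timeout is still needed for hosts that go completely silent.
    """
    start = clock()
    body = bytearray()
    for chunk in chunks:
        body.extend(chunk)
        elapsed = clock() - start
        if elapsed > max_seconds:
            raise DownloadTooSlow(
                "gave up after %.0f s with %d bytes" % (elapsed, len(body))
            )
    return bytes(body)
```

A host that trickles a few bytes every minute would pass a 60-second read timeout indefinitely, but would hit this overall limit and could be logged and skipped.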