stamparm / maltrail

Malicious traffic detection system
MIT License

"Slow Loading" Why maltrail uses only one core? #96

Open sametsazak opened 8 years ago

sametsazak commented 8 years ago

Hi,

I realized something. I have a big "dprk.txt" in the /trails/custom/ directory. It's almost 20 MB. When I start maltrail, loading takes a very long time. I know it's reading from the file and writing to the buffer, but why is it using only one core? I have many cores :)

Have you thought about that, @stamparm?

stamparm commented 8 years ago

Loading a custom TXT file should not take too long. Updates are way slower. Also, trails are loaded once (at startup) and that's it. Is there a way for me to reproduce the behavior you're seeing? Also, how long does it take in your case?

sametsazak commented 8 years ago

@stamparm

For example, I update my custom trail text maybe 5-6 times. In my case, each load takes approximately 15 minutes, so 5 x 15 = 75 minutes lost. Maltrail fetching threats from sources is not the problem. The main problem is when maltrail starts loading /root/.maltrail/trails.csv into the buffer. It uses only one core. Then it reproduces.

Just try it, use a big custom text file in your custom trail directory.

stamparm commented 8 years ago

Can you please explain this part (2x2 6gb)?

What's the size of the custom trail file(s)? You mentioned 20 MB in a previous message, and now I see gigabytes.

stamparm commented 8 years ago

P.S. Just a friendly tip: if you are expanding IP ranges and using those as trails, for a couple of months now you have been able to use the condensed IP range format (e.g. x.y.z.w/u or even x.y.z.w-a.b.c.d).
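To illustrate why the condensed formats help, here is a minimal sketch using only Python's standard `ipaddress` module. It is not maltrail's actual implementation; `parse_trail` and `matches` are hypothetical helper names, and the addresses are documentation examples. The idea is that each condensed entry becomes a single (first, last) pair, so a /8 costs one entry instead of roughly 16.7 million expanded lines.

```python
import ipaddress

def parse_trail(entry):
    """Turn 'x.y.z.w', 'x.y.z.w/u' or 'x.y.z.w-a.b.c.d' into an inclusive (first, last) integer pair."""
    entry = entry.strip()
    if "-" in entry:
        # Explicit range: both ends are plain IPv4 addresses
        start, end = (ipaddress.IPv4Address(part.strip()) for part in entry.split("-", 1))
        return int(start), int(end)
    # CIDR block, or a single address (treated as a /32 network)
    network = ipaddress.IPv4Network(entry, strict=False)
    return int(network.network_address), int(network.broadcast_address)

def matches(ip, ranges):
    """Check whether an address falls inside any of the parsed ranges."""
    value = int(ipaddress.IPv4Address(ip))
    return any(first <= value <= last for first, last in ranges)

# Three entries stand in for millions of expanded addresses
ranges = [parse_trail(e) for e in ("10.0.0.0/8", "192.0.2.10-192.0.2.20", "198.51.100.7")]

print(matches("10.1.2.3", ranges))
print(matches("192.0.2.21", ranges))
```

A linear scan over the ranges is enough for a sketch; with many ranges, sorting them and using binary search would be the usual next step.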

stamparm commented 8 years ago

"Still, just wonder that why it doesn't work multiprocess?"

Because nobody ever loads configuration files (and/or trail files in this case) in multiprocessing mode. Multiprocessing is used for CPU-intensive tasks, like math calculations, hash cracking, etc. Loading text files should not be classified as a CPU-intensive task. It puts more stress on other resources than on the CPU.

I am just trying to understand what is going on in your case and to replicate the behavior.

sametsazak commented 8 years ago

@stamparm I understand that. But if you want to see what I mean, just start maltrail with 10 million IP addresses in a custom text file, with different RAM sizes. Wait forever.

stamparm commented 8 years ago

OK. That's why I said: "If you are expanding IP ranges and using those as trails, for a couple of months now you have been able to use the condensed IP range format (e.g. x.y.z.w/u or even x.y.z.w-a.b.c.d)."

Have you manually expanded those IPs from IP ranges?
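If the file really does contain manually expanded addresses, one possible way to shrink it back is sketched below with the standard library's `ipaddress.collapse_addresses`. This is only an illustration: `condense` is a hypothetical helper, the sample addresses are made up, and skipping lines that start with `#` is an assumption about how comments appear in the trail file.

```python
import ipaddress

def condense(lines):
    """Collapse one-address-per-line input into the smallest set of CIDR blocks."""
    addresses = []
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no trail
        if not line or line.startswith("#"):
            continue
        addresses.append(ipaddress.IPv4Network(line))
    condensed = []
    for net in ipaddress.collapse_addresses(addresses):
        # Write lone addresses without the /32 suffix
        condensed.append(str(net.network_address) if net.prefixlen == 32 else str(net))
    return condensed

# 256 expanded addresses plus one stray host
expanded = ["192.0.2.%d" % i for i in range(256)] + ["198.51.100.7"]
print(condense(expanded))
```

For a real file, passing the open file object as `lines` should work the same way, since the function only iterates over it.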