rrybarczyk / asmap-rs

A tool to assist the asmap project in reading and parsing RIS raw data from the RIPE NCC.

Optimize #10

Open rrybarczyk opened 4 years ago

rrybarczyk commented 4 years ago

Currently asmap-rs takes 15 GB of RAM; we need to get that down to 4 GB.

Rationale from @naumenkogs: "Ideally, any user should have the right to be paranoid and do everything on their own, including constructing their own asmap locally. Obviously, requiring constructing asmaps on the phone is not reasonable, but maybe comparing to compiling Bitcoin Core is reasonable. This takes 2-4Gb of RAM."

For profiling and narrowing down the inefficient code, see the flamegraph results in the comments below.

rrybarczyk commented 4 years ago

The stack trace is profiled using flamegraph for both the download and find-bottleneck commands, run over rrc01, 02, 03, 14, 17 (which has an invalid gzip header), and 18.

What a flamegraph tells us is which functions are on the stack the most.

The results are committed to the profile branch in the flmg directory, and I added the commands I used to the justfile so they can easily be run again.

Regarding flmg/fb-1-2-3-14-17-18.svg:

As expected, we see that asmap_rs::find_bottleneck::FindBottleneck::locate and its accumulated child calls make up the majority of the call chain.

The most common descendants on the stack are as follows (these do not and should not sum to 100%, since children contribute to their parents' percentages):

asmap_rs::find_bottleneck::FindBottleneck::locate, 97.10%
├── asmap_rs::find_bottleneck::FindBottleneck::match_rib_entry, 32.82%
│   └── asmap_rs::as_path_parser::AsPathParser::parse, 20.72%
└── mrt_rs::Reader<T>::read, 45.24%
    └── mrt_rs::records::tabledump::RIBEntry::parse, 39.93%
        └── inflate (from the flate2 GzDecoder called inside the mrt_rs crate), 17.97%

Conclusion: these are the areas we should look into optimizing.

Thoughts and proposed steps forward:

Would love to hear your thoughts and get feedback, @naumenkogs.

naumenkogs commented 4 years ago

Did the compiler optimize the code for these measurements?

It would be great to add these results to some README file with explanations. I think this is good enough, but we should still focus on RAM for now.

naumenkogs commented 4 years ago

Making AsPathParser seems to be a waste of time now, because the actual algorithm takes very little of the run time. It's crazy how much mrt-rs takes, and then the second-biggest consumer is AsPathParser::parse, so, just reading BGP records... Maybe we can do some optimizations in this method. I'll take a look sometime.

rrybarczyk commented 4 years ago

Did the compiler optimize the code for these measurements?

It would be great to add these results to some README file with explanations. I think this is good enough, but we should still focus on RAM for now.

Yes, it is a release build because flamegraph defaults to release.

Making AsPathParser seems to be a waste of time now, because the actual algorithm takes very little of the run time. It's crazy how much mrt-rs takes, and then the second-biggest consumer is AsPathParser::parse, so, just reading BGP records... Maybe we can do some optimizations in this method. I'll take a look sometime.

Can you elaborate on these two sentences? They seem to be conflicting, because the first sentence says AsPathParser is a waste of time, but then the second one acknowledges that we should optimize here.

I have been thinking about the right path forward when it comes to optimization...

When it comes to what we are optimizing for, we said that we are trying to get the RAM usage lower. However, this is still vague on my end. I want to make sure we are optimizing for a specific case. There are many, many possibilities for what systems end users will be running this program on. The systems could be anything: a 2 GB machine vs. a 4 GB machine vs. a 4 GB machine plus tons of swap, etc.

Right now, it seems like there are other things that may be more valuable. Like writing more tests for correctness and getting people to use it. Once people use it, we can figure out what (if any) problems they are encountering and then optimize for those specific issues.

For immediate actions, I think we should write more tests to ensure correctness. This way we ensure we are not introducing any bugs as we optimize.

That being said, I want to inflate the gz files in the download command because it is a simple "optimization" that will yield a faster runtime on any system.
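
Roughly what I have in mind, as a minimal sketch (the function name and paths are hypothetical placeholders, and it assumes flate2 is available as a direct dependency):

```rust
use std::fs::File;
use std::io;

use flate2::read::GzDecoder;

/// Sketch: inflate a downloaded .gz dump once, at download time, so later
/// find-bottleneck runs can read plain MRT bytes instead of re-inflating on
/// every run. `gz_path` and `out_path` are hypothetical placeholders.
fn inflate_to_disk(gz_path: &str, out_path: &str) -> io::Result<()> {
    let gz_file = File::open(gz_path)?;
    let mut decoder = GzDecoder::new(gz_file); // streaming gzip decoder from flate2
    let mut out = File::create(out_path)?;
    // Stream decompressed bytes straight to disk; nothing is held in memory in full.
    io::copy(&mut decoder, &mut out)?;
    Ok(())
}
```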

In terms of loading the data in chunks in an attempt to reduce RAM usage, it might not be the right optimization to try first, but I am still curious to see the difference in the resulting flamegraphs, so we can still go for it.
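
For the chunked loading, the direction I am picturing is just to stream through a fixed-size buffer instead of reading a whole dump into memory. A sketch only, with a hypothetical handle_chunk standing in for whatever per-chunk/per-record processing we end up doing:

```rust
use std::fs::File;
use std::io::{self, BufReader, Read};

/// Sketch: stream a large dump through a fixed-size buffer instead of loading
/// the whole file up front, so peak memory is bounded by the buffer size rather
/// than the file size. `handle_chunk` is a hypothetical stand-in for the real
/// per-chunk/per-record processing.
fn process_streaming(path: &str, mut handle_chunk: impl FnMut(&[u8])) -> io::Result<()> {
    let file = File::open(path)?;
    let mut reader = BufReader::new(file);
    let mut buf = [0u8; 64 * 1024]; // 64 KiB scratch buffer, reused for every read
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        handle_chunk(&buf[..n]);
    }
    Ok(())
}
```

Once something like this is in place, we can compare flamegraphs for the two approaches on the same rrc files.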

naumenkogs commented 4 years ago

Can you elaborate on these two sentences? They seem to be conflicting, because the first sentence says AsPathParser is a waste of time, but then the second one acknowledges that we should optimize here.

I said algorithm, meaning all the logical loops (as opposed to the parse method, which just converts strings into numbers). The latter may be worth optimizing; the former is too small a share of the overall run time to bother with.

When it comes to what we are optimizing for, we said that we are trying to get the RAM usage lower. However, this is still vague on my end. I want to make sure we are optimizing for a specific case. There are many, many possibilities for what systems end users will be running this program on. The systems could be anything: a 2 GB machine vs. a 4 GB machine vs. a 4 GB machine plus tons of swap, etc.

I would say we are optimizing for reducing runtime. Use of swap is bad because it makes everything much slower; that's why keeping memory within RAM limits matters. I would target the RAM required to compile Bitcoin Core, which is around 2-4 GB. I think this is a reasonable requirement for our paradigm; it covers any reasonable laptop from 2010 onward. My intuition is that other optimizations won't be as impactful as keeping memory usage within RAM (you can check this by running it on a 16 GB RAM machine).

You can obviously ignore my suggestion and optimize other things.

Like writing more tests for correctness

Yes.

Once people use it, we can figure out what (if any) problems they are encountering and then optimize for those specific issues.

I wouldn't rely on this. I would be happy if 1,000 people independently use this during 2020 (although a pre-made asmap created by a Bitcoin Core maintainer with this program will be shipped with the release, so that's cool). And of those 1,000, I'd expect maybe 10 reports or something. Most people will simply give up if they face any difficulties.

That being said, I want to inflate the gz files in the download command because it is a simple "optimization" that will yield a faster runtime on any system.

I'm not sure if that works out, but yeah, sounds good if true.

rrybarczyk commented 4 years ago

I said algorithm, meaning all the logical loops (as opposed to the parse method, which just converts strings into numbers). The latter may be worth optimizing; the former is too small a share of the overall run time to bother with.

I think there still may be some miscommunication here because AsPathParser is not converting Strings to numbers anywhere, BUT I don't think it really matters in terms of this convo anyways :)

Everything sounds good to me.

naumenkogs commented 4 years ago

Alright, so I was testing the script over all 25 files on this Google Cloud machine: n1-standard-2 (2 vCPUs, 7.5 GB memory, 10 GB swap added). I think it's reasonable to expect a smooth workflow on this kind of hardware.

After more than 6 hours your script ate all the memory, to the point that I couldn't ssh back into the machine. Maybe it would have terminated eventually, but since I can't track the progress I gave up. This is the RAM issue I was talking about.

My code, on the other hand, successfully finished after 200 minutes.