Question regarding Top 500 graph

psteinb / meetingcpp2015

repository for slides and code examples for my MeetingCpp talk in 2015

Creative Commons Attribution 4.0 International


Question regarding Top 500 graph #2

Open bchretien opened 8 years ago

bchretien commented 8 years ago

Hi!

First of all, great talk! I've seen it on YouTube; maybe the link could be added to the README.md.

I had a question concerning your Top 500 graph. I made my own Python script to process the data (I'm not familiar with R). I further distinguished data from accelerators (so GPUs + Xeon Phi etc.) and GPUs (NVIDIA/AMD). I noticed that the data for June 2011 was missing, so I downloaded it and obtained this:

[Figure: gpgpu_top500]

I then decided to make this issue, and test with your own script:

[Figure: 201x_acc_fraction]

I used the csv file available there.

Was that data for June 2011 ignored on purpose (e.g. something is known to be wrong with the data)? Because there's definitely a drop there. I can make a PR if you're interested. That doesn't change the content of the talk, or your analysis.

psteinb commented 8 years ago

Hi Chretien, thank you very much for waking this repo from its social sleep. And thanks as well for pointing me to https://github.com/ww44ss/Exascalar-Analysis-/. I didn't know that this project existed, and if I had, I could have saved a half-day or two of downloading the Top500 data by hand :(

In any case, regarding your question, I can only admit that I don't remember exactly. The conclusion of my talk stays the same, as the upward trend is clearly visible with or without the missing data you commented on. I do recall, however, that the data layout of the Top500 changed around 2011 to 2012. Only after that did they create the 'Accelerator.Co.Processor' column that I query in my R plot. You may compare https://github.com/psteinb/meetingcpp2015/blob/master/data/TOP500_201111.csv vs https://github.com/psteinb/meetingcpp2015/blob/master/data/TOP500_201206.csv and grep for accelerator. To be honest, I didn't want to dig into this further back then and concentrated on the talk instead. I'll limit the input data to 2012 through 2015.

Thanks a bunch for pointing that out.

bchretien commented 8 years ago

Starting from 2012 makes sense. My Python script actually looks for Accelerator/Co-Processor and falls back to Accelerator if it wasn't found. As for detecting the use of GPUs, it's a dirty case-insensitive regex looking for NVIDIA/ATI/AMD/K20 etc., since the data has not been normalized.
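
For illustration, here is a minimal sketch of the approach described above. It is not the actual script. The two column names come from the description; the regex alternatives, the handling of empty and "None" values, and the helper names (`accelerator_value`, `classify`, `fractions`) are assumptions made up for this example.

```python
import re

# Column names to try, newest layout first (taken from the description above).
ACCEL_COLUMNS = ("Accelerator/Co-Processor", "Accelerator")

# Assumed, deliberately rough vendor/model pattern; the real list may differ.
GPU_PATTERN = re.compile(r"\b(nvidia|ati|amd|k20\w*)\b", re.IGNORECASE)


def accelerator_value(row):
    """Return the accelerator field, falling back to the older column name."""
    for name in ACCEL_COLUMNS:
        if name in row:
            return (row[name] or "").strip()
    return ""


def classify(row):
    """Label a system as 'none', 'gpu', or 'other' (e.g. Xeon Phi)."""
    value = accelerator_value(row)
    # Assumption: systems without an accelerator have an empty or "None" field.
    if not value or value.lower() == "none":
        return "none"
    return "gpu" if GPU_PATTERN.search(value) else "other"


def fractions(rows):
    """Return the share of each label among the given rows."""
    counts = {"none": 0, "gpu": 0, "other": 0}
    for row in rows:
        counts[classify(row)] += 1
    total = sum(counts.values())
    return {label: (n / total if total else 0.0) for label, n in counts.items()}
```

The rows can be any dict-like records, for example those yielded by `csv.DictReader` over one of the TOP500 csv files, so `fractions(rows)` gives the per-list accelerator and GPU shares that would go into the plot.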

psteinb commented 8 years ago

Yep, I should have done that as well. If you have your Python code public somewhere, I'd be interested to have a look. In any case, please consider closing this issue once you feel like it. ;)