Open bchretien opened 8 years ago
Hi Chretien, thank you very much for waking this repo from its social sleep. And thanks as well for pointing me to https://github.com/ww44ss/Exascalar-Analysis-/. I didn't know that this project existed; had I known, I could have saved myself a half-day or two of downloading the Top500 data by hand :(
In any case, regarding your question, I can only admit that I don't remember exactly. The conclusion of my talk stays the same, as the upward trend is clearly visible with or without the missing data you commented on. I do recall, however, that the data layout of the Top500 changed around 2011 to 2012. Only after that did they create the 'Accelerator.Co.Processor' column, which I query in my R plot. You may compare https://github.com/psteinb/meetingcpp2015/blob/master/data/TOP500_201111.csv vs https://github.com/psteinb/meetingcpp2015/blob/master/data/TOP500_201206.csv and grep for accelerator. To be honest, I didn't want to dig into this further and concentrated on the talk back then. I'll limit the input data to 2012 through 2015.
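For what it's worth, a minimal sketch of how one could detect which spelling of the accelerator column a given Top500 CSV uses (the candidate names are my assumption: 'Accelerator.Co.Processor' is how R mangles the slash in 'Accelerator/Co-Processor', and older lists may carry only 'Accelerator'):

```python
import csv

# Candidate column names (an assumption, not verified against every list):
# R turns "Accelerator/Co-Processor" into "Accelerator.Co.Processor".
CANDIDATES = ["Accelerator/Co-Processor", "Accelerator.Co.Processor", "Accelerator"]

def accelerator_column(header):
    """Return the first accelerator-related column found in a header, or None."""
    for name in CANDIDATES:
        if name in header:
            return name
    return None

def accelerator_column_of_file(path):
    """Read only the header row of a Top500 CSV and report its accelerator column."""
    with open(path, newline="") as f:
        header = next(csv.reader(f))
    return accelerator_column(header)
```

Running `accelerator_column_of_file` over the 2011 and 2012 files should then show directly where the column first appears.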
Thanks a bunch for pointing that out -
Starting from 2012 makes sense. My Python script actually looks for Accelerator/Co-Processor and falls back to Accelerator if the former isn't found. As for detecting the use of GPUs, it's a dirty case-insensitive regex looking for NVIDIA/ATI/AMD/K20 etc., since the data has not been normalized.
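Roughly along these lines (a sketch in the spirit of what I described, not the actual script; the exact vendor/model list is a guess):

```python
import re

# Case-insensitive heuristic for GPU entries in the accelerator column.
# The vendor/model alternatives are an assumption; the real data is messy
# and not normalized, so the list would need tuning per Top500 release.
GPU_RE = re.compile(r"nvidia|ati|amd|geforce|tesla|radeon|firepro|k20|k40",
                    re.IGNORECASE)

def uses_gpu(accelerator_field):
    """Heuristically decide whether an accelerator entry denotes a GPU."""
    if not accelerator_field:
        return False
    return GPU_RE.search(accelerator_field) is not None
```

So "NVIDIA Tesla K20x" counts as a GPU, while "Intel Xeon Phi 5110P" counts as an accelerator but not a GPU.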
Yep - I should have done that as well. If your Python code is public somewhere, I'd be interested in having a look. In any case, please consider closing this issue once you feel like it. ;)
Hi!
First of all, great talk! I've seen it on YouTube; maybe the link could be added to the README.md. I had a question concerning your Top 500 graph. I made my own Python script to process the data (I'm not familiar with R). I further distinguished between accelerators in general (so GPUs + Xeon Phi etc.) and GPUs specifically (NVIDIA/AMD). I noticed that the data for June 2011 was missing, so I downloaded it and obtained this:
I then decided to open this issue and test with your own script:
I used the csv file available there.
Was the data for June 2011 ignored on purpose (e.g. is something known to be wrong with it)? Because there's definitely a drop there. I can make a PR if you're interested. That doesn't change the content of the talk or your analysis.