jbosch-noaa / Metrics_work

Development of Python code to analyze various "metrics" related to the U.S. IOOS program

Pandas and numpy question #1

Open jbosch-noaa opened 7 years ago

jbosch-noaa commented 7 years ago

@ocefpaf

If you use pandas to read an Excel file into a data frame, do you need to pull select data out of that data frame into a numpy array before you can do any math on it?

I have a spreadsheet of weather and oceanographic data message counts sent to the Global Telecommunications System (GTS). I am looking to take the totals of the oceanographic messages and create a pie graph of the % contribution of different IOOS and non-IOOS messages to the total of ALL GTS messages. This means I need to take a subsection of the spreadsheet (the different sources of oceanographic messages) and calculate its % of the spreadsheet total. Then I need to make a pie chart of those percentages.

I can do this in Excel. I can do this in MATLAB. I am trying to teach myself how to do it in Python. Check out the comment between lines 6 and 7 of NDBC_GTS_Metrics.ipynb :

Calculate the IOOS asset percentage of the total gts messages

ocefpaf commented 7 years ago

do you need to pull select data out of that data frame into a numpy array before you can do any math on it?

That is not necessary. You can do math directly on the data frame. However, you should always check df.dtypes to be sure pandas inferred the data types "correctly."
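As a minimal sketch of that point, the snippet below builds a small data frame by hand instead of reading the real spreadsheet. The column names and counts are made up for illustration and are not from the original file. It inspects df.dtypes, does arithmetic directly on a numeric column, and shows how a column that came in as text could be converted with pd.to_numeric.

```python
import pandas as pd

# Hypothetical data standing in for what pd.read_excel might return;
# the labels and counts are invented for illustration only.
df = pd.DataFrame(
    {
        "source": ["IOOS regional", "NDBC", "Other"],
        "messages": [120, 300, 580],
        # A column that a spreadsheet might have stored as text.
        "messages_as_text": ["120", "300", "580"],
    }
)

# Inspect the inferred types before doing any math.
print(df.dtypes)

# Numeric columns support arithmetic directly; no numpy conversion is needed.
total = df["messages"].sum()
percent = 100 * df["messages"] / total

# A text column has to be converted first; errors="coerce" turns
# unparseable cells into NaN instead of raising.
fixed = pd.to_numeric(df["messages_as_text"], errors="coerce")
print(percent)
```

If a column that should be numeric does not show a numeric dtype, that usually means some cell in it holds text, and converting it as above is one way to recover.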

I am looking to take the totals of the oceanographic messages and create a pie graph of the % contribution of different IOOS and non-IOOS messages to the total of ALL GTS messages.

Here is a small example to help get you started:

http://nbviewer.jupyter.org/urls/gist.githubusercontent.com/ocefpaf/4fccc7478241438c05a2e363c3e50178/raw/b12df092de06d781803af931ae56ef02fab7778c/NDBC_GTS_Metrics_2016.ipynb

Check out the comment between lines 6 and 7 of NDBC_GTS_Metrics.ipynb :

Calculate the IOOS asset percentage of the total gts messages

I created my example before looking into yours. Note that your way of doing it is fine, but I prefer to re-compute the sum in pandas rather than read the total from the spreadsheet, to avoid importing any Excel "errors."
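A rough sketch of what re-computing the sum could look like, assuming the spreadsheet carries its own "Total" row. The row labels, the column name, and the numbers are hypothetical, and the spreadsheet total is deliberately wrong here to show why re-computing can help.

```python
import pandas as pd

# Hypothetical table mimicking a spreadsheet with a precomputed "Total" row.
# The stored total (990) is deliberately stale; the true sum is 1000.
raw = pd.DataFrame(
    {"messages": [120, 300, 580, 990]},
    index=["IOOS regional", "NDBC", "Other", "Total"],
)

# Drop the precomputed row and re-compute the sum from the underlying counts.
counts = raw.drop(index="Total")["messages"]
total = counts.sum()

# Percentage contribution of each source to the re-computed total.
percent = (100 * counts / total).round(1)
print(percent)
```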

PS: I edited my example to produce the same graph as yours, but computing everything within the notebook. Look at the last cell.
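For readers without access to the notebook, here is a minimal sketch of a pie chart computed entirely in Python. It is not the code from the linked notebook. The counts and labels are made up, and the output file name gts_pie.png is an arbitrary choice.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical message counts per source (invented for illustration).
counts = pd.Series({"IOOS regional": 120, "NDBC": 300, "Other": 580})

fig, ax = plt.subplots(figsize=(5, 5))

# autopct labels each wedge with its share of the total, so the
# percentages come from the raw counts rather than from the spreadsheet.
ax.pie(counts, labels=counts.index, autopct="%1.1f%%")
ax.set_title("Share of GTS messages (made-up data)")

fig.savefig("gts_pie.png")
```

In a Jupyter notebook the backend line and the savefig call can be dropped, since the figure is displayed inline.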