jueyang / call-me-maybe

Use the issue queue. Dark secrets welcome. (CUNY-J teaching 2015)

Paring down my data #31

Closed robertsanna closed 9 years ago

robertsanna commented 9 years ago

Hi, Jue! I've started trying to create this chart, and I have a data set that I need. The issue is that I'm having trouble properly structuring it into a Pivot Table. I want to take the ELA test scores from students from the Bronx, grade 6-8. I want to make sure that I can isolate the values of the scores, Levels 1-4, in the sheet labeled BouroughELAResults. I am close, but my Pivot Table keeps looking strange, and I'm not sure how I should be setting up the table.

Also, since github doesn't support csv files or xls files, I'm not sure how I'm supposed to upload this to share it with you. Please advise.

Thanks!

jueyang commented 9 years ago

Hey @robertsanna,

Glad you found your way to post your question!

To share your csv, go to https://gist.github.com/ and paste your csv there. Name it ela_score.csv and click Create secret gist. Then copy the url and paste it in your comment here.

The gist interface looks like the following (don't confuse the description and filename slots!)

Without seeing the dataset, it's hard for me to visualize what values you are trying to aggregate/count in a pivot table (or if you should use a pivot table at all). Once I take a look at the dataset I'll be able to say more, since I know your intention :)

> I want to take the ELA test scores from students from the Bronx, grade 6-8. I want to make sure that I can isolate the values of the scores, Levels 1-4, in the sheet labeled BouroughELAResults.

robertsanna commented 9 years ago

Hi, Jue! Thanks very much. Here's the URL to my data set.

https://gist.github.com/robertsanna/809ac6ca5eb55dda949d

I am envisioning some type of chart where you can look at grade level, and corresponding scores by year. So the data has 2013, 2014, and 2015 scores for grades 6, 7, and 8. And then I would like to sort the data by the percentage of students that got 1's, 2's, 3's, 4's and 5's in each grade. Is that too many variables? I thought that maybe I could do a bubble chart using Plotly to do so. But I am not sure how to organize this information. Thanks for your help!

-Anna

jueyang commented 9 years ago

Hey @robertsanna,

Apologies for the delayed reply! I looked into your data and experimented with plotly. If you are still looking for some guidance, here are a few things you might want to consider:

  1. Your file is actually a TSV, since it's delimited by tabs instead of commas. Rename it to ela_score.tsv to remain semantically correct. (This probably won't affect you right away, but in the future you might be wondering where you've introduced an error in your process. Clean data is always step one.)
  2. To show only the scores for 13/14/15, and only for grades 6/7/8, try googling "selecting value based on cell value excel". Hint: usually go with the links from Stack Overflow (a site where people post programming-related questions, from specific commands to abstract concepts; creating a pivot table or writing an Excel query is programming, believe it or not).
  3. When you say "sort" by percentage -- do you mean to sort by the 1's, 2's, etc. respectively? If so, you will be looking to sort 5 times... If you can tell me your rationale for using the bubble chart, and also your goal for this graph (like what the name of the figure would be once you have it), it'll help me understand what it is that you need.
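
For anyone who would rather script points 1 and 2 than do them in Excel, here is a minimal Python sketch. It is only an illustration: the column names (`Borough`, `Grade`, `Year`, `Level1Pct` and so on) and the sample rows are made up and will not match the real gist, so swap in the actual headers before trying it on ela_score.tsv.

```python
import csv
import io

# Hypothetical sample standing in for ela_score.tsv; the real column
# names and values in the gist may differ.
SAMPLE_TSV = (
    "Borough\tGrade\tYear\tLevel1Pct\tLevel2Pct\tLevel3Pct\tLevel4Pct\n"
    "BRONX\t5\t2013\t40.0\t35.0\t20.0\t5.0\n"
    "BRONX\t6\t2013\t45.0\t38.0\t14.0\t3.0\n"
    "BRONX\t7\t2014\t43.0\t39.0\t15.0\t3.0\n"
    "BRONX\t8\t2015\t41.0\t40.0\t15.0\t4.0\n"
    "BROOKLYN\t6\t2014\t30.0\t38.0\t24.0\t8.0\n"
    "BRONX\t8\t2012\t50.0\t35.0\t12.0\t3.0\n"
)

GRADES = {"6", "7", "8"}
YEARS = {"2013", "2014", "2015"}


def filter_rows(tsv_text, borough="BRONX"):
    """Return rows for one borough, limited to GRADES and YEARS."""
    # delimiter="\t" tells the reader the fields are tab-separated (TSV).
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row for row in reader
        if row["Borough"] == borough
        and row["Grade"] in GRADES
        and row["Year"] in YEARS
    ]


rows = filter_rows(SAMPLE_TSV)
# Sort by grade, then year, so each grade's years sit together.
rows.sort(key=lambda r: (int(r["Grade"]), int(r["Year"])))
for row in rows:
    print(row["Grade"], row["Year"], row["Level1Pct"], row["Level4Pct"])
```

To run it against a real file, replace `io.StringIO(tsv_text)` with an opened file handle, for example `open("ela_score.tsv", newline="")`. Treating the file as tab-delimited from the start is the scripted version of "rename it to .tsv": a comma-based reader would see each line as one big field.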

Let me know if this helps and if you have further or new questions.

(If you feel I'm creating more hoops for you to jump through, it's because navigating the data space IS about getting over one barrier after another! And muscle memory comes from your own practice :)