idocs/test1 (Apache License 2.0)

Several questions on section 2.1 (Data wrangling - Tidy data) in the GitHub tutorial #2

Open: parul8ue opened this issue 9 years ago

parul8ue commented 9 years ago

2.1 Tidy Data:

- The first question on this page shows a 3 x 2 matrix and asks how many variables there are. It explains that there are 3: injured, count, and gender. How is that? What are the rows here?

- The religion data set has no solution.

- The explanation for the TB data set says that there "seems to be" lurking variables of gender and age. Where does this answer come from?

- The weather data set problem has a similar issue.

In general, there are several gaps in the written tutorial; perhaps they were covered verbally in the PyCon 2014 workshop. Is there a video of the workshop that one can refer to?

ramnathv commented 9 years ago

You are right that there is missing information in the tutorial. Let me clarify:

  1. The row labels are missing here. They are injured and uninjured, so injury status is one variable. Gender is a second variable, stored in the column headers, and the counts in the cells are the third, bringing the total number of variables to 3.
  2. The basic premises of tidy data are presented in the introduction. In the religion dataset, some of the column headers are income ranges, i.e. values of an income variable rather than variable names. Hence this dataset violates the definition of tidy data.
  3. It is hard to guess this one without some context. The hint lies in variable names such as new_sp_m014, where m represents male and 014 represents an age group (0-14). In the live tutorial, this hint was given to the participants.
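To make the three answers concrete, here is a minimal pure-Python sketch (not code from the tutorial itself). It assumes the injury table has gender columns named `male` and `female`, uses made-up placeholder counts and religion/income labels, and assumes TB column names follow the `new_sp_<sex><age>` pattern described in item 3. `melt` and `parse_tb_column` are hypothetical helper names.

```python
# Hypothetical helpers; all labels and counts below are placeholders,
# not the tutorial's actual data.

def melt(rows, id_name, var_name, value_name):
    """Turn {row_label: {column_label: value}} into a list of tidy records.

    Each record is one observation: the row label, the former column
    header (now a value of its own variable), and the cell value.
    """
    tidy = []
    for row_label, cells in rows.items():
        for col_label, value in cells.items():
            tidy.append({id_name: row_label, var_name: col_label, value_name: value})
    return tidy


def parse_tb_column(name):
    """Split a TB column name such as 'new_sp_m014' into (sex, age_group).

    Assumes a 'new_sp_' prefix, a one-letter sex code, then an age code:
    3-digit codes like '014' become '0-14', 4-digit codes like '1524'
    become '15-24', and any other code is returned unchanged.
    """
    prefix = "new_sp_"
    if not name.startswith(prefix):
        raise ValueError(f"unexpected column name: {name!r}")
    code = name[len(prefix):]
    sex, age = code[0], code[1:]
    if len(age) == 3:
        age = f"{age[0]}-{age[1:]}"
    elif len(age) == 4:
        age = f"{age[:2]}-{age[2:]}"
    return sex, age


# Item 1: injury status as row labels, gender as column headers.
injuries = {
    "injured": {"male": 3, "female": 2},
    "uninjured": {"male": 10, "female": 12},
}
tidy = melt(injuries, "status", "gender", "count")
print(tidy[0])  # {'status': 'injured', 'gender': 'male', 'count': 3}

# Item 2: income ranges as column headers (placeholder names and counts).
income = {
    "ReligionA": {"<$10k": 20, "$10-20k": 30},
    "ReligionB": {"<$10k": 5, "$10-20k": 8},
}
tidy_income = melt(income, "religion", "income", "count")

# Item 3: sex and age group hidden inside a single column name.
print(parse_tb_column("new_sp_m014"))  # ('m', '0-14')
```

In each case the fix is the same idea: values hiding in column headers (gender, income range, sex and age group) become ordinary columns, so that each row holds one observation.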
parul8ue commented 9 years ago

Thanks for the clarifications. Could you please update these in the tutorial? It will save your readers a lot of confusion. Thanks!

(Replying by email to Ramnath Vaidyanathan's comment of Wed, Nov 4, 2015, 9:53 AM: https://github.com/idocs/test1/issues/2#issuecomment-153567602)

parul8ue commented 8 years ago

Hi Dr. Ramnath,

Another issue: this page doesn't display correctly on Windows. The scroll bar doesn't work, so Windows users can't see the full page. Can you please check?
