Variables Clarification

liuzhen529 commented 6 years ago

Hey, Linda and I had a discussion today. There are some issues with the variables, and we really need your help to clarify them.

1) Learning & Education: In your proposal, there are four variables about formal/non-formal education. However, there are five variables in the dataset, and two of them do not match the proposal.

  a. In proposal but not in dataset:
      Participation in formal or non-formal education

  b. In dataset but not in proposal:
      FNFAET12 (formal/non-formal adult education)
      FNFE12JR (formal/non-formal education, job-related)

Thank you so much for your help:)

Lindaaaaaa commented 6 years ago

We need more detailed descriptions of the following variables.

1) Dependent variables: Proficiency test score (pvlitM, pvNumM). How is it measured? Is it the score on a test?

2) Active learning strategies: act_lrn (relate new ideas, learning new things, attribute something new, getting to the bottom of difficult things, fit different ideas together, look for additional info). Is it the average of relating new ideas, learning new things, etc.? If so, is each of these measured on a scale of 0-5, where 0 means "not at all" and 5 means "strongly agree"?

3) We have no idea what these two columns are. Could we get more information? AGEG5LFS, AGEG10LFS

4) Some clarification about the variables related to employment. In the dataset, three columns are related to employment:

Employed (only one category, which is 1, i.e., all ones for this column in the dataset). What does this represent?
Full_part (1, 2)
self_employed (4 categories: 1, 2, 6, 7). What do these levels represent?

In the proposal, you listed these two:
- Employment status 2 (1-fulltime, 2-contract, 3-daily contract)
- Employment type (1-fulltime, 2-part-time, 3-not working, 4-student, 5-intern, 6-retired, etc.)

We are having trouble relating the employment variables in the dataset to your proposal. Could you please clarify?

Thanks for your help:)

hyunokryu commented 6 years ago

Please base your analysis on the dataset. Use the dataset (b).

1) I believe pvlitM is the mean of the proficiency test scores for literacy and pvnumM is the mean of the proficiency test scores for numeracy.

2) I believe so.

3) Please ignore those two.

4) Full_part: 1 = full time, 2 = part time. Please ignore the “self_employed” variable.

Let me know if you have more questions. Thank you for working on this :)

liuzhen529 commented 6 years ago

Thank you so much for your reply. It really helps.

There is another issue. You are interested in the factors that can affect skills and proficiency, which means you want to find which variables lead to better skill/proficiency. There is a variable called 'Mgr'. It states whether one is a manager or not. In my understanding, one usually becomes a manager because he/she has better skills (better skill --> manager). The causal direction seems to be the reverse of what we want (we want: xxx --> better skill).

I was wondering whether we still need to check if 'Mgr' is a significant factor. Alternatively, if necessary, we could investigate which factors affect 'being a manager'.

Best, Zhen

hyunokryu commented 6 years ago

Thanks, I really appreciate that you are engaging with this project. You are trying to put logic into it. I like that :)

Here are my opinions:

Yes, we usually think better skills --> manager. But from my understanding, “numerical/literacy skill use at work” does not necessarily mean the level of skill. It is more about skill utilization.

So, if it is not too much work, why don’t we put that variable into the analysis and see how it comes out?

Also, for your information, if “Mgr” is “no”, then the value in “Mgr_c” is “0”. This is just so that you can decide whether to put both into the analysis or choose one.

Thanks again :)

Hyunok

NSKrstic commented 6 years ago

Thanks Hyunok for some extra clarification. And good questions Zhen and Linda.

In addition, STAT450, note that we've included a codebook in the "Resources" folder, downloaded from the OECD website. It describes some of the variables, but not all. Feel free to check there and the websites below. However, I believe some of the variables, like skill use and proficiency scores, do not appear in the codebook.

On that note, Hyunok, I noticed that the skill use variables ("lit_use" and "num_use") are not strictly natural numbers (1, 2, 3, 4 and 5) but can take values in between (like 4.3333, etc.). Pretty often the decimal part is a simple fraction (in the example given, it's 1/3), and I would like to know why that is the case. Are these composite measures of multiple variables? Are they the ones listed below each "Likert scale" variable in the metadata?

Useful Websites: http://www.oecd.org/skills/piaac/ http://www.oecd.org/skills/piaac/publicdataandanalysis/#d.en.408927

hyunokryu commented 6 years ago

Hi, thanks for your further question. But I am not sure whether I am looking at the same thing you are referring to. Could you send me screenshots of the cases and the names of the variables that “can take values in between (like 4.3333, etc.)”? Thanks.

Hyunok

liuzhen529 commented 6 years ago

Hi, Nikolas and Hyunok,

Linda and I checked the data description (assorted variables-labels & scales.txt) and discussed it today. We think that lit_use is measured by taking the average of 18 related components (e.g., Read letters memos or mails, Read directions or instructions, Read books, ...). Each component uses a Likert scale (1, 2, 3, 4, 5).

I think this is why there is 4.333; for example, 78/18 = 4.333.
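To make the arithmetic concrete, here is a minimal sketch in Python, assuming the composite is a plain unweighted mean of the item responses (the thread has not confirmed this at this point). The function name `composite_score` and the response values are hypothetical; only the 18 items and the 1-5 scale come from the data description.

```python
from fractions import Fraction

def composite_score(items):
    """Unweighted mean of Likert item responses (each an integer from 1 to 5)."""
    if not all(1 <= v <= 5 for v in items):
        raise ValueError("Likert responses are expected to lie between 1 and 5")
    return Fraction(sum(items), len(items))

# Hypothetical respondent: 18 item responses that sum to 78.
responses = [5] * 6 + [4] * 12
score = composite_score(responses)
print(score, float(score))
```

With 18 items, every such mean is a multiple of 1/18, which would explain why the decimal parts in lit_use often look like simple fractions.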

Zhen

NSKrstic commented 6 years ago

You're likely correct, and that was my first thought as well. But I just wanted to confirm with Hyunok since it's not clear from the metadata whether this is strictly an average of those components or some other methods were employed (weighted averaging, etc.).

Nikolas

tom-hc-park commented 6 years ago

Hi,

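@hyunokryu These are the variables we are concerned about: [image: screen shot 2018-02-01 at 10 39 14 am] https://user-images.githubusercontent.com/31666132/35696465-6de0c06e-073c-11e8-8684-e721a0cbfd1b.png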

hyunokryu commented 6 years ago

Yep, they are continuous values because they are averaged over multiple sub-items. Thanks!

tom-hc-park commented 6 years ago

@liuzhen529

About the variable 'Mgr', my opinion is this: we have 4 variables.
• Skill use at work: Numeracy: num_use, Literacy: lit_use
• Proficiency test scores: Numeracy: pvnumM, Literacy: pvlitM

If we set 'skill use' as the response variable, then we may include Mgr as a covariate, because the response is about how often you use the skill.

However, if we use the proficiency test scores as the response, then we may not want to include Mgr as an independent variable, because the response measures how good your skills are.

liuzhen529 commented 6 years ago

@hyunokryu Hi Hyunok, we discussed the project today and found a new issue that needs your clarification.

About the variable 'pub_priv': 1062 of the cases are from the private sector and 145 are from the public sector. Your proposal says to focus only on the private sector. We wonder whether we need to focus on just the private sector or whether we can treat pub/priv as a covariate.

Thanks for your help:)

Zhen

hyunokryu commented 6 years ago

Hi, could you clarify what you mean by using the variable as a covariate? Thanks!

liuzhen529 commented 6 years ago

@hyunokryu Hi,
A covariate is an independent variable.

Zhen

hyunokryu commented 6 years ago

Yes, please treat it as an independent variable and see how it behaves. If possible, could you do two separate analyses: 1) with both pub and priv as an independent variable, and 2) with only priv? Thank you. “Public” has a small number of cases, so I wonder how that affects the analysis.
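The two analyses could be set up roughly as follows. This is only a sketch of the data preparation, not of the model itself: the rows are made up, the string coding of pub_priv ("private"/"public") is an assumption, and the helper names and the `is_public` column are hypothetical.

```python
# Hypothetical rows; the column names follow the thread, the values are made up.
rows = [
    {"pub_priv": "private", "lit_use": 3.2, "Full_part": 1},
    {"pub_priv": "private", "lit_use": 2.5, "Full_part": 2},
    {"pub_priv": "public", "lit_use": 4.0, "Full_part": 1},
    {"pub_priv": "private", "lit_use": 3.8, "Full_part": 1},
]

def with_sector_indicator(data):
    """Analysis 1: keep every row and add a 0/1 indicator for the public sector."""
    return [dict(r, is_public=1 if r["pub_priv"] == "public" else 0) for r in data]

def private_only(data):
    """Analysis 2: keep private-sector rows only; the sector column is now constant."""
    return [r for r in data if r["pub_priv"] == "private"]

analysis_1 = with_sector_indicator(rows)
analysis_2 = private_only(rows)

print(len(analysis_1), sum(r["is_public"] for r in analysis_1))
print(len(analysis_2))
```

Note that in the second analysis the sector takes a single value, so it cannot enter the model as a covariate; that analysis simply describes the private sector on its own.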

gcohenfr commented 6 years ago

Great discussion going on! Thanks @hyunokryu for participating closely! I am also concerned about the imbalance of a possible covariate "sector" (private vs public) in the model. These two sectors can be very different, and this covariate may be able to capture that effect. However, if your main interest is the private sector, it may be better to leave the public group out of the analysis.

@hyunokryu is there any reason that explains the difference in sizes between the private and the public sets?

hyunokryu commented 6 years ago

Thank you for your question, Gabriela.

To be honest, I have no clear idea: this is a secondary data set, which was collected by the OECD. If I took a close look at their data description, etc., I would know, but I don’t think that is necessary at this point.

Dear team, we do not have to stick strictly to the proposal. If there is a way to discover interesting findings, you are more than welcome to be experimental!

So, Gabriela, would the imbalance of the variable (priv and public) make it unjustified to include it as a covariate? Should it then be used only for a group comparison, or something like that? Thank you in advance.

Best regards, Hyunok

gcohenfr commented 6 years ago

@hyunokryu, adding a variable "sector" to the model is equivalent to making a group comparison: "private vs public". The main difference when using a regression model is that the comparison is adjusted for the other variables in the model. In other words, the difference "private vs public" may be explained by other variables in the model, for example, differences in "employment type". Thus, the difference between the private and public sectors may not be significant once you control for other variables.
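A small illustration of that point, as a sketch on made-up, noise-free data rather than the PIAAC file: the outcome depends only on a hypothetical numeric covariate `x`, which happens to be higher in the public group. The helper functions `solve` and `ols` and the 0/1 sector coding are assumptions for this example; a real analysis would use a statistics package.

```python
def solve(a, b):
    """Solve a small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

sector = [0, 0, 0, 0, 1, 1, 1, 1]   # 0 = private, 1 = public (hypothetical coding)
x = [1, 2, 3, 4, 3, 4, 5, 6]        # hypothetical covariate, higher in the public group
y = [1 + 2 * xi for xi in x]        # outcome depends on x only, not on sector

unadjusted = ols([[1, s] for s in sector], y)                 # y ~ sector
adjusted = ols([[1, s, xi] for s, xi in zip(sector, x)], y)   # y ~ sector + x

print("sector effect, unadjusted:", round(unadjusted[1], 6))
print("sector effect, adjusted:  ", round(adjusted[1], 6))
```

Without `x` in the model, the sector coefficient equals the raw difference in group means; once `x` is included, the sector coefficient drops to essentially zero because `x` accounts for the whole difference.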

The imbalance may affect the significance of the analysis, since a much larger weight is given to the private sector. You may not be able to detect a true difference if you don't observe enough public cases. We just need to be careful when drawing conclusions.
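As a rough sense of scale, the sketch below compares the standard error of a simple difference in group means for the reported group sizes (1062 private, 145 public) against a balanced split of the same total. It assumes a common standard deviation in both groups and ignores other covariates and survey weights, so it is only indicative; the function name `se_of_difference` is hypothetical.

```python
import math

def se_of_difference(n_a, n_b, sd=1.0):
    """Standard error of a difference in two group means, assuming a common SD."""
    return sd * math.sqrt(1.0 / n_a + 1.0 / n_b)

n_private, n_public = 1062, 145   # counts reported in the thread
total = n_private + n_public

observed = se_of_difference(n_private, n_public)
balanced = se_of_difference(total / 2, total / 2)

print(round(observed, 4), round(balanced, 4), round(observed / balanced, 2))
```

Under these assumptions the standard error with the observed split is about 1.5 times that of a balanced design of the same size, and it is driven almost entirely by the small public group.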

Hope this clarifies my previous point

tom-hc-park / STAT550-450-for-Seniorworkers-from-Korea

Variables Clarification #4