pthane / QP-Data


Digging Deeper #9

Open pthane opened 3 years ago

pthane commented 3 years ago

Hi @jvcasillas ,

I ran the revised models with the code for the subordinate verb, and I got no results other than DELE for production. Jennifer and Silvia want me to "dig deeper" and try to find effects within proficiency groups or age bands. They also said that I can look at my descriptive statistics to try to draw conclusions. I must say that I'm pretty discouraged that about 200 hours of data collection has come down to trying to manipulate data to get a significant result. I suppose that trying to find consistencies across similar speakers makes sense (i.e., only looking at intermediate participants), but then again, I'm not sure it's really what I believe in. I ran the stats for the intermediate HS and there were still no main effects.

It could be that there is legitimately nothing to see. In a way, it makes me feel that by doing the transparent thing with continuous variables, I may have cooked my own goose, which is understandable. However, I'm sort of at a loss as to how to move forward and how to write a conclusion based upon descriptive statistics, especially when there aren't clear trends across the data. Do you have any tips for how to proceed?

I'm more or less "over it" at this point. I'm sure you can sympathize, but I honestly would like to put this process behind me, because trying to "hacerlo salir" (make it come out) isn't really sitting that well with me, I suppose.

Thanks for your insights as always.

jvcasillas commented 3 years ago

> Hi @jvcasillas ,

> I ran the revised models with the code for the subordinate verb, and I got no results other than DELE for production. Jennifer and Silvia want me to "dig deeper" and try to find effects within proficiency groups or age bands. They also said that I can look at my descriptive statistics to try to draw conclusions. I must say that I'm pretty discouraged that about 200 hours of data collection has come down to trying to manipulate data to get a significant result. I suppose that trying to find consistencies across similar speakers makes sense (i.e., only looking at intermediate participants), but then again, I'm not sure it's really what I believe in. I ran the stats for the intermediate HS and there were still no main effects.

Patrick, I'm sorry. I know how frustrating this can be. Especially when you are being asked to do something you might not be comfortable with.

> It could be that there is legitimately nothing to see. In a way, it makes me feel that by doing the transparent thing with continuous variables, I may have cooked my own goose, which is understandable. However, I'm sort of at a loss as to how to move forward and how to write a conclusion based upon descriptive statistics, especially when there aren't clear trends across the data. Do you have any tips for how to proceed?

I have several suggestions. If you want, I can double-check your code/models to make sure there aren't any silly mistakes. I have low confidence in that revealing anything new, but it is always good to have another pair of eyes on your code. Assuming that does not help, the next course of action for you would be to dichotomize the continuous variables (ew). You could, for example, split each predictor at its 33rd and 66th percentiles, keep the lower and upper thirds, and refit the models with categorical factors. I think you know that I am not a fan of this idea, but you could do it with the understanding that you are trying to keep your committee happy. Looking past the QP, you are certainly not out of options. You can look at age, if you want. We can also talk about other types of analyses that would allow you to show evidence for the NULL. This would possibly be a bit risky because, in essence, you would be showing evidence against the AH hypothesis, and who knows how certain people would take that. We can talk about that further down the line.
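
Something like this is what I have in mind. It is only a sketch with toy data: `ept`, `Freq_Use`, `Correct`, and `Group_No` are placeholder names, not your actual objects.

```r
# Sketch of percentile-based dichotomization; all names/data are placeholders.
library(dplyr)
library(lme4)

set.seed(1)
ept <- data.frame(
  Group_No = rep(1:30, each = 10),              # toy participant IDs
  Freq_Use = rep(runif(30, 0, 100), each = 10), # toy continuous predictor
  Correct  = rbinom(300, 1, 0.7)                # toy binary outcome
)

# 33rd and 66th percentile cutoffs of the continuous predictor
cutoffs <- quantile(ept$Freq_Use, probs = c(1/3, 2/3))

ept_cat <- ept %>%
  mutate(Freq_Group = case_when(
    Freq_Use <= cutoffs[1] ~ "low",
    Freq_Use >= cutoffs[2] ~ "high",
    TRUE                   ~ NA_character_      # middle third dropped
  )) %>%
  filter(!is.na(Freq_Group)) %>%
  mutate(Freq_Group = factor(Freq_Group, levels = c("low", "high")))

# Refit with the categorical factor instead of the continuous variable
mod_cat <- glmer(Correct ~ Freq_Group + (1 | Group_No),
                 data = ept_cat, family = binomial)
summary(mod_cat)
```

The same recoding would apply to whichever continuous predictor you end up splitting.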

> I'm more or less "over it" at this point. I'm sure you can sympathize, but I honestly would like to put this process behind me, because trying to "hacerlo salir" (make it come out) isn't really sitting that well with me, I suppose.

> Thanks for your insights as always.

Again, I totally get it. Remember, the QP is an exercise. You have already exceeded my expectations in terms of showing that you "get it" and that you are capable of doing research. Don't think you are out of luck in terms of publication options.

pthane commented 3 years ago

> Patrick, I'm sorry. I know how frustrating this can be. Especially when you are being asked to do something you might not be comfortable with.

It means a lot to hear you say that, even though I know that this is something I have learned from you over the last few years. What is troubling me now is (A) that I seem to have fundamentally different perspectives on the enterprise of research than some of the people I look up to, and (B) that I am unsure exactly where to go in dissertation planning at this point. If frequency effects aren't all that important, my idea of looking for differences in frequency effects between monolingually-educated HS and bilingually-educated HS is not as tenable. I need to really sit back and think about what it is that I am going to look for and whether this is a relevant line of work moving forward.

That said, I am going to present my aspect data at the upcoming brown bag on Wednesday. These data are exactly what I predicted for the subjunctive QP: they support lexical frequency and frequency of use, while providing no support for an effect of age. I am very much a fan of these data and hope to publish them. Jen and Silvia think it is too late to include these data instead of the subjunctive, which I understand, because it implies another round of revisions that they do not feel are feasible. They have been very supportive, and I will say that I sent them quite a discouraged email a few days ago, so I hope that wasn't a step too far.

> I have several suggestions. If you want, I can double-check your code/models to make sure there aren't any silly mistakes. I have low confidence in that revealing anything new, but it is always good to have another pair of eyes on your code. Assuming that does not help, the next course of action for you would be to dichotomize the continuous variables (ew). You could, for example, split each predictor at its 33rd and 66th percentiles, keep the lower and upper thirds, and refit the models with categorical factors. I think you know that I am not a fan of this idea, but you could do it with the understanding that you are trying to keep your committee happy. Looking past the QP, you are certainly not out of options. You can look at age, if you want. We can also talk about other types of analyses that would allow you to show evidence for the NULL. This would possibly be a bit risky because, in essence, you would be showing evidence against the AH hypothesis, and who knows how certain people would take that. We can talk about that further down the line.

That's a very interesting option, but thinking pragmatically, I do not think I want to publish the subjunctive data. If including age of acquisition of English was not all that well received before I finished the QP, it's not going to be well received down the line, as you say. As we discussed, age is a factor whose importance is difficult to interpret. I personally think it has something to do with input prominence, which I proposed in the manuscript, because if you are a sequential bilingual, your patterns of competing sources of input across the lifespan are different from those of simultaneous bilinguals. 100%. However, it's very difficult to test, in my opinion.

I will certainly update this repo and upload the latest version. I may wind up creating a new repo because I went ahead and reorganized (tidied up) the RProj on my desktop and consolidated some of the scripts (again, a good thing from this QP was that I became much more proficient in coding). Once I make sure it is the way I want it, I'll let you know, and I would appreciate a second look just in case (but I don't think it matters).

Many thanks!

> Again, I totally get it. Remember, the QP is an exercise. You have already exceeded my expectations in terms of showing that you "get it" and that you are capable of doing research. Don't think you are out of luck in terms of publication options.

That means a lot to me. Yes, I've certainly worked hard on this and I have paid very close attention to detail. I do think that I "get it," even though there is plenty more for me to learn and to improve upon. One thing that I think I did well (and Jen thinks so too) was task design. Each of you gave me really valuable and honest feedback, and I had minimal to no changes in the design section. If the experiment elicited what I wanted to elicit, then I have given it my best effort, and there is a point where you just have to say, "after I talk about this for 20 minutes with my committee, I can move on with my life, and my life >>>>> subjunctive every day of the week."

Thanks, Joseph; you're a great support.

pthane commented 3 years ago

Hi @jvcasillas ,

I have cleaned up the repo on GitHub such that there are now only the useful and up-to-date files. I have consolidated many scripts into a smaller number so that I can easily work with each one. With that being said, I have uploaded the code for the models, but I am running into two problems that I simply cannot figure out (I've been at it since 8 AM):

1. I cannot get my L2 models to run because it says that "Group_No" is an invalid grouping factor. This is the equivalent of "participant" for my random effect. It works for my HS participants without problem, and I went back and manually entered each of the numbers into the data sheets, without any luck. I have no idea why it is not accepting "Group_No" in the L2 models when there was no issue with the HS models.

In case this requires retracing my steps, this is my workflow: I enter the data into the Master Coding Document. I copy the EPT data into the EPT Master.csv file within "CSV Files" and the FCT data into the FCT Master.CSV in the same location. The Standardize and Prepare Data script then tidies the data and writes the new CSV files to the appropriate location, at which point it is possible to run the stat models (or at least attempt to do so).

I've been looking at this all day now, so I'm pretty stumped on this one.

2. I am also getting an error on line 16 of the lexical item analysis script, where a line of code reduces my data from 32 columns to 1, and I'm not sure why. We built this script together (well, you did it…), but I'm not sure if that is something that can be fixed easily. It would be awesome to know if you have any suggestions, because I am not able to run the summarize function (or any of the subsequent functions): the averages aren't populating the table that this code is supposed to generate. (A sketch of the kinds of checks I have in mind for both problems follows this list.)
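
For reference, here is a minimal sketch of the checks I have in mind; the toy data frame and every column name below are placeholders, not my actual files or variables:

```r
# Toy stand-in for the L2 data; every name and value here is a placeholder.
library(dplyr)
library(lme4)

set.seed(1)
l2 <- data.frame(
  Group_No = rep(1:10, each = 8),
  DELE     = rep(runif(10, 30, 50), each = 8),
  Verb     = rep(c("atar", "hablar", "comer", "vivir"), times = 20),
  Correct  = rbinom(80, 1, 0.7)
)

# Problem 1: checks on the grouping variable before fitting the model.
"Group_No" %in% names(l2)   # is the column there, with the same spelling/case?
sum(is.na(l2$Group_No))     # any missing participant numbers?
l2 <- l2 %>% mutate(Group_No = factor(Group_No))

mod_l2 <- glmer(Correct ~ scale(DELE) + (1 | Group_No),
                data = l2, family = binomial)
summary(mod_l2)

# Problem 2: the shape I expect the lexical item table to have,
# i.e., one row of averages per verb rather than a single collapsed column.
l2 %>%
  group_by(Verb) %>%
  summarize(Mean_Correct = mean(Correct, na.rm = TRUE))
```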

Many thanks!

jvcasillas commented 3 years ago

I have some free time. Can you jump on zoom?

jvcasillas commented 3 years ago

https://rutgers.zoom.us/my/jvc83?pwd=Tk8rU2dMUEVkTm9DM3pkQUsvVlA2QT09

jvcasillas commented 3 years ago

GitHub just gave me a build error for what I just pushed, so I will have to take a look and push it again.

pthane commented 3 years ago

No prob. Thanks!

PT


pthane commented 3 years ago

Hola @jvcasillas ,

So I have managed to "fix" the L2 models, and the interaction between frequency of use and token frequency of the subordinate verb was significant in the omnibus model, but not in the task-specific models. Could you explain this to me? I understand what it means on the surface (the data pooled across tasks show the effect, but not on a task-by-task basis), but I guess I don't quite understand how the whole isn't the sum of its parts, so to speak. Furthermore, would you suggest only reporting the data from the omnibus model, then? In that case I wouldn't have to go into the whole task-by-task discussion (and I could "hide" that there weren't task-specific effects, which, again, at this point is fine; I want this done).
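
Just so we're talking about the same two setups, this is roughly what I mean by "omnibus" versus "task-specific" (a toy sketch; every name and value below is a placeholder):

```r
# Toy data pooling both tasks; all names/values are placeholders.
library(lme4)

set.seed(1)
d <- data.frame(
  Group_No   = factor(rep(1:20, each = 20)),
  Task       = rep(c("EPT", "FCT"), each = 10, times = 20),
  Freq_Use   = runif(400, 0, 100),
  Token_Freq = runif(400, 0, 10),
  Correct    = rbinom(400, 1, 0.7)
)

# Omnibus: both tasks pooled, interaction estimated over all observations.
mod_omnibus <- glmer(
  Correct ~ scale(Freq_Use) * scale(Token_Freq) + Task + (1 | Group_No),
  data = d, family = binomial
)

# Task-specific: the same interaction refit within one (smaller) task subset.
mod_ept <- glmer(
  Correct ~ scale(Freq_Use) * scale(Token_Freq) + (1 | Group_No),
  data = subset(d, Task == "EPT"), family = binomial
)
```

The task-specific fits each use roughly half of the observations that the omnibus model sees.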

Furthermore, I wasn't able to get the revised code for the lexical item analysis to run. It had problems with the weird atar verb with the crazy numbers.

Lastly (and you may want to kill me for this), I may want to look at the participants' self-ratings instead of the Davies frequencies for lexical frequency. I think I can do this on my own, but not in an automated way (which will also send chills down your spine). As time is of the essence, I think that might be best, but then again, I'm not sure this is fully theoretically motivated, considering that different populations may report different degrees of lexical use, while the Davies frequencies remain constant across groups.
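
To illustrate the difference that worries me: the Davies value is one number per verb, identical for every speaker, whereas the self-ratings vary by participant, so the two enter the data differently (a toy sketch; every name and value below is a placeholder):

```r
# Toy sketch of the two kinds of predictors; all names/values are placeholders.
library(dplyr)

# Trial-level skeleton: which participant saw which verb.
trials <- data.frame(
  Group_No = rep(1:3, each = 2),
  Verb     = rep(c("atar", "hablar"), times = 3)
)

# Item-level predictor: one Davies frequency per verb, the same for everyone.
davies <- data.frame(
  Verb        = c("atar", "hablar"),
  Davies_Freq = c(120, 4500)
)

# Participant-level predictor: each speaker's own rating of each verb.
ratings <- data.frame(
  Group_No    = rep(1:3, each = 2),
  Verb        = rep(c("atar", "hablar"), times = 3),
  Self_Rating = c(2, 5, 1, 4, 3, 5)
)

trials %>%
  left_join(davies,  by = "Verb") %>%
  left_join(ratings, by = c("Group_No", "Verb"))
```

The model formula itself would just swap one column for the other; the difference is that the self-rating column varies within items across speakers.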

pthane commented 3 years ago

Also, thank you so much for your helpful comments today. I was feeling pretty down, but having you, David, and Silvia give me such constructive and helpful suggestions made me feel a lot more confident. The aspect data reflect where I want to be at this point, even if the subjunctive data don't. Life happens, but I feel well supported, and after today I feel that I'm back on the right track.