Yeah no problem, just added the file.
Ok. I am going to respond to some of your comments as well as the pdf file.
In previous emails you mentioned the following:
I spoke to Robin yesterday, and she encouraged me to model with tone as the independent variable, constructing a model for the interaction of tone with each acoustic measurement. Her suggestion was that this is more in line with the question I'm asking in the first phase of the project, which just looks to confirm the presence of creak in Yoruba and see if there's a significant difference from tone level to tone level with regards to acoustic properties that are known to indicate creaky voice. This way I get a notion of which correlates of creak are "active" so to speak, which will make the second phase of the project easier. So I'm back to good ol' lm().
I agree that modeling a given acoustic parameter as a function of tone is the best way to answer the question of whether or not there is creak in Yoruba. However, recall that you mentioned the following in a previous email:
I want to know how various acoustic measures might tell you what tone level you're on.
If this is your question, then the modeling strategy you suggest is not appropriate. It appears that this is no longer your question, but I wanted to make this clear.
I have made comments in your script that include questions and suggestions. I have not changed any of your code (update: except for two lines at the beginning... I noticed this after I typed this comment), though the file is now longer because of my comments. You will see that they are all prefaced with # COMMENT JVC.
My main concern at this point has to do with your use of lm() with repeated-measures data. Specifically, you have item repetitions (5, if I recall correctly) coming from a single speaker, and your models are not aware of this. Essentially you are fitting anti-conservative models that inflate the degrees of freedom. You have two options (well, two simple options): 1) you can average over item repetitions for each DV and refit your models, or 2) you can use mixed-effects models that account for the item repetitions in the random effects structure. Option 1 is easier. Option 2 is (arguably) better and, at a minimum, more in line with common practice in phonetic data analysis.
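Something like this for option 1 (a rough sketch, assuming dplyr is loaded; I'm using word as a hypothetical item identifier, so swap in whatever yours is actually called):
library(dplyr)
# collapse the 5 repetitions of each item into a single mean per DV
datana_avg <- datana %>%
  group_by(word, target_tone) %>%
  summarise(avg_spec = mean(avg_spec, na.rm = TRUE)) %>%
  ungroup()
# then refit on the averaged data
mod_avg <- lm(avg_spec ~ target_tone, data = datana_avg)
You would do the same for each of your other DVs.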
I also noticed you have fit some models in which the predictors are subsets (time slices, I believe) of the criterion. You can't do this and expect to get reliable parameter estimates because of multicollinearity issues. I'm not sure if these are exploratory models, but it is important that you know you can't trust them.
A few more general comments: I included an RStudio .rproj file. I believe it will make it easier for you to work with your script (it automatically sets the working directory to the project root). As you continue developing your analyses, I suggest you split your .R files up (one for tidying your data, one for plots, another for models, etc.), as this one is rather long and will become more difficult to work with as it gets longer.
(I didn't notice anything else here)
Thanks for the comments. Some questions/thoughts:
So it sounds like I want to move towards lmer then, right? Since I'm hoping to add more speakers anyway, that seems like a reasonable move to make. How do you suggest I take the repetition into account? For example, with only one participant, if I try to run:
lmer(avg_spec ~ target_tone + (1|subj), data = datana)
It tells me:
Error: grouping factors must have > 1 sampled level
Should the random effects include the block (repetition) number? (1|block)
That gives me this:
Linear mixed model fit by REML ['lmerMod']
Formula: avg_spec ~ target_tone + (1 | block)
Data: datana
REML criterion at convergence: 1435.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.3384 -0.5251 0.0542 0.5398 3.5907
Random effects:
Groups Name Variance Std.Dev.
block (Intercept) 0.07917 0.2814
Residual 11.73148 3.4251
Number of obs: 271, groups: block, 5
Fixed effects:
Estimate Std. Error t value
(Intercept) -0.1866 0.4464 -0.418
target_toneH 8.0896 0.5453 14.835
target_toneM 4.9790 0.5443 9.147
Correlation of Fixed Effects:
(Intr) trgt_H
target_tonH -0.754
target_tonM -0.755 0.618
I'm a little fuzzy on how to interpret lmer results, but I can read up on it.
I also noticed you have fit some models in which the predictors are subsets (time slices, I believe) of the criterion. You can't do this and expect to get reliable parameter estimates because of multicollinearity issues. I'm not sure if these are exploratory models, but it is important that you know you can't trust them.
Yes, this was the wrong way to go about it. I see now that what those models essentially say is "if I give you all the information about f0 (for example) at different points in the vowel, can you tell me something about f0 in general?" Not very interesting! It also explains getting an R-squared of 1...
What I was hoping to do was find out if the dependent vars show any significant difference between vowel portions, because it's been noted that in languages that employ non-modal phonation, it is sometimes confined to a certain part of the vowel. So it might be the case that the first half of the vowel is creaky but the second is less so.
I think the right way to do this is to adjust the data structure so that vowel portion is a variable in its own column, next to the value for that observation. One way I found to do this is melt (I think this is what gather does too?):
mspec <- melt(datana, id.vars = 'target_tone', measure.vars = c('specTilt_1','specTilt_2','specTilt_3','specTilt_4'))
With spectral tilt as an example. Then I get a data frame like this:
> head(mspec)
target_tone variable value
1 M specTilt_1 2.983939
2 M specTilt_1 4.642098
3 L specTilt_1 NA
4 L specTilt_1 8.411694
5 H specTilt_1 5.555028
6 M specTilt_1 2.855002
And I can have "variable" (i.e., the portion of the vowel) as the independent variable and "value" as the dependent variable. But maybe split it up by tone category to make it easier to interpret/see differences. And stick with mixed effects, because now I have 4 observations from the same vowel.
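(For reference, I believe the tidyr equivalent of the melt call would be something like this; an untested sketch, assuming dplyr and tidyr are loaded:)
mspec2 <- datana %>%
  select(target_tone, specTilt_1, specTilt_2, specTilt_3, specTilt_4) %>%
  gather(variable, value, -target_tone)  # everything except target_tone goes into variable/value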
Lastly, thanks for the .rproj file and general housekeeping comments. I think you're right, that will ultimately make things a lot easier to work with. I'll switch over to a setup like that when I get a chance.
Okay, that's it for now. Hope to hear from you soon.
Thanks for the comments. Some questions/thoughts:
Modeling
So it sounds like I want to move towards lmer then, right? Since I'm hoping to add more speakers anyway, that seems like a reasonable move to make. How do you suggest I take the repetition into account? For example, with only one participant, if I try to run:
Yes, lmer is what you are looking for. You can go ahead and write the code in preparation for when you have more speakers.
lmer(avg_spec ~ target_tone + (1|subj), data = datana)
It tells me:
Error: grouping factors must have > 1 sampled level
You cannot add a random intercept for each participant if there is only one. The code you wrote above is what you will use once you have collected data from more speakers.
Should the random effects include the block (repetition) number? (1|block)
Yes.
That gives me this:
Linear mixed model fit by REML ['lmerMod']
Formula: avg_spec ~ target_tone + (1 | block)
Data: datana
REML criterion at convergence: 1435.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.3384 -0.5251 0.0542 0.5398 3.5907
Random effects:
Groups Name Variance Std.Dev.
block (Intercept) 0.07917 0.2814
Residual 11.73148 3.4251
Number of obs: 271, groups: block, 5
Fixed effects:
Estimate Std. Error t value
(Intercept) -0.1866 0.4464 -0.418
target_toneH 8.0896 0.5453 14.835
target_toneM 4.9790 0.5443 9.147
Correlation of Fixed Effects:
(Intr) trgt_H
target_tonH -0.754
target_tonM -0.755 0.618
I'm a little fuzzy on how to interpret lmer results, but I can read up on it.
This is the beauty of learning the linear model: it's the same as what you already know. In this case you focus on the fixed effects. Tone L is your intercept. A change from L to H equates to an increase of 8.09 in avg_spec. By default lmer doesn't give you p-values. You can get them by loading the package lmerTest right after you load lme4.
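Roughly like this (lmerTest masks lme4's lmer() with a version whose summary() adds Satterthwaite degrees of freedom and p-values):
library(lme4)
library(lmerTest)  # load after lme4 so its lmer() takes over
m <- lmer(avg_spec ~ target_tone + (1|block), data = datana)
summary(m)  # the fixed effects table now includes df and Pr(>|t|) columns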
I also noticed you have fit some models in which the predictors are subsets (time slices, I believe) of the criterion. You can't do this and expect to get reliable parameter estimates because of multicollinearity issues. I'm not sure if these are exploratory models, but it is important that you know you can't trust them.
Yes, this was the wrong way to go about it. I see now that what those models essentially say is "if I give you all the information about f0 (for example) at different points in the vowel, can you tell me something about f0 in general?" Not very interesting! It also explains getting an R-squared of 1...
What I was hoping to do was find out if the dependent vars show any significant difference between vowel portions, because it's been noted that in languages that employ non-modal phonation, it is sometimes confined to a certain part of the vowel. So it might be the case that the first half of the vowel is creaky but the second is less so.
I think the right way to do this is to adjust the data structure so that vowel portion is a variable in its own column, next to the value for that observation. One way I found to do this is melt (I think this is what gather does too?):
mspec <- melt(datana, id.vars = 'target_tone', measure.vars = c('specTilt_1','specTilt_2','specTilt_3','specTilt_4'))
With spectral tilt as an example. Then I get a data frame like this:
> head(mspec)
target_tone variable value
1 M specTilt_1 2.983939
2 M specTilt_1 4.642098
3 L specTilt_1 NA
4 L specTilt_1 8.411694
5 H specTilt_1 5.555028
6 M specTilt_1 2.855002
And I can have "variable" (i.e., the portion of the vowel) as the independent variable and "value" as the dependent variable. But maybe split it up by tone category to make it easier to interpret/see differences. And stick with mixed effects, because now I have 4 observations from the same vowel.
This is what I hinted at in my last mail. The best way to analyze this, if I understand your question, would be to treat position in the vowel as a continuous predictor and model your DVs over 'time', i.e., the percentage of the vowel (from 0 to 100%). Your Praat script measured your DVs at time intervals over the course of the segments in question, right? Something like every 10% of the vowel, maybe?
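Something along these lines (a sketch only; it assumes you also keep block as an id variable when you reshape, and that the slice number can be recovered from the variable labels):
# hypothetical: mspec built with id.vars = c('target_tone', 'block')
mspec$pct <- as.numeric(sub('specTilt_', '', mspec$variable)) * 25  # 25, 50, 75, 100
m_time <- lmer(value ~ pct + (1|block), data = mspec)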
Lastly, thanks for the .rproj file and general housekeeping comments. I think you're right, that will ultimately make things a lot easier to work with. I'll switch over to a setup like that when I get a chance.
Okay, that's it for now. Hope to hear from you soon.
Joseph,
Sorry about the delayed response - our department's job search and my students' midterm ate all my time this past week.
I've been working with lmer. A couple questions.
I found a package, MuMIn, that has a function r.squaredGLMM() which tells me the R-squared of both the fixed effects alone and of the entire model. Is it okay to report R-squared using this?
Also, should I add vowel type in the random effects structure as well? I do seem to gain some explanatory power by adding it in most cases. Where I might really expect an effect is with the F1-F0 measure, since different vowels naturally have different F1 ranges.
But, when I try to fit the following:
lmer(avg_f1minusf0 ~ target_tone + (1|block) + (1|target_vowel), data = datana)
I get this:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00292337 (tol = 0.002, component 1)
If I take out the random effect of block it works, but why can't I include both?
As to the other issue:
This is what I hinted at in my last mail. The best way to analyze this, if I understand your question, would be to treat position in the vowel as a continuous predictor and model your DVs over 'time', i.e., the percentage of the vowel (from 0 to 100%). Your Praat script measured your DVs at time intervals over the course of the segments in question, right? Something like every 10% of the vowel, maybe?
The Praat script measured DVs at every 25% of the vowel, so I have four slices for each measurement. The question I want to get at is whether or not being at a different point in the vowel has any effect on the DVs. You're right in that it seems like I should be able to use the slices as a way to reference a value at a point in time. Can I create a variable corresponding to slice and use that in models? With melt again, considering the low-tone subportion of the data, I can get a frame like this (for F0):
> head(dat.m)
block variable value
1 1 f0_1 NA
2 1 f0_1 107.2908
3 1 f0_1 128.6416
4 1 f0_1 108.0923
5 1 f0_1 107.2770
6 1 f0_1 110.4938
And then model like this, with "slice" (called variable in the data frame) as a predictor of f0 value:
> testt <- lmer(value ~ variable + (1|block), data = dat.m)
> summary(testt)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [lmerModLmerTest]
Formula: value ~ variable + (1 | block)
Data: dat.m
REML criterion at convergence: 2194.9
Scaled residuals:
Min 1Q Median 3Q Max
-2.7715 -0.6634 -0.0772 0.5802 3.7654
Random effects:
Groups Name Variance Std.Dev.
block (Intercept) 14.23 3.772
Residual 35.30 5.941
Number of obs: 342, groups: block, 5
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 108.4038 1.7962 4.7859 60.35 4.4e-08 ***
variablef0_2 -9.1182 0.8645 334.0068 -10.55 < 2e-16 ***
variablef0_3 -15.4879 0.8919 334.0225 -17.36 < 2e-16 ***
variablef0_4 -18.2333 0.9489 334.0479 -19.21 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vrb0_2 vrb0_3
variablf0_2 -0.245
variablf0_3 -0.237 0.493
variablf0_4 -0.223 0.463 0.449
So now the model tells us about F0 between slices of the vowel. Is this an okay approach to addressing this question? The result seems to match what I've reported about the L tone contour.
Thanks again!
Also, just letting you know I posted an update that uses lmer() and the method I described above for getting a "time" variable (not sure if you get a notification when I push changes to the repo).
Joseph,
Sorry about the delayed response - our department's job search and my students' midterm ate all my time this past week.
I've been working with lmer. A couple questions.
I found a package, MuMIn, that has a function r.squaredGLMM() which tells me the R-squared of both the fixed effects alone and of the entire model. Is it okay to report R-squared using this?
Yes, this is fine. I use this package myself and find it to be quite useful.
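For a fitted model m it looks something like this:
library(MuMIn)
r.squaredGLMM(m)  # R2m = fixed effects only; R2c = fixed + random effects (the whole model)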
Also, should I add vowel type in the random effects structure as well? I do seem to gain some explanatory power by adding it in most cases. Where I might really expect an effect is with the F1-F0 measure, since different vowels naturally have different F1 ranges.
But, when I try to fit the following:
lmer(avg_f1minusf0 ~ target_tone + (1|block) + (1|target_vowel), data = datana)
I get this:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00292337 (tol = 0.002, component 1)
If I take out the random effect of block it works, but why can't I include both?
You often get convergence issues when you try to fit a costly random effects structure without sufficient data, and that is likely what is happening here. In this case, though, the gradient is only barely above the tolerance (0.0029 vs. 0.002), which is frequently a benign warning. You can try a different optimizer before giving up on the full structure; if the warning persists, you will have to remove one of the random intercepts.
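A sketch of the optimizer switch, using lme4's lmerControl():
m <- lmer(avg_f1minusf0 ~ target_tone + (1|block) + (1|target_vowel),
          data = datana,
          control = lmerControl(optimizer = 'bobyqa'))  # try an alternative optimizer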
As to the other issue:
This is what I hinted at in my last mail. The best way to analyze this, if I understand your question, would be to treat position in the vowel as a continuous predictor and model your DVs over 'time', i.e., the percentage of the vowel (from 0 to 100%). Your Praat script measured your DVs at time intervals over the course of the segments in question, right? Something like every 10% of the vowel, maybe?
The Praat script measured DVs at every 25% of the vowel, so I have four slices for each measurement. The question I want to get at is whether or not being at a different point in the vowel has any effect on the DVs. You're right in that it seems like I should be able to use the slices as a way to reference a value at a point in time. Can I create a variable corresponding to slice and use that in models? With melt again, considering the low-tone subportion of the data, I can get a frame like this (for F0):
> head(dat.m)
block variable value
1 1 f0_1 NA
2 1 f0_1 107.2908
3 1 f0_1 128.6416
4 1 f0_1 108.0923
5 1 f0_1 107.2770
6 1 f0_1 110.4938
And then model like this, with "slice" (called variable in the data frame) as a predictor of f0 value:
> testt <- lmer(value ~ variable + (1|block), data = dat.m)
> summary(testt)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [lmerModLmerTest]
Formula: value ~ variable + (1 | block)
Data: dat.m
REML criterion at convergence: 2194.9
Scaled residuals:
Min 1Q Median 3Q Max
-2.7715 -0.6634 -0.0772 0.5802 3.7654
Random effects:
Groups Name Variance Std.Dev.
block (Intercept) 14.23 3.772
Residual 35.30 5.941
Number of obs: 342, groups: block, 5
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 108.4038 1.7962 4.7859 60.35 4.4e-08 ***
variablef0_2 -9.1182 0.8645 334.0068 -10.55 < 2e-16 ***
variablef0_3 -15.4879 0.8919 334.0225 -17.36 < 2e-16 ***
variablef0_4 -18.2333 0.9489 334.0479 -19.21 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) vrb0_2 vrb0_3
variablf0_2 -0.245
variablf0_3 -0.237 0.493
variablf0_4 -0.223 0.463 0.449
So now the model tells us about F0 between slices of the vowel. Is this an okay approach to addressing this question? The result seems to match what I've reported about the L tone contour.
Thanks again!
Looks great. I believe this is exactly what you want. The first slice is your intercept, and the model compares slices 2, 3, and 4 to slice 1. This backs up what you show visually in your figure with the boxplots. Nice work.
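If you end up wanting all pairwise slice comparisons (not just each slice vs. slice 1), one option is the emmeans package; a sketch using your testt model:
library(emmeans)
emmeans(testt, pairwise ~ variable, adjust = 'tukey')  # Tukey-adjusted pairwise contrasts between slices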
Okay great, thanks. Just a few more minor questions. When doing nested model comparisons, I ran into the following situation. Using anova(), I find that a model with spectral tilt as the dependent variable and vowel type as the only independent variable doesn't explain significantly more variance than the null model. There are no significant effects. Okay, so take out vowel type.
A model with tone as the only ind. var., however, does explain a lot of variance and beats the null model. This is what I would hope to see if being on a different tone level affects acoustic measures associated with creaky voice. So, okay, keep tone in the model. Now, if I go and add vowel type into this model, there are some significant effects of vowel type, and anova() suggests that the tone + vowel model is better than the model with just tone.
I guess my question is very broad and probably pretty basic: why does this kind of thing happen? Is it because all the measurements I took are related to each other?
And also, the R2 of the two models is ~.45 vs. ~.50 for the fixed effects and ~.54 vs. ~.55 for the whole model. I know there are other considerations besides R2 for judging how informative a model is, but those seem like pretty small increases to me. What is the model comparison considering that makes it so sure the fuller model is better? Degrees of freedom?
This past week our department had Timo Roettger here for a job talk. He seemed to know you. Anyway I asked him a few stats questions in our individual meeting. He suggested I add 'word' as a random effect, as the words I picked for the experiment are a random sample of a larger population of Yoruba words. Seems like a reasonable suggestion so I will do that I think. Of the candidates our department has seen so far, I think he's a favorite - he really knows his stats and is very concerned with proper methodology and reporting. Would really fill a gap we have in the department, I think.
Anyway, thanks again for the help.
Okay great, thanks. Just a few more minor questions. When doing nested model comparisons, I ran into the following situation. Using anova(), I find that a model with spectral tilt as the dependent variable and vowel type as the only independent variable doesn't explain significantly more variance than the null model. There are no significant effects. Okay, so take out vowel type.
A model with tone as the only ind. var., however, does explain a lot of variance and beats the null model. This is what I would hope to see if being on a different tone level affects acoustic measures associated with creaky voice. So, okay, keep tone in the model. Now, if I go and add vowel type into this model, there are some significant effects of vowel type, and anova() suggests that the tone + vowel model is better than the model with just tone.
I guess my question is very broad and probably pretty basic: why does this kind of thing happen? Is it because all the measurements I took are related to each other?
It could be because the measurements are related to each other. You'll recall that multicollinearity can cause a lot of odd problems. When working with 'order of entry' type issues, remember that you want to give causal priority (what you put first) to the variables you consider most important/theoretically motivated.
And also, the R2 of the two models is ~.45 vs. ~.50 for the fixed effects and ~.54 vs. ~.55 for the whole model. I know there are other considerations besides R2 for judging how informative a model is, but those seem like pretty small increases to me. What is the model comparison considering that makes it so sure the fuller model is better? Degrees of freedom?
Degrees of freedom are part of the equation. A more complex model will have more degrees of freedom (this is why it is important to always change one thing at a time when comparing). Your model comparisons are likelihood ratio tests: they directly compare the goodness of fit of the two models (essentially, how likely the data are under each model).
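Schematically, the comparisons you describe look like this (a sketch using avg_spec as a stand-in DV; anova() on lmer models refits them with ML and runs the likelihood ratio test):
m0 <- lmer(avg_spec ~ 1 + (1|block), data = datana)   # null
m1 <- update(m0, . ~ . + target_tone)                 # + tone
m2 <- update(m1, . ~ . + target_vowel)                # + tone + vowel
anova(m0, m1, m2)  # chi-square tests on the change in deviance, one added term at a time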
This past week our department had Timo Roettger here for a job talk. He seemed to know you. Anyway I asked him a few stats questions in our individual meeting. He suggested I add 'word' as a random effect, as the words I picked for the experiment are a random sample of a larger population of Yoruba words. Seems like a reasonable suggestion so I will do that I think. Of the candidates our department has seen so far, I think he's a favorite - he really knows his stats and is very concerned with proper methodology and reporting. Would really fill a gap we have in the department, I think.
Anyway, thanks again for the help.
Timo is great. He would make a fantastic addition to the Linguistics department. I agree that including 'word' in the random effects structure is ideal. I don't recall the details of your dataset, but you may run into a case where lmer tells you that your model is singular. If this happens, it means that you have overfit the data. I'm not sure it will happen in your case, but just a heads up in case it shows up. If it does, just keep the by-subject intercepts.
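If the warning does show up, lme4's isSingular() will confirm it (the word column here is a stand-in for whatever you end up calling your item identifier):
m <- lmer(avg_spec ~ target_tone + (1|block) + (1|word), data = datana)
isSingular(m)  # TRUE means a random-effect variance was estimated at or near zero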
How is everything going? No need for a big update... just making sure everything is ok.
Hi Joseph! Thanks for checking in. Everything is going well. I'm lining up more recording sessions. This time I'll be using CVCV syllables. Akin thought I should check whether there's a difference in creaky voice between the two positions. Also, L tones in an LM or LH sequence word will not have the falling contour I talk about. Obviously I'll have to report it as a different experiment, so that'll change the structure of the paper a bit.
Also, with time in the semester winding down, the impression I get from Akin is that if I have something to say about creaky voice in CV and CVCV syllables, that would be enough for him for the project. Nobody has yet looked at creaky voice by position in Yoruba. It would be nice (and interesting) to get to a perception study, but I'm not sure I'll have the time. We'll see. Does that still sound good to you?
Hi Joseph @jvcasillas , how's it going? I just pushed the most recent updates to the repo. Here's where I'm at:
I ended up dropping the perception part entirely, and instead just took a look at CVCV syllables in addition to CV syllables. I think I found some interesting stuff. In the future maybe I can look at doing perception as well, but Akin feels this is enough for a QP.
The section on the CV experiment is largely the same, but I dropped the F1-F0 measure because the Praat script wasn't getting the calculation the way I thought it was. It doesn't seem to have mattered that much but it's something I can maybe add in the future. I also added some post-hoc Tukey test results.
The section on the CVCV experiment looks at similar models as the CV experiment but in both syllables. You'll see how that went when you read it.
In terms of the project itself, I am looking at wrapping it up and defending before the end of June (before then because that's when Robin leaves and can't officially be a committee member anymore). I have sent the same draft you will find in the repo to Akin and Robin. After I incorporate whatever comments you all have, would you be able to attend a defense?
Thanks in advance and for all your previous help. Hope to hear from you soon.
Sounds great, Nate. I look forward to reading it. I will be here in June and can attend your defense.
Hi Joseph, did you get a chance to check out the draft? I just got some comments from Robin that I'll put here so you don't end up spending a bunch of time writing remarks on the same things:
"1. Lots more figures! Experimental papers can always benefit from more figures. Particularly since you have the room---you’re not constrained by journal requirements for this paper, so go wild.
I'm going to start working with Akin to set up a defense date, so you may hear from him or me about that before long.
Okay, thanks again for all your help.
Hey Nate. I'll get the comments to you this weekend. I was on vacation and I am playing catch up now. Who was the chair of your committee? Viviane?
The chair is Akin. He mentioned in an email yesterday scheduling a defense for next week, so you may hear from him soon. Thanks!
Nate,
Sorry for the delay in getting back to you. Your QP looks good. I think you have done good work. You can anticipate questions regarding your analysis ("why did you do this?" and a few "what ifs"). There are no wrong answers, just an exercise in defending the choices you made. If you have any questions for me before Thursday, just let me know. I will give you more specific comments on Thursday, but don't stress, nothing major.
Joseph,
Great, sounds good! I will see you there.
Leaving a comment here so I remember to come back to this.
Could you include MM02_tgrids.csv in the repo in case I have questions and want to run the code?