Closed zhiyongww closed 4 years ago
You can adjust the thresholds as you like.
I hope that clarifies it.
Can I ask what the parsing process was to get the 8,050 examples? Using the transcripts from the training set, I count the number of times 'Participant' appears as the speaker, treating consecutive 'Participant' turns as a single example (since they're essentially one response to a question), but this yields only around 6,200 examples. Any advice would be much appreciated!
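For reference, the counting rule described above (consecutive 'Participant' turns merged into one response) can be sketched as follows. This is only an illustration of that rule, not the authors' parsing code; the function name `count_responses`, the 'Interviewer' label, and the toy speaker list are made up for the example.

```python
from itertools import groupby
from typing import Iterable


def count_responses(speakers: Iterable[str], target: str = "Participant") -> int:
    """Count runs of consecutive turns by `target`, one run = one response."""
    # groupby collapses adjacent identical speaker labels into a single group,
    # so back-to-back 'Participant' turns are counted once.
    return sum(1 for speaker, _ in groupby(speakers) if speaker == target)


# Toy speaker column (hypothetical data, one label per transcript row).
speakers = [
    "Interviewer",
    "Participant",
    "Participant",  # merged with the previous turn
    "Interviewer",
    "Participant",
]
n_responses = count_responses(speakers)
print(n_responses)
```

A different segmentation rule (for example, splitting on pauses or counting every turn separately) would give a different total, so the count depends on which rule is applied.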
"From the initial set of 553 features, we excluded all features without a statistically significant univariate correlation with outcomes on the training set (|ρ| < 1e-01, p > 1e-02) nor a significant L1 regularized logistic regression model coefficient (|β| < 1e-04), thus resulting in a subset of 279 features and 8,050 examples (responses)"
How do I exclude features to get the subset of 279 features?
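One possible reading of the quoted rule is: keep a feature if its univariate correlation is significant (|ρ| ≥ 1e-01 and p ≤ 1e-02) or if its L1-regularized logistic regression coefficient is non-negligible (|β| ≥ 1e-04), and drop it otherwise. The sketch below applies that rule to statistics that have already been computed on the training set. The names `FeatureStats` and `select_features`, the feature names, and the numbers are all hypothetical, and how the "and"/"or" inside the quote should be combined is an assumption.

```python
from typing import Dict, List, NamedTuple


class FeatureStats(NamedTuple):
    rho: float      # univariate correlation with the outcome
    p_value: float  # p-value of that correlation
    beta: float     # coefficient from an L1-regularized logistic regression


def select_features(
    stats: Dict[str, FeatureStats],
    rho_min: float = 1e-1,
    p_max: float = 1e-2,
    beta_min: float = 1e-4,
) -> List[str]:
    """Return the names of features that pass at least one of the two criteria."""
    kept = []
    for name, s in stats.items():
        corr_ok = abs(s.rho) >= rho_min and s.p_value <= p_max
        coef_ok = abs(s.beta) >= beta_min
        # Assumed reading: a feature is excluded only if it fails both tests.
        if corr_ok or coef_ok:
            kept.append(name)
    return kept


# Hypothetical per-feature statistics, for illustration only.
example_stats = {
    "feature_a": FeatureStats(rho=0.25, p_value=0.001, beta=0.0),  # passes correlation test
    "feature_b": FeatureStats(rho=0.02, p_value=0.60, beta=0.3),   # passes coefficient test
    "feature_c": FeatureStats(rho=0.01, p_value=0.80, beta=0.0),   # fails both, excluded
}
selected = select_features(example_stats)
print(selected)
```

The correlation and p-value could come from, e.g., `scipy.stats.spearmanr` or `scipy.stats.pearsonr`, and the coefficients from a fitted `sklearn.linear_model.LogisticRegression` with `penalty="l1"`; which of these the paper actually used is not stated in the quote. The thresholds are keyword arguments so they can be adjusted.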