trinker closed this issue 10 years ago.
In the future, could you use qdap's issues page as directed in the help manual (https://github.com/trinker/qdap/issues?state=open)? This lets others see a problem and (a) help solve it and (b) learn from the solutions of others.
Solving this problem is difficult if it can't be reproduced. I don't have your data set, so it's hard to wrap my head around what you're after. You could make a dummy data set with missing values, something along these lines:
library(qdap)
DATA[c(3, 8), 4] <- NA  # set the text in the "state" column to NA for rows 3 and 8
DATA
## person sex adult state code
## 1 sam m 0 Computer is fun. Not too fun. K1
## 2 greg m 0 No it's not, it's dumb. K2
## 3 teacher m 1 <NA> K3
## 4 sam m 0 You liar, it stinks! K4
## 5 greg m 0 I am telling the truth! K5
## 6 sally f 0 How can we be certain? K6
## 7 greg m 0 There is no way. K7
## 8 sam m 0 <NA> K8
## 9 sally f 0 What are you talking about? K9
## 10 researcher f 1 Shall we move on? Good then. K10
## 11 greg m 0 I'm hungry. Let's eat. You already? K11
Then I can run your code. Right now I don't understand why you'd use the non-exported function (termco.h) instead of termco itself. I also don't know which version of qdap you're using; packageDescription("qdap")["Version"] will provide that information.
Using the example above, I reproduced your error:
DATA$x <- factor(seq_along(DATA$state))  # one grouping level per row
with(DATA, qdap:::termco.h(state, "the", x))  # ::: reaches the non-exported function
I have fixed this in the development version of qdap by adding the line:
X[is.na(X[, "Y"]), "Y"] <- 0
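To show what that one line does, here is a toy illustration in base R. The data frame X below is made up: only the names X and Y come from the fix line, and it assumes Y holds the per-group count; the real table inside termco.h may have other columns.

```r
# Made-up stand-in for the internal count table; only the names
# X and Y are taken from the fix line above.
X <- data.frame(group = c("K1", "K2", "K3"), Y = c(1, NA, 0))

# Rows whose Y is NA get 0; all other rows are left untouched.
X[is.na(X[, "Y"]), "Y"] <- 0
X
```

Note that this turns a missing count into 0, so a row whose text was NA would report 0 rather than NA.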
You'll have to install devtools and use the development version to get the update (see: https://github.com/trinker/qdap#installation). However, I'd suggest using the termco function instead, as it's more elegant. Here is an example:
with(DATA, termco(state, seq_along(DATA$state), "the"))
## Just the counts:
with(DATA, termco(state, seq_along(DATA$state), "the"))$raw[, 3]
Sent via email by Matt Williamson:
I am using your qdap package to search for and count keywords in text fields associated with various entries from the Federal Register. Unfortunately, data standards for the FR are not well enforced, so some fields have NA values, which seem to cause issues when running termco.h (which I used based on some StackOverflow examples). Here is the code I am using (which works fine on fields where every record has an entry): and here is the error:
I am assuming that some internal function call has a problem with the NA values in the ACT field. Is there a way to work around this so that rows that do not contain the term receive a 0 and rows that have NA are identified with NA? Any help you can offer would be much appreciated.
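The behavior asked for here (0 for rows without the term, NA for rows whose text is missing) can be sketched in base R without qdap. This is a minimal sketch, not qdap's API: count_term is a hypothetical helper, and it matches whole words case-insensitively with a Perl regex, which may not reproduce termco's own matching rules.

```r
# Hypothetical helper (not part of qdap): count whole-word,
# case-insensitive matches of `term` in each element of `text`.
# Elements that are NA stay NA; elements without a match get 0.
count_term <- function(text, term) {
  counts <- rep(NA_integer_, length(text))
  ok <- !is.na(text)
  pattern <- paste0("\\b", term, "\\b")
  # gregexpr() returns -1 for an element with no match,
  # so counting positive positions gives 0 in that case.
  hits <- gregexpr(pattern, text[ok], ignore.case = TRUE, perl = TRUE)
  counts[ok] <- vapply(hits, function(m) sum(m > 0L), integer(1))
  counts
}

text <- c("Computer is fun.", NA, "There is no way.", "the end of the line")
count_term(text, "the")  # 0 NA 0 2
```

For the case in the email, it would be applied to the ACT column, e.g. count_term(your_data$ACT, "term"), where your_data and "term" are placeholders.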