Open kbenoit opened 6 years ago
Comment by kbenoit Tuesday Jul 11, 2017 at 07:46 GMT
You have a syntax problem here, but also a problem with your groups.
t88 <- readtext::readtext(file = "~/Downloads/hoc.corpus88.zip",
                          text_field = "text",
                          docvarsfrom = "filenames")
corp88 <- corpus(t88)
dfm88 <- dfm(corp88,
             remove = c(stopwords("english")),
             remove_punct = TRUE,
             stem = TRUE)
The problem with your syntax above was mismatched parentheses.
Note also that in your wordshoal call you need to combine the two docvars into one for the groups, as per below, and you need to use the correct variable name for the author field.
head(docvars(dfm88))
# V1 X date session speechnumber speaker party chair terms parliament country
# hoc.corpus88.csv.1 1 1 1988-11-22 1988-89 1 HOUSESPEAKER other TRUE 1068 UK-HoC <NA>
# hoc.corpus88.csv.2 2 2 1988-11-22 1988-89 2 HOUSESPEAKER other TRUE 61 UK-HoC <NA>
# hoc.corpus88.csv.3 3 3 1988-11-22 1988-89 3 Giles Shaw Con FALSE 2514 UK-HoC <NA>
# hoc.corpus88.csv.4 4 4 1988-11-22 1988-89 4 John Maples Con FALSE 1490 UK-HoC England
# hoc.corpus88.csv.5 5 5 1988-11-22 1988-89 5 Neil Kinnock Lab FALSE 2775 UK-HoC Wales
# hoc.corpus88.csv.6 6 6 1988-11-22 1988-89 6 David Harris Con FALSE 1 UK-HoC <NA>
# docvar1
# hoc.corpus88.csv.1 hoc.corpus88
# hoc.corpus88.csv.2 hoc.corpus88
# hoc.corpus88.csv.3 hoc.corpus88
# hoc.corpus88.csv.4 hoc.corpus88
# hoc.corpus88.csv.5 hoc.corpus88
# hoc.corpus88.csv.6 hoc.corpus88
shoaltm <- textmodel_wordshoal(dfm88,
                               groups = interaction(docvars(dfm88, c("party", "country"))),
                               authors = docvars(dfm88, "speaker"))
# Error in textmodel_wordshoal.dfm(dfm88, groups = interaction(docvars(dfm88, :
# only a single case for the following groups:
# DUP.England
# SDLP.England
# SNP.England
# UPUP.England
# UUP.England
# Con.Northern Ireland
# Lab.Northern Ireland
# LibDem.Northern Ireland
# other.Northern Ireland
# PlaidCymru.Northern Ireland
# SDP.Northern Ireland
# SNP.Northern Ireland
# DUP.Scotland
# PlaidCymru.Scotland
# SDLP.Scotland
# SDP.Scotland
# UPUP.Scotland
# UUP.Scotland
# DUP.Wales
# PlaidCymru.Wales
# SDLP.Wales
# SDP.Wales
# SNP.Wales
# UPUP.Wales
# UUP.Wales
Unfortunately, you have too few authors in many of these groups, so you need to pare the groups down. You can do this before creating the dfm using corpus_subset(), or you can trim them afterwards by index slicing the dfm.
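A minimal sketch of the paring step, using base R only. The data frame dv is a made-up stand-in for docvars(dfm88), and the threshold of two documents per group is an assumption read off the wording of the error message ("only a single case"), not something confirmed from the wordshoal source. Note that interaction() by default also creates levels for party/country combinations that never occur, which drop = TRUE avoids.

```r
# Toy stand-in for docvars(dfm88); the values are invented for illustration.
dv <- data.frame(
  speaker = c("A", "B", "C", "D", "E", "F"),
  party   = c("Con", "Con", "Lab", "Lab", "SNP", "Con"),
  country = c("England", "England", "Wales", "Wales", "Scotland", NA),
  stringsAsFactors = TRUE
)

# Combine the two variables into one grouping factor;
# drop = TRUE discards combinations that never occur in the data.
grp <- interaction(dv$party, dv$country, drop = TRUE)

# Count documents per group and flag rows in groups with at least two.
# Rows whose group is NA (missing country) are flagged FALSE.
n <- table(grp)
keep <- grp %in% names(n)[n >= 2]

print(keep)

# With the real objects (assumed usage, not run here), the same logical
# vector could slice the dfm and the grouping factor:
#   dfm88_sub <- dfm88[keep, ]
#   grp_sub   <- droplevels(grp[keep])
```

The same logical vector could instead be passed to corpus_subset() on corp88 before building the dfm, provided the document order is unchanged.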
The person to make textmodel_wordshoal() more robust to these issues is @lauderdale, and I am hoping he'll get to them this summer at some point.
Comment by methodds Wednesday Aug 09, 2017 at 17:24 GMT
Did you change anything in wordshoal during the last few quanteda versions? I'm receiving a lot of warnings for a corpus that did not produce them before:
......Warning: The algorithm did not converge..............Warning: The algorithm did not converge.
20 .................Warning: The algorithm did not converge..
.40 ...................60 ...................80 .....
Warning: The algorithm did not converge...............
100 .....Warning: The algorithm did not converge..Warning:
The algorithm did not converge..............120 ............Warning: The algorithm did not converge........
...
If you didn't change anything: is it possible that changes to dfm() affect wordshoal in an unintended way?
Comment by kbenoit Wednesday Aug 09, 2017 at 20:23 GMT
We added a warning for when the algorithm reaches the iteration limit in the Wordfish routine that it calls, but otherwise the behaviour should be the same. textmodel_wordshoal() remains an experimental function, but I am hoping that @lauderdale will devote some time soon to making it more robust.
Issue by kwainfan Monday Jul 10, 2017 at 22:47 GMT Originally opened as https://github.com/kbenoit/quanteda/issues/845
I am getting an error when I try to run a wordshoal model.
data: hoc.corpus88.zip