quadrama / DramaAnalysis

An R package for analysis of dramatic texts
GNU General Public License v3.0

presence for some dramas can't be calculated #156

Closed nilsreiter closed 4 years ago

nilsreiter commented 4 years ago

Works well:

library(DramaAnalysis)
drama <- loadDrama("rksp.0", defaultCollection = "qd")
presence(drama)

Doesn't work:

library(DramaAnalysis)
drama <- loadDrama("ksd1.0", defaultCollection = "qd")
presence(drama)

Error message:

Error in `.rowNamesDF<-`(x, value = value) : 
  missing values in 'row.names' are not allowed
In addition: Warning messages:
1: In data.table::set(conf.passive, which(is.na(conf.passive[[j]])),  :
  Coerced 'logical' RHS to 'integer' to match the factor column's underlying type. Character columns are now recommended (can be in keys), or coerce RHS to integer or character first.
2: In data.table::set(conf.passive, which(is.na(conf.passive[[j]])),  :
  RHS contains 0 which is outside the levels range ([1,43]) of column 1, NAs generated

Discovered by @pagelj

nilsreiter commented 4 years ago

The root cause is utterances without speaker assignments, e.g., line 1976 in ksd1.0.UtterancesWithTokens.
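A quick way to check for such rows, sketched on a toy data frame rather than the real corpus. The column name Speaker.figure_id is the one used in the workaround in this thread; the toy table and its Token.surface column are assumptions made up for illustration.

```r
# Toy stand-in for drama$text (assumption: the real table has more columns).
text <- data.frame(
  Speaker.figure_id = c("figure_a", NA, "figure_b", NA),
  Token.surface     = c("Ja", "Nein", "Doch", "Ach"),
  stringsAsFactors  = FALSE
)

# Row indices of tokens whose utterance has no speaker assigned
unassigned <- which(is.na(text$Speaker.figure_id))
print(unassigned)

# Number of affected rows
print(length(unassigned))
```

On a loaded drama, the same check should work with drama$text in place of text.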

nilsreiter commented 4 years ago

This can be fixed after loading with this line:

drama$text <- drama$text[!is.na(drama$text$Speaker.figure_id),]

This:

library(DramaAnalysis)
drama <- loadDrama("ksd1.0", defaultCollection = "qd")
drama$text <- drama$text[!is.na(drama$text$Speaker.figure_id),]
presence(drama)

works well (but the utterances without a speaker assignment are skipped)
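Since the workaround drops rows silently, it may help to report how many are removed. A minimal sketch, assuming a hypothetical helper dropUnassigned() that is not part of DramaAnalysis, demonstrated on made-up toy data:

```r
# Hypothetical helper (not part of DramaAnalysis): drop rows without a
# speaker and report how many were removed.
dropUnassigned <- function(text, column = "Speaker.figure_id") {
  keep <- !is.na(text[[column]])
  message(sum(!keep), " row(s) without a speaker dropped")
  text[keep, ]
}

# Toy stand-in for drama$text (assumption, for illustration only)
toy <- data.frame(
  Speaker.figure_id = c("figure_a", NA, "figure_b"),
  Token.surface     = c("Ja", "Nein", "Doch"),
  stringsAsFactors  = FALSE
)

clean <- dropUnassigned(toy)
print(nrow(clean))
```

With a real drama this would presumably be used as drama$text <- dropUnassigned(drama$text) before calling presence(drama).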

nilsreiter commented 4 years ago

This is a duplicate of #157