quadrama / DramaAnalysis

An R package for analysis of dramatic texts
GNU General Public License v3.0

error loading texts #133

Closed BenjaminKrautter closed 5 years ago

BenjaminKrautter commented 5 years ago

With ids.all <- loadAllInstalledIds() and loadSegmentedText(ids.all), I get the following error message:

Error in data.table::foverlaps(t, sat, type = "any", by.x = c("corpus", : The last two columns in by.x should correspond to the 'start' and 'end' intervals in data.table 'x' and must be integer/numeric type.

In addition: Warning message:

In `[.data.table`(sat, is.na(Number.Scene), `:=`(Number.Scene = 0), : Coerced double RHS to character to match the type of the target column (column 8 named 'Number.Scene'). If the target column's type character is correct, it's best for efficiency to avoid the coercion and create the RHS as type character. To achieve that consider R's type postfix: typeof(0L) vs typeof(0), and typeof(NA) vs typeof(NA_integer_) vs typeof(NA_real_). You can wrap the RHS with as.character() to avoid this warning, but that will still perform the coercion. If the target column's type is not correct, it's best to revisit where the DT was created and fix the column type there; e.g., by using colClasses= in fread(). Otherwise, you can change the column type now by plonking a new column (of the desired type) over the top of it; e.g. DT[,Number.Scene:=as.double(Number.Scene)]. If the RHS of := has nrow(DT) elements then the assignment is called a column plonk and is the way to change a column's type. Column types can be observed with sapply(DT,typeof).

With text.all.l <- lapply(ids.all, loadSegmentedText), I get the same error:

Error in data.table::foverlaps(t, sat, type = "any", by.x = c("corpus", : The last two columns in by.x should correspond to the 'start' and 'end' intervals in data.table 'x' and must be integer/numeric type.

Both calls work when I load a correspondingly smaller number of plays, e.g. via loadSet("tragoediel").

BenjaminKrautter commented 5 years ago

Update: If I use loadMeta() to exclude all plays without a publication date, it also works (that leaves ~400 plays).

t-lini commented 5 years ago

The error also occurs when loadSegmentedText() is used to load a single play whose UtterancesWithTokens.csv is empty apart from the column names, for example here. loadSegmentedText() or loadText() could be extended so that in such a case a warning is issued and the affected IDs are skipped. What do you think, @nilsreiter?
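The proposed behaviour could look roughly like the sketch below. It is an illustration in base R only, not the package's actual implementation: the helper name read_nonempty and its paths argument are hypothetical, and it assumes a "broken" file is one that has a header line but no data rows.

```r
# Hypothetical helper, not part of DramaAnalysis: read each CSV file and
# skip, with a warning, those that contain a header but no data rows.
read_nonempty <- function(paths) {
  tables <- list()
  for (p in paths) {
    tab <- utils::read.csv(p, stringsAsFactors = FALSE)
    if (nrow(tab) == 0) {
      # Header-only file: warn and move on instead of failing later.
      warning("Skipping ", p, ": file has column names but no rows",
              call. = FALSE)
      next
    }
    tables[[p]] <- tab
  }
  tables
}
```

Skipping at read time keeps an empty table from reaching later steps such as the interval join that raised the error above.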

nilsreiter commented 5 years ago

Yes, that would certainly make sense.

t-lini commented 5 years ago

For now, this only skips those CSV files that are empty (for 3.x). This means that for dramas with broken UtterancesWithTokens.csv files, the other files (Segmentation, Meta, etc.) are still loaded into the QDDrama object. Do we want to skip the whole drama in these cases?

nilsreiter commented 5 years ago

Yes, I think skipping the entire drama would be better.
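Skipping the entire drama could be sketched as follows. This is a minimal illustration under stated assumptions, not the change that was eventually made: the function name drop_broken_dramas is hypothetical, and the sketch assumes every table identifies its play in a column named drama.

```r
# Hypothetical sketch: a drama counts as broken if it has no rows in the
# utterance table; its rows are then removed from every other table too.
# The column name `drama` is an assumption made for this example.
drop_broken_dramas <- function(ids, utterances, tables) {
  broken <- setdiff(ids, unique(utterances$drama))
  if (length(broken) > 0) {
    warning("Skipping dramas without utterances: ",
            paste(broken, collapse = ", "), call. = FALSE)
  }
  # Keep only rows that belong to dramas with usable utterance data.
  lapply(tables, function(tab) tab[!(tab$drama %in% broken), , drop = FALSE])
}
```

Filtering all tables by the same list of broken IDs keeps the loaded object consistent, so no drama ends up with metadata or segmentation but no text.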