Open krystian8207 opened 4 years ago
Let's create example files:
    csv1 <- data.frame(
      doc_id = c("doc1", "doc2"),
      text = c("Lorem ipsum", "dolor sit amet"),
      docvar1 = c("A", "B"),
      docvar2 = c("C", "D"),
      stringsAsFactors = FALSE
    )
    csv2 <- csv1[1, ]
    write.csv(csv1, file = "/tmp/csv1.csv", row.names = FALSE)
    write.csv(csv2, file = "/tmp/csv2.csv", row.names = FALSE)
For csv1.csv (two rows), doc_id and text are read from the specified columns, as expected:
    > readtext::readtext("/tmp/csv1.csv", docid_field = "doc_id", text_field = "text")
    readtext object consisting of 2 documents and 2 docvars.
    # Description: df[,4] [2 × 4]
      doc_id text                  docvar1 docvar2
      <chr>  <chr>                 <chr>   <chr>
    1 doc1   "\"Lorem ipsu\"..."   A       C
    2 doc2   "\"dolor sit \"..."   B       D
For csv2.csv (a single row), doc_id is taken from the filename instead of the doc_id column:
    > readtext::readtext("/tmp/csv2.csv", docid_field = "doc_id", text_field = "text")
    readtext object consisting of 1 document and 2 docvars.
    # Description: df[,4] [1 × 4]
      doc_id   text                  docvar1 docvar2
      <chr>    <chr>                 <chr>   <chr>
    1 csv2.csv "\"Lorem ipsu\"..."   A       C
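To rule out a problem with the file itself, here is a minimal sketch that reads the single-row file back with base R only (no readtext involved). It recreates csv2.csv at the same path used above so it can be run on its own, and only checks what is on disk; it says nothing about where the filename substitution happens inside readtext.

```r
# Recreate the single-row file from the example above (same path as in the report).
csv2 <- data.frame(
  doc_id = "doc1",
  text = "Lorem ipsum",
  docvar1 = "A",
  docvar2 = "C",
  stringsAsFactors = FALSE
)
write.csv(csv2, file = "/tmp/csv2.csv", row.names = FALSE)

# Read it back with base R only.
raw <- read.csv("/tmp/csv2.csv", stringsAsFactors = FALSE)

# The doc_id column is present in the file with the expected value,
# so the value shown by readtext ("csv2.csv") does not come from the file contents.
stopifnot(nrow(raw) == 1, identical(raw$doc_id, "doc1"))
```

If the check passes silently, the file is fine and the difference between the two cases comes from how the one-row input is handled when it is read.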