Closed kbenoit closed 5 years ago
Is there a reason not to add this option to get_json()? I think it makes more sense to have it.
Agreed, it does make sense. The other question: Should we automatically recognize the quanteda::corpus.data.frame() defaults? i.e. docid_field = "doc_id", text_field = "text"?
This only makes sense for multi-document inputs, so it would not be an active default for single-document inputs that do not contain key-value pairs or column headers, but we could indicate that clearly in the documentation. (You can't have a one-function-does-all approach and have every argument make sense for every input.)
Should we automatically recognize the quanteda::corpus.data.frame() defaults? i.e. docid_field = "doc_id", text_field = "text"?
I was thinking about it. I'd say it would be good to send a message about it, such as "doc_id field exists in the file. If you intend to use it as a document identifier, use the docid_field option." Auto-recognition might be confusing.
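The message-instead-of-auto-recognition idea could look roughly like the base R sketch below. The helper name notify_docid_candidate() is hypothetical and is not part of readtext; the sketch only assumes the parsed input is already a data.frame.

```r
# Hypothetical helper (not part of readtext): tell the user when a parsed
# input contains a "doc_id" column but no docid_field was requested.
notify_docid_candidate <- function(df, docid_field = NULL) {
  if (is.null(docid_field) && "doc_id" %in% names(df)) {
    message("A doc_id field exists in the file. If you intend to use it ",
            "as a document identifier, use the docid_field option.")
    return(invisible(TRUE))
  }
  invisible(FALSE)
}

# Made-up example data with the quanteda-style column names.
df <- data.frame(doc_id = c("a", "b"), text = c("first", "second"),
                 stringsAsFactors = FALSE)
notified <- notify_docid_candidate(df)            # emits the message
silent   <- notify_docid_candidate(df, "doc_id")  # user chose a field: no message
```

Nothing is renamed or reassigned here; the user still has to opt in through docid_field, which avoids the confusion that silent auto-recognition might cause.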
For json, I will implement it later today.
Sounds good, pls make both changes. See the function I used for setting docid_field in utils.R. Once the JSON has become a data.frame I think we can use the same function, at the end of get_json().
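As a rough illustration of reusing one helper once the JSON is a data.frame, here is a base R sketch. apply_docid_field() is a hypothetical stand-in: the actual function in utils.R, its name, and its signature are not shown in this thread, and the uniqueness check and the use of row names to carry the ids are assumptions.

```r
# Hypothetical stand-in for the utils.R helper mentioned above; the real
# function's name and behaviour are not shown in this thread.
apply_docid_field <- function(df, docid_field = NULL) {
  if (is.null(docid_field)) return(df)   # nothing requested: leave df as-is
  if (!docid_field %in% names(df))
    stop("docid_field '", docid_field, "' not found in the input")
  ids <- as.character(df[[docid_field]])
  if (anyDuplicated(ids)) stop("docid_field values must be unique")  # assumption
  df[[docid_field]] <- NULL              # drop the source column
  rownames(df) <- ids                    # carry the ids as row names (assumption)
  df
}

# A data.frame such as get_json() might hold after parsing (made-up data).
parsed <- data.frame(id = c("d1", "d2"), text = c("first", "second"),
                     lang = c("en", "en"), stringsAsFactors = FALSE)
result <- apply_docid_field(parsed, docid_field = "id")
```

Calling the helper as the last step means the spreadsheet readers and the JSON reader would share one code path for document identifiers.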
Adds a docid_field to readtext(), which adds this functionality for .csv, .tsv, .xls(x), and .ods.
There is no default value, as requested in #155, because it only makes sense for spreadsheet-like inputs and because text_id also has no default.
Note: The branch is misnamed!