quanteda / readtext

an R package for reading text files
https://readtext.quanteda.io

Problem with JSON in readtext #107

Status: Closed (closed by jslapin 7 years ago)

jslapin commented 7 years ago

Quick question… readtext does not seem to be working for JSON at the moment. I’m trying the example from your vignette:

t <- readtext(paste0(DATA_DIR, "/json/inaugural_sample.json"))

And I get the error message:

Error in get_json_lines(path, text_field, ...) : Cannot use numeric text_field with json file

I have tried this with other data in JSON format and always get the same or a similar message; see below for another example. I also notice that you have moved the text_field argument from readtext() to corpus(). I get the error even when using the text_field argument in corpus() and even when the field it names is clearly text.

test <- readtext("~/Dropbox (Personal)/Teaching/QTA/Sage Materials/condensed_2015.json")
corpus(test, text_field = "text")
Error in corpus.data.frame(test, text_field = "text") : text_field must refer to a character mode column

It seems to work with .txt and .csv files just fine. Curious if you have any insights.

kbenoit commented 7 years ago

Hmmm.... I get a warning but nothing like what you indicate.

> (rt6 <- readtext(paste0(DATA_DIR, "json/inaugural_sample.json"), text_field = "texts"))
readtext object consisting of 3 documents and 3 docvars.
# data.frame [3 x 5]
                   doc_id                text  Year  President FirstName
                    <chr>               <chr> <int>      <chr>     <chr>
1 inaugural_sample.json.1 "\"Fellow-Cit\"..."  1789 Washington    George
2 inaugural_sample.json.2 "\"Fellow cit\"..."  1793 Washington    George
3 inaugural_sample.json.3 "\"When it wa\"..."  1797      Adams      John
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  Doesn't look like Tweets json file, trying general JSON
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] readtext_0.51

loaded via a namespace (and not attached):
 [1] httr_1.2.1        compiler_3.4.0    rjson_0.2.15      R6_2.2.2          tools_3.4.0      
 [6] RCurl_1.95-4.8    tibble_1.3.3      Rcpp_0.12.11      stringi_1.1.5     streamR_0.2.1    
[11] data.table_1.10.4 jsonlite_1.5      bitops_1.0-6      rlang_0.1.1.9000

jslapin commented 7 years ago

Ok. Looks like I might have been a little slow in updating R. I'm running 3.3.3. Perhaps that's the issue. I will update and try again.

jslapin commented 7 years ago

Works now.

kbenoit commented 7 years ago

Good! I doubt it was the R version, though; it could have been an older version of jsonlite. The package requirements state jsonlite >= 0.9.10, but this is not enforced when you install.
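
For anyone hitting the same error, here is a minimal sketch of how to check the installed jsonlite version against the minimum mentioned above. The 0.9.10 threshold is taken from this comment, not from the current package DESCRIPTION, so treat it as an assumption and adjust it if the requirement has changed.

```r
# Minimum jsonlite version as quoted in the comment above (an assumption;
# check the readtext DESCRIPTION file for the current requirement).
min_version <- "0.9.10"

if (!requireNamespace("jsonlite", quietly = TRUE)) {
  # jsonlite is missing entirely
  message("jsonlite is not installed; run install.packages(\"jsonlite\")")
} else {
  installed <- utils::packageVersion("jsonlite")
  if (installed < min_version) {
    # Installed copy predates the stated minimum
    message("jsonlite ", as.character(installed), " is older than ",
            min_version, "; run install.packages(\"jsonlite\") to update")
  } else {
    message("jsonlite ", as.character(installed),
            " meets the stated requirement")
  }
}
```

If the version turns out to be too old, updating jsonlite and restarting the R session before calling readtext() again is a reasonable first thing to try.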