ropensci / jstor

Import journal data from DfR (JSTOR)
https://docs.ropensci.org/jstor

Problem with vignette: Error in gzfile(file, "rb") : cannot open the connection #42

Closed BillyHall5 closed 6 years ago

BillyHall5 commented 6 years ago

First, thank you for this jstor package. It seems perfectly suited to working with the DfR files. I am having an issue, though, and thought I'd ask you about it. I'm very much a beginner, but in following your example I get the following error when I try to load bigram_files:

Error in gzfile(file, "rb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file 'bigram_paths.rds', probable reason 'No such file or directory'

I'm not sure what's happening here. I'm working in RStudio and it appears that the value is there:

chr [1:1059] "./receipt-id-631571-part-001/ngram2/journal-article-10.2307_23271615-ngram2.txt" ...

Any help would be greatly appreciated and thank you again for putting together a very useful tool.

Billy

tklebel commented 6 years ago

I'm happy you find the package useful!

It seems you are trying to recreate the vignette "Analysing n-grams with jstor". From your error message, it appears that you are using the source version on GitHub. This part

bigram_files <- readr::read_rds("bigram_paths.rds")

is only executed locally by me when I create the vignette, because it is faster than

bigram_files <- list.files(path = c("receipt-id-624621-part-001/ngram2/",
                                    "receipt-id-624621-part-002/ngram2/"),
                           full.names = TRUE)

when I render the vignette a few times while working on it.

The "correct" vignette, combined with output, can be found here: https://ropensci.github.io/jstor/articles/analysing-n-grams.html#importing-bigrams
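If you want to confirm that this is what happened, a quick check along these lines might help. It is only a sketch in base R, assuming the error comes from bigram_paths.rds simply not existing in your working directory; the variable names working_dir and rds_found are just for illustration.

```r
# Relative paths such as "bigram_paths.rds" are resolved against the
# current working directory, so look at that first.
working_dir <- getwd()
print(working_dir)

# FALSE here would explain the "No such file or directory" warning:
# the file was only ever created on my machine.
rds_found <- file.exists("bigram_paths.rds")
print(rds_found)
```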

If you want to save the paths to disk in order to speed up the process for re-running, you might want to do it like this:

# list all files
bigram_files <- list.files(path = c("path_to_directory_with_bigrams/",
                                    "another_path_to_directory_with_bigrams"),
                           full.names = TRUE)

# write a single object with the list of paths to disk
readr::write_rds(bigram_files, "path_to_save_the_list_of_paths.rds")

The next time you run your script, you can simply use

bigram_files <- readr::read_rds("path_to_save_the_list_of_paths.rds")

to load the paths.

Depending on the number of files and the speed of your disk, this might save you some time.
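If you prefer not to switch between the two variants by hand, the steps can be combined into one small helper. This is a rough sketch rather than anything that ships with jstor: load_bigram_paths() is a hypothetical name, the paths are placeholders, and it uses base R's saveRDS()/readRDS() (which readr::write_rds() and readr::read_rds() wrap) so that it runs without extra packages.

```r
# Hypothetical helper, not part of the jstor package:
# list the files once, then reuse the saved list on later runs.
load_bigram_paths <- function(dirs, cache_file) {
  if (file.exists(cache_file)) {
    # A cached list exists, so skip scanning the directories.
    return(readRDS(cache_file))
  }
  # No cache yet: scan the directories and save the result for next time.
  paths <- list.files(path = dirs, full.names = TRUE)
  saveRDS(paths, cache_file)
  paths
}

# Placeholder paths, replace them with your own directories.
bigram_files <- load_bigram_paths(
  dirs = c("path_to_directory_with_bigrams/",
           "another_path_to_directory_with_bigrams"),
  cache_file = "path_to_save_the_list_of_paths.rds"
)
```

Keep in mind that the saved list is not refreshed automatically. If you add or remove files in those directories, delete the .rds file so the list gets rebuilt.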