quanteda / readtext

an R package for reading text files
https://readtext.quanteda.io
120 stars 28 forks source link

Add download() and improve handling of temporary files #120

Closed koheiw closed 6 years ago

koheiw commented 6 years ago

Adds download() with cache functionality. Unlike quanteda.corpora::download(), it only returns path the cache file, so that it can be used for all sorts of files.

codecov-io commented 6 years ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (master@5dc1df7). Click here to learn what that means. The diff coverage is 94.31%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #120   +/-   ##
=========================================
  Coverage          ?   85.22%           
=========================================
  Files             ?        8           
  Lines             ?      528           
  Branches          ?        0           
=========================================
  Hits              ?      450           
  Misses            ?       78           
  Partials          ?        0
Impacted Files Coverage Δ
R/readtext.R 85.14% <100%> (ø)
R/download.R 100% <100%> (ø)
R/utils.R 94.82% <94.18%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 5dc1df7...9d2fcd1. Read the comment docs.

koheiw commented 6 years ago

Let's merge this and update CRAN.

kbenoit commented 6 years ago

My main question is: Do we need this as a separate function? The nice thing about readtext is that is has a single function that works in a huge range of cases. Direct download from a remote is supposed to be one of those cases. With the news cache argument, repeated downloads do not need to be re-fetched from the remote. But do we then need a new exposed function (download()) for this? How about making this just work directly through readtext()?

Also the increasing complexity of the variety of ways that readtext() can be used points to the need for a vignette (https://github.com/quanteda/quanteda/issues/996 but really should be an issue for readtext and be added to a pkgdown site as per #61, #65).

Does this fix solve any other open issues, e.g. #119 or #105?

koheiw commented 6 years ago

I initially made it only accessible from readtext() but I noticed that I want to download corpus objects. Since readtext() always returns data.frame, this is inconsistent. This is why I separated it as download().

I commented on #119 and #105 too.

ogorodriguez commented 6 years ago

Is the available already in readtext and/or quanteda? Do I have update both packages?