Open koheiw opened 6 years ago
A web server for this purpose should come with a CDN. Candidates are:
GitHub has a storage service, Git LFS (https://git-lfs.github.com/), but I am not sure how it works.
quanteda.org's Google Drive might suffice.
Good point - feel free to try to make that work.
It seems to work if we modify the link a bit:
quanteda.corpora::download(url = 'https://drive.google.com/uc?export=download&id=1VepIW420aAwIPxg4_Kj4Fi6jB-yllbGS')
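The link change can be sketched as a small helper. This is only an illustration: `gdrive_direct_url()` is a hypothetical name, not part of quanteda.corpora, and it assumes the share link has one of two common shapes, `.../file/d/<ID>/...` or `...?id=<ID>`.

```r
# Hypothetical helper (not part of quanteda.corpora): rewrite a Google Drive
# share link into the direct-download form used in the call above.
gdrive_direct_url <- function(share_url) {
  # Assumption: the file id follows "/d/" or an "id=" query parameter.
  pattern <- "(?:/d/|[?&]id=)([A-Za-z0-9_-]+)"
  m <- regmatches(share_url, regexec(pattern, share_url, perl = TRUE))[[1]]
  if (length(m) < 2) {
    stop("could not find a file id in: ", share_url)
  }
  # m[1] is the full match, m[2] is the captured file id.
  paste0("https://drive.google.com/uc?export=download&id=", m[2])
}

direct <- gdrive_direct_url(
  "https://drive.google.com/file/d/1VepIW420aAwIPxg4_Kj4Fi6jB-yllbGS/view?usp=sharing"
)
print(direct)
```

The resulting string could then be passed as the `url` argument, as in the `quanteda.corpora::download()` call above.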
Zenodo seems to be a good open repository for corpora. Downloading from Dropbox is faster, but Zenodo is free and gives corpora DOIs.
> system.time(
+ download.file(url = "https://zenodo.org/record/1010076/files/GlycosideHydrolase_BLASTP.tar.gz?download=1",
+ destfile = tempfile())
+ )
trying URL 'https://zenodo.org/record/1010076/files/GlycosideHydrolase_BLASTP.tar.gz?download=1'
Content type 'application/octet-stream' length 38068097 bytes (36.3 MB)
==================================================
downloaded 36.3 MB
user system elapsed
0.749 0.514 100.665
>
> system.time(
+ download.file(url = "https://www.dropbox.com/s/631wdkr21cwh0ez/GlycosideHydrolase_BLASTP.tar.gz?dl=1",
+ destfile = tempfile())
+ )
trying URL 'https://www.dropbox.com/s/631wdkr21cwh0ez/GlycosideHydrolase_BLASTP.tar.gz?dl=1'
Content type 'application/binary' length 38068097 bytes (36.3 MB)
==================================================
downloaded 36.3 MB
user system elapsed
0.924 0.574 41.288
>
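To put the two timings side by side, here is a short calculation using only the figures printed above (file size and elapsed seconds). It is a rough comparison from a single run per host, so network conditions could easily shift the numbers; the variable names are just for illustration.

```r
# Figures copied from the two system.time() runs above.
size_bytes <- 38068097
elapsed <- c(zenodo = 100.665, dropbox = 41.288)  # elapsed seconds

size_mb <- size_bytes / 1024^2    # the "36.3 MB" that download.file() reports
throughput <- size_mb / elapsed   # MB per second for each host
speedup <- elapsed[["zenodo"]] / elapsed[["dropbox"]]

print(round(throughput, 2))
print(round(speedup, 1))
```

On these numbers Zenodo delivered roughly 0.36 MB/s and Dropbox roughly 0.88 MB/s, so Dropbox was about 2.4 times faster in this one test.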
Nice, and as we discussed, the DOI feature is great too. However, since .rda is already compressed, and since Zenodo is serving these as .zip files, this of course works too (and could work from Zenodo as well, without download.file):
load(url("https://kenbenoit.net/files/testcorpus.rda"))
testcorpus
## Corpus consisting of 58 documents and 3 docvars.
The data is currently stored in my Dropbox folder. It might be better to have a web server (or a dedicated Dropbox account).