Closed heinrichreimer closed 1 year ago
Yes, downloading things from google drive is a thing people do.
Embeddings.jl uses GoogleDrive.jl similarly. I think it is broadly similar to the code that is inside Transformers.jl: https://github.com/JuliaText/Embeddings.jl/blob/306c04bead62b32873dedbc2609c74c4ca34306b/src/Paragram.jl#L31
I don't see any reason to have it in this package. It would be more useful to have it in another suitable package (like GoogleDrive.jl, or some new package if you want to start from scratch) that can do this and likely more (e.g. writing). Then those can work with DataDeps.jl.
That could look like AWSS3.jl, which provides the S3Path type, which works with DataDeps without needing to specify fetch_method because it overloads Base.basename and Base.download.
These two things are all that is required to work with DataDeps without a fetch method:
https://github.com/oxinabox/DataDeps.jl/blob/85f28c1a3e577c892a2fde6a40bab3f1ab6de451/src/fetch_helpers.jl#L51-L60
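As a rough illustration of that two-method contract, here is a minimal sketch with a toy type. MirrorPath and fetch_sketch are hypothetical names invented for this example. fetch_sketch is only a stand-in that assumes the default fetch joins the target directory with basename(remote) and then calls download; it is not DataDeps' actual code. The "remote" is just another file on disk so the example runs offline.

```julia
# Hypothetical remote-path type: the "remote" content is a file elsewhere on disk,
# so this sketch needs no network access.
struct MirrorPath
    source::String   # where the data actually lives
end

# 1) A file name is needed to build the local destination path.
Base.basename(p::MirrorPath) = basename(p.source)

# 2) A way to materialise the remote object at a given local path is needed.
function Base.download(p::MirrorPath, localpath::AbstractString)
    cp(p.source, localpath; force=true)
    return localpath
end

# Assumed shape of a default fetch (an approximation, not DataDeps' code).
function fetch_sketch(remote, localdir)
    localpath = joinpath(localdir, basename(remote))
    return download(remote, localpath)
end

# Usage: create a fake "remote" file and fetch it into a fresh directory.
src = joinpath(mktempdir(), "data.csv")
write(src, "a,b\n1,2\n")
dest = fetch_sketch(MirrorPath(src), mktempdir())
```

Because dispatch picks the MirrorPath methods, no custom fetch function has to be passed around; the type itself carries the download behaviour.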
More broadly: It would be really cool if someone overloaded the FilePathsBase API for Google Drive.
The other reason I wouldn't want it here is that I don't want to take on dependencies, nor do I want to take on the maintenance burden.
So possibly a GoogleDriveFile("0B9w48e1rj-MOLVdZRzFfTlNsem8") struct could be added to GoogleDrive.jl, overloading Base.basename and Base.download?
If that would then work out-of-the-box with DataDeps.jl, I would agree that should be the preferred way.
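A minimal sketch of what such a struct might look like, under several assumptions: GoogleDriveFile, its fields, and download_url are hypothetical and not part of GoogleDrive.jl; the direct-download URL pattern is assumed and may change; and the confirmation step for large files is deliberately not handled.

```julia
using Downloads  # standard library in Julia 1.6+

# Hypothetical type; GoogleDrive.jl does not necessarily define anything like it.
struct GoogleDriveFile
    id::String     # file ID taken from the share link
    name::String   # local file name, since the ID alone does not carry one
end

# Convenience constructor matching the one-argument form suggested in the thread;
# it falls back to the ID as the file name (an arbitrary choice for this sketch).
GoogleDriveFile(id::AbstractString) = GoogleDriveFile(id, id)

# Assumed direct-download URL pattern; Google may change or restrict it.
download_url(f::GoogleDriveFile) =
    "https://drive.google.com/uc?export=download&id=" * f.id

Base.basename(f::GoogleDriveFile) = f.name

function Base.download(f::GoogleDriveFile, localpath::AbstractString)
    # NOTE: does not handle the confirmation page shown for large files.
    return Downloads.download(download_url(f), localpath)
end

f = GoogleDriveFile("0B9w48e1rj-MOLVdZRzFfTlNsem8", "dataset.zip")
```

The explicit name field is a design choice of this sketch: a Drive ID says nothing about the file name, so either the caller supplies one or the download step would have to read it from the response headers.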
yeah that would be great
Thank you for discussing the issue when downloading large files from Google Drive. I also think it would be really cool to add this code to GoogleDrive.jl as I can't even download a 43 MB file without virus scanner interference. I have looked at the suggestions above, but I didn't immediately grasp how to do this coding myself.
Closing due to inactivity.
Datasets are often distributed on Google Drive. That's an issue because Google requires confirming the download for large files (i.e., files it doesn't scan for malware). Transformers.jl already has a custom fetch_method implementation for that case. So I wonder if it might be worth including that helper method in DataDeps.jl, possibly integrating it without having to use fetch_method at all.