Closed hbaniecki closed 2 years ago
Hi again @hbaniecki!
Wow, this is an amazing idea :clap::clap::clap:
What do you think about adding a method called load_from_url() to the Dataset class, which would do the same thing as the current Dataset.load_from_files() but, instead of loading the dataset from disk, would load it from a URL, as you suggested?
Perhaps load_from_url() should take two arguments, load_from_url(url, folder=None): first, the url from which to download the zipped dataset, and second, an optional argument called something like folder that lets the user specify a particular folder to use from inside the zipped dataset. The example would end up being something like:
from pyss3 import SS3
from pyss3.util import Dataset
x_train, y_train = Dataset.load_from_url("https://url/to/movie_review.zip", "train")
x_test, y_test = Dataset.load_from_url("https://url/to/movie_review.zip", "test")
clf = SS3()
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
What do you think? (Again, thank you for this suggestion, I think it is an awesome idea :muscle::sunglasses::+1:)
Thanks! If the data won't be saved on disk and is only loaded into x_train, y_train, etc., then load_from_url(url, folder=None) makes perfect sense.
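To make the "not saved on disk" variant concrete, here is a minimal sketch of reading documents and labels straight from zip content held in memory. It is only an illustration: load_from_zip_bytes is a hypothetical name, and the archive layout it expects ([folder/]<label>/<file>.txt) is an assumption, not necessarily the layout PySS3 uses.

```python
import io
import zipfile
from pathlib import PurePosixPath


def load_from_zip_bytes(zip_bytes, folder=None):
    """Return (documents, labels) read from zip content held in memory.

    Hypothetical helper for illustration only. Assumed layout: every
    document is a UTF-8 text file stored at [folder/]<label>/<name>.
    """
    # Path components that an entry must start with to be selected.
    prefix = PurePosixPath(folder).parts if folder else ()
    docs, labels = [], []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or parts[:len(prefix)] != prefix:
                continue  # directory entry, or outside the chosen folder
            relative = parts[len(prefix):]
            if len(relative) != 2:
                continue  # keep only entries shaped like <label>/<file>
            docs.append(archive.read(info).decode("utf-8"))
            labels.append(relative[0])
    return docs, labels
```

The bytes could come from something like urllib.request.urlopen(url).read(), so nothing needs to be written to disk; the trade-off is that the whole archive is downloaded again on every call.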
Hi @hbaniecki! Sorry for the delay, I just had to wait for the weekend to get down to this. I've added the suggested methods and also updated the README.md. Just check it out and let me know if everything is OK :muscle: :nerd_face: :+1:
Below I'm pasting the commit message that marked this issue as closed:
Now datasets can be directly loaded via a given url, not only from disk. To achieve this, two methods have been added to the Dataset class:
Dataset.load_from_url(...)
Dataset.load_from_url_multilabel(...)
These methods download and extract the zip file (given by the url) into the system's temporary folder and then call Dataset.load_from_files() to load it (or Dataset.load_from_files_multilabel(), respectively).
Note: If the same url is used consecutively, the already downloaded files will be used as a cache (to avoid downloading and extracting them again).
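The behaviour described in the commit message (download, extract into the temporary folder, reuse the files when the same url comes up again) could be sketched roughly as follows. This is not PySS3's actual implementation: fetch_and_extract, the dataset_cache folder name, and the idea of naming each cache folder after a hash of the url are all assumptions made for illustration.

```python
import hashlib
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_extract(url, cache_root=None):
    """Download the zip at `url`, extract it, and return the extraction folder.

    Hypothetical helper, not part of PySS3. A repeated call with the same
    url returns the folder extracted earlier instead of downloading again.
    """
    root = Path(cache_root or tempfile.gettempdir()) / "dataset_cache"
    # One folder per url, named after a hash of the url string (assumption).
    target = root / hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    if target.is_dir():
        return target  # cache hit: skip download and extraction

    root.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory(dir=str(root)) as tmp:
        zip_path = Path(tmp) / "dataset.zip"
        urllib.request.urlretrieve(url, str(zip_path))
        staging = Path(tmp) / "extracted"
        staging.mkdir()
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(staging)
        # Move into place only after a complete extraction, so a failed
        # download never leaves behind something that looks like a cache hit.
        staging.rename(target)
    return target
```

The returned folder (or a subfolder of it, such as "train") is what would then be handed to a disk-based loader like Dataset.load_from_files().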
No worries. Thanks! Works great.
openjournals/joss-reviews#3934
This package has good documentation. Going through the examples, I came up with a feature request that would greatly help with onboarding newcomers and prototyping code.
I like the first example in a README to be straightforward and copy-paste ready, which is not the case here (see the missing code marked with ...).
How about implementing some import_dataset(url) / download(url) functionality in utils or Dataset that would, for example, download the dataset.zip file and unpack it (sample code), so that the example code can load the data directly?
Implementation details and naming may vary, but it would be nice to be able to easily run the code from the README.