iamaziz / PyDataset

Instant access to many datasets in Python.
MIT License
934 stars 87 forks

Process for adding datasets? #4

Closed lethargilistic closed 7 years ago

lethargilistic commented 8 years ago

In the README, there's interest in expanding the number of datasets. I'm wondering what criteria new data would have to meet. Just off the top of my head:

  1. Would it need to be useful prima facie, or would niche data also be acceptable? The kind of thing I'm considering (not seriously for inclusion, just in general) is that I'm working on scraping info about episodes of Detective Conan, such as what characters appeared in them. Would that be too niche?
  2. Would it have to pass some vote for inclusion? If so, who gets a vote?
  3. All the current data is CSV. Could other data formats, like HDF5, be included later?

iamaziz commented 8 years ago

Until some sort of online repository is created, the datasets are stored locally. So adding a new dataset will presumably be local as well, meaning each person can maintain a customized library of datasets in addition to the default set. Thus, there should not be any restrictions on which datasets to include, as long as each one keeps a certain structure (tabular, for example).
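
To make the idea of a locally maintained library concrete, here is a rough sketch of how it might look. It is not PyDataset's actual API: the helper names (`add_dataset`, `list_datasets`, `load_dataset`), the one-CSV-per-dataset directory layout, and the placeholder rows are all assumptions made for illustration. The only rule it enforces is the tabular one mentioned above, that every row has as many fields as the header.

```python
import csv
import tempfile
from pathlib import Path


def add_dataset(library_dir, name, header, rows):
    """Store a tabular dataset as <library_dir>/<name>.csv (hypothetical helper)."""
    rows = [list(row) for row in rows]
    # The only restriction: the data must be tabular, so every row
    # needs exactly one field per header column.
    for i, row in enumerate(rows):
        if len(row) != len(header):
            raise ValueError(
                f"row {i} has {len(row)} fields, expected {len(header)}"
            )
    library_dir = Path(library_dir)
    library_dir.mkdir(parents=True, exist_ok=True)
    path = library_dir / f"{name}.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def list_datasets(library_dir):
    """Return the names of all datasets in the local library, sorted."""
    return sorted(p.stem for p in Path(library_dir).glob("*.csv"))


def load_dataset(library_dir, name):
    """Read a dataset back as a list of dicts (all values come back as strings)."""
    path = Path(library_dir) / f"{name}.csv"
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Example: a personal library in a temporary directory, with placeholder rows.
library = Path(tempfile.mkdtemp())
add_dataset(
    library,
    "episodes",
    ["episode", "character"],
    [[1, "character_a"], [1, "character_b"]],
)
```

Under this layout a niche dataset costs nothing to add, since it lives only in the library of whoever wants it. A non-tabular input (a row with the wrong number of fields) is rejected with a `ValueError` before anything is written.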