List all sources

mcnuttandrew commented 5 years ago

Seems like this has fallen by the wayside, but should it ever come back into development: it would be cool if there were some elementary statistics for each of the datasets, like how many rows of data there are, the names of the columns, the types of those columns, etc. Basically the same collection of things that Kaggle lists for lots of the datasets on there.

santiagorr commented 5 years ago

It would be great to include the license of the source files as well.

hydrosquall commented 3 years ago

> it would be cool if there were some elementary statistics for each of the datasets

I think there are at least 2 components that this issue could be split into:

1. Convert the SOURCES.md file into something machine-readable, like a JSON file or a folder of YAML files. We could adopt a process similar to what "awesome public datasets" ( https://github.com/awesomedata/awesome-public-datasets ) or "campusdata" did in the past: https://github.com/CampusData/campusdata.github.io/blob/master/_data/rankings.yml .
2. Add metadata about each sample file in the repo. Perhaps we could keep a script around that programmatically generates this info and stores it. That way you could query for things like a dataset with at least 1 datetime column, or a dataset with at least 3 quantitative columns and over 3000 rows.
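
A minimal sketch of what such a metadata script might look like, using only the Python standard library. Everything here is an assumption for illustration: the function names (`infer_type`, `summarize_csv`, `matches`), the coarse type labels ("quantitative", "datetime", "nominal"), and the type-guessing rules are hypothetical and not something the repo provides.

```python
import csv
import io
from datetime import datetime


def infer_type(values):
    """Guess a coarse column type from a list of string cell values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"

    def all_parse(parser):
        # True only if every non-empty cell is accepted by the parser.
        for v in non_empty:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    if all_parse(float):
        return "quantitative"
    if all_parse(datetime.fromisoformat):  # ISO 8601 strings only (assumption)
        return "datetime"
    return "nominal"


def summarize_csv(text):
    """Return the row count and a {column name: guessed type} mapping."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []
    return {
        "rows": len(rows),
        "columns": {
            name: infer_type([(row[name] or "") for row in rows])
            for name in columns
        },
    }


def matches(summary, min_rows=0, min_quantitative=0, min_datetime=0):
    """Check a summary against simple thresholds (hypothetical query helper)."""
    types = list(summary["columns"].values())
    return (
        summary["rows"] >= min_rows
        and types.count("quantitative") >= min_quantitative
        and types.count("datetime") >= min_datetime
    )
```

With summaries stored per file (for example as JSON), the query "at least 3 quantitative columns and over 3000 rows" would be roughly `matches(summary, min_rows=3001, min_quantitative=3)`. Real datasets would need more careful date parsing and handling for JSON files, which this sketch leaves out.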

In the meantime, there are at least 2 peer projects that can cover some of the data exploration use cases for single-file data requests:

- https://github.com/pandas-profiling/pandas-profiling
- https://github.com/githubocto/flat-viewer (take any of the file URLs and add flat in front of github.com, like https://flatgithub.com/vega/vega-datasets/blob/next/data/birdstrikes.csv?filename=data%2Fairports.csv&sha=05fcb7c07b1d76206856e75129fc1e79dc61735c )
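
The URL rewrite for flat-viewer can be sketched as a small helper. The function name `to_flat_url` is hypothetical, and the sketch assumes the input is a plain `https://github.com/...` file URL; it does nothing with the optional `filename` and `sha` query parameters shown in the example above.

```python
def to_flat_url(github_url):
    """Turn a github.com file URL into the matching flatgithub.com URL."""
    prefix = "https://github.com/"
    if not github_url.startswith(prefix):
        raise ValueError("expected a URL starting with " + prefix)
    # "Add flat in front": github.com becomes flatgithub.com, path unchanged.
    return "https://flatgithub.com/" + github_url[len(prefix):]
```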

domoritz commented 2 years ago

world-110m.json looks like it could be from https://www.jsdelivr.com/package/npm/world-atlas?version=1.1.4&path=world (https://github.com/topojson/world-atlas).

vega / vega-datasets

List all sources #15