openspending / openspending-migrate

Ongoing Stats #3

Open danfowler opened 8 years ago

danfowler commented 8 years ago

Numbers

Private datasets: 586 (981 datasets have empty or incomplete "data" model)
Public datasets: 1094

Sizes

Public Datasets

without archived sources: 9.86 GB
with archived sources: 17.32 GB

Private Datasets

without archived sources: 5.1 GB
with archived sources: 10.61 GB

Sources

Private Datasets

valid_sources: 456 (78% of total sources)
invalid_sources: 128

Public Datasets

valid_sources: 1396 (92% of total sources)
invalid_sources: 120
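
As a quick check of the percentages above, here is a minimal sketch that recomputes the share of valid sources from the reported counts. The function name `valid_share` is made up for illustration, and the sketch assumes that "total sources" means valid plus invalid sources.

```python
def valid_share(valid, invalid):
    """Return the percentage of sources that are valid.

    Assumes total sources = valid + invalid.
    """
    total = valid + invalid
    return 100 * valid / total


# Counts copied from the Sources section above.
private_pct = valid_share(456, 128)
public_pct = valid_share(1396, 120)

print(f"private: {private_pct:.0f}% valid")
print(f"public: {public_pct:.0f}% valid")
```

Under that assumption, the results round to the 78% and 92% quoted above.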

Usernames

private

1 owner: 557 (2.7 GB)
2 owners: 19 (768.1 MB)
3 owners: 7 (1.6 GB)
4 owners: 1 (0)
5 owners: 2 (382.2 kB)
6 owners: 1 (3.0 MB)
8 owners: 1 (0)

public

0 owners: 8 (134.5 MB)
1 owner: 987 (4.2 GB)
2 owners: 75 (4.4 GB)
3 owners: 13 (557.3 MB)
4 owners: 8 (83.8 MB)
5 owners: 3 (461.3 MB)
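
A rough way to sanity-check the owner breakdown is to add up the buckets and compare them with the dataset totals under Numbers. This is only a sketch; the variable names are made up for illustration and the figures are copied from this comment.

```python
# Dataset totals reported under "Numbers".
PRIVATE_DATASETS = 586
PUBLIC_DATASETS = 1094

# number of owners -> number of datasets, from the "Usernames" section.
private_owner_buckets = {1: 557, 2: 19, 3: 7, 4: 1, 5: 2, 6: 1, 8: 1}
public_owner_buckets = {0: 8, 1: 987, 2: 75, 3: 13, 4: 8, 5: 3}

private_total = sum(private_owner_buckets.values())
public_total = sum(public_owner_buckets.values())

print(f"private: {private_total} in buckets, {PRIVATE_DATASETS} reported")
print(f"public: {public_total} in buckets, {PUBLIC_DATASETS} reported")
```

The public buckets add up to the reported 1094. The private buckets add up to 588, two more than the 586 reported; the thread does not explain the difference.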

pwalsh commented 8 years ago

Hey @danfowler

  1. What % of the datasets have available sources?
  2. Can you also count private datasets?
  3. Once we've solved any issues like usernames and package names, we'll just load it all straight to the new OS Datastore.

danfowler commented 8 years ago

@pwalsh updated the stats above. What do you mean by resolve package name issues?

rufuspollock commented 8 years ago

Note that, as per earlier discussions a few months ago, we do not need to migrate the private datasets IMO. In general, private datasets were just datasets people never completed or got around to publishing, and we can just leave them (note: we can archive them somewhere safe, like OK's standard backup).

rufuspollock commented 8 years ago

BTW could we get a CSV list of all the public datasets and their owners?

danfowler commented 8 years ago

@rgrp https://dl.dropboxusercontent.com/u/12909676/owners.csv

rufuspollock commented 8 years ago

@danfowler could I suggest gists in future - much nicer to read and update ;-) For others, here's a datapipes preview: http://datapipes.okfnlabs.org/csv/html?url=https://dl.dropboxusercontent.com/u/12909676/owners.csv
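
For anyone wanting to rebuild the "N owners" breakdown from a dataset/owner list, a minimal sketch follows. The layout of owners.csv is not shown in this thread, so the column names `dataset` and `owner`, the one-row-per-pair structure, and the sample rows are all hypothetical; adjust them to the real file.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical layout: one row per (dataset, owner) pair, with an empty
# owner field for datasets that have no owner. The real owners.csv may
# use different column names or a different structure.
SAMPLE = """dataset,owner
budget-a,alice
budget-a,bob
spend-b,alice
orphan-c,
"""


def owners_histogram(csv_text):
    """Map 'number of owners' to 'number of datasets with that many owners'."""
    owners = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        names = owners[row["dataset"]]  # registers the dataset even if ownerless
        if row["owner"]:
            names.add(row["owner"])
    return Counter(len(names) for names in owners.values())


histogram = owners_histogram(SAMPLE)
for n_owners in sorted(histogram):
    print(f"{n_owners} owners: {histogram[n_owners]}")
```

Using a set per dataset means a repeated (dataset, owner) row is counted once, which seemed the safer choice for a hand-assembled list.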