Status: Open. domoritz opened this issue 2 weeks ago.
Looks like the row count in flights_200k may also be off.
from vega_datasets import data

datasets = ['flights_2k', 'flights_5k', 'flights_10k', 'flights_20k', 'flights_200k', 'flights_3m']

for dataset_name in datasets:
    # Each dataset is exposed as a callable attribute on `data`.
    dataset = getattr(data, dataset_name)()
    row_count = len(dataset)
    print(f"{dataset_name}: {row_count} rows")
Results:
flights_2k: 2000 rows
flights_5k: 5000 rows
flights_10k: 10000 rows
flights_20k: 20000 rows
flights_200k: 231083 rows
flights_3m: 231083 rows
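
To make the mismatch explicit, the reported counts can be compared against the size each dataset name implies. This is only a minimal sketch: it assumes the name suffix encodes the intended row count (k = thousand, m = million), and `expected_rows` and `find_mismatches` are hypothetical helpers, not part of vega_datasets.

```python
import re

# Assumption: the suffix in names like "flights_200k" encodes the intended
# row count (k = thousand, m = million).
_MULTIPLIERS = {"k": 1_000, "m": 1_000_000}


def expected_rows(dataset_name):
    """Return the row count implied by a name such as 'flights_200k'."""
    match = re.fullmatch(r".*_(\d+)([km])", dataset_name)
    if match is None:
        raise ValueError(f"no size suffix in {dataset_name!r}")
    number, unit = match.groups()
    return int(number) * _MULTIPLIERS[unit]


def find_mismatches(observed):
    """Map each dataset whose count differs to (actual, expected)."""
    return {
        name: (count, expected_rows(name))
        for name, count in observed.items()
        if count != expected_rows(name)
    }


# Row counts copied from the results reported in this issue.
observed = {
    "flights_2k": 2000,
    "flights_5k": 5000,
    "flights_10k": 10000,
    "flights_20k": 20000,
    "flights_200k": 231083,
    "flights_3m": 231083,
}

for name, (actual, expected) in find_mismatches(observed).items():
    print(f"{name}: {actual} rows, expected {expected}")
```

With the counts above, only flights_200k and flights_3m are flagged. Whether flights_200k is meant to hold exactly 200,000 rows is itself an assumption here.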
Should we regenerate the 3m rows using this script, create a CSV from the 3m parquet file here, or do something else?
https://github.com/vega/vega-datasets/blob/main/data/flights-3m.csv seems to only have 200k rows.
It was added in https://github.com/vega/vega-datasets/commit/1e70098e5c15069314a1be82a37c82c0fbb5f66f by @arvind.