Open simonw opened 2 years ago
I work at The Wall Street Journal as a computational journalist and serve as our self-appointed Datasette evangelist. They say that to a hammer everything looks like a nail, but the reality is newsrooms find themselves in a sea of nails!
I've only got a couple public projects that I can share, but happy to offer you a look at some of the internal projects.
More often than not the internal projects stay internal because the reporting doesn't lead anywhere or I can't convince an editor to greenlight it. But imho that's the beauty of datasette: a (relatively) painless mechanism to see if there's any there there.
I'm a cryptic crossword enthusiast and have spent a lot of time scraping and parsing cryptic crossword clues from various blogs, forums and publications. The result is over half a million clues from cryptic crosswords over the past twelve years, including the clue, answer, puzzle date, puzzle name and a link to the original source. This is all hosted using Datasette, which has been a delight to use: https://cryptics.georgeho.org/
This dataset is a significant work of crossword archivism and scholarship, as acquiring historical crosswords and structuring their contents require focused effort and tedious cleaning that few are willing to do for such trivial data - for example, according to this 2004 selection guide, the Library of Congress explicitly does not collect crossword puzzles. Anecdotally, I know that many constructors/setters of cryptic crosswords use this dataset as a resource, and some even simply call it "the database" - this is probably one of the most impactful data projects I've worked on!
Tim Sherratt on Twitter: https://twitter.com/wragge/status/1591930345469153282
Where do I start? The #GLAMWorkbench now includes a number of examples where GLAM data is harvested, processed, and then made available for exploration via Datasette.
For example the GLAM Name Index Search brings together 10+ million entries from 240 indexes and provides an aggregated search using the Datasette search-all plugin:
https://glam-workbench.net/name-search/
Most recently I converted PDFs of the Tasmanian Postal Directories to a big Datasette instance: https://updates.timsherratt.org/2022/09/15/from-pdfs-to.html. The process is documented and reusable.
Hi @simonw and thanks for the great tools you're publishing, your dedication is inspiring!
I work for the French Ministry of Culture on a surveying tool for objects protected for their historical value. It is part of a program building modern public services called beta.gouv.fr.
In that context I'm using data published by the Ministry that I have ingested into datasette and published on a free Fly instance: https://collectif-objets-datasette.fly.dev. I have also ingested another dataset with information about French cities on this instance so that I can run queries that join the two.
The surveying tool synchronizes its data regularly from this datasette instance, and I also use it to perform queries when asked generic questions about the distribution of objects. (The data is not very accessible as it's undocumented and for internal usage mostly)
Nothing spectacular yet but I think this falls under "cool/cute application of datasette": improving fakedata performance for fun. tl;dr I used datasette to visualize benchmarking data.
I use Datasette to analyze blocklists by using csv-to-sqlite to pull their contents into a database and Datasette to look around through them. I also use its REST API to query said database as part of filtering out garbage from domains found in those blocklists.
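A minimal sketch of that kind of workflow, not the commenter's actual setup: it uses Python's standard `csv` and `sqlite3` modules as a stand-in for csv-to-sqlite, and builds a URL for Datasette's JSON API with `?sql=` and `_shape=array`. The table name `blocklist`, the columns `domain` and `source`, the database name `blocklists` and the local base URL are all assumptions made for illustration.

```python
import csv
import io
import sqlite3
from urllib.parse import urlencode


def load_blocklist(conn, csv_text, table="blocklist"):
    """Load CSV text with 'domain' and 'source' columns (hypothetical layout) into SQLite."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (domain TEXT PRIMARY KEY, source TEXT)"
    )
    # INSERT OR IGNORE skips domains that were already loaded from another list
    conn.executemany(
        f"INSERT OR IGNORE INTO {table} (domain, source) VALUES (:domain, :source)",
        rows,
    )
    conn.commit()
    return len(rows)


def datasette_query_url(base_url, database, sql, **params):
    """Build a Datasette JSON API URL for a SQL query with named parameters."""
    query = urlencode({"sql": sql, "_shape": "array", **params})
    return f"{base_url.rstrip('/')}/{database}.json?{query}"


conn = sqlite3.connect(":memory:")
load_blocklist(conn, "domain,source\nads.example,listA\ntracker.example,listB\n")

# Assumed local Datasette instance serving a database file named blocklists.db
url = datasette_query_url(
    "http://localhost:8001",
    "blocklists",
    "select domain, source from blocklist where domain = :domain",
    domain="ads.example",
)
```

Fetching `url` from a running Datasette instance would return a JSON array of matching rows, which a filtering script could check before deciding whether to keep a domain.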
This probably counts as a case study: https://github.com/eyeseast/spatial-data-cooking-show. Even has video.
Seriously, though, this workflow has become integral to my work with reporters and editors across USA TODAY Network. Very often, I get sent a folder of data in mixed formats, with a vague ask of how we should communicate some part of it to users. Datasette and its constellation of tools makes it easy to get a quick look at that data, run exploratory queries, map it and ask questions to figure out what's important to show. And then I export a version of the data that's exactly what I need for display.
Happy Birthday Datasette!
I am a librarian at the Université du Québec à Montréal (UQAM) and I've been using Datasette to publish excerpts of our library data. There are several use cases I'm working with as a proof of concept :
See our prototype under construction here : https://datasette-bib.uqam.ca/ (some bits and pieces have been translated into French)
Datasette is amazing, there is so much potential here for libraries. Thanks to Simon and all the contributors for this outstanding effort. Also sqlite-utils deserves special mention as incredibly handy and useful.
Datasette usage comments for its 5th anniversary celebration:
I use Datasette and related tools for a Cosmology Researcher Talks database app project, which is described in the github Readme
The app, hosted on the Google Cloud Run service, also uses other Datasette-related tools developed by Simon: datasette-render-markdown, csvs-to-sqlite, datasette-template-sql, and datasette-block-robots. This is one of two apps used for querying the talks database; each has its pros/cons as described in the github Readme.
At present, over 170 different sites that host cosmology talks are scraped to collect new talks for import into the sqlite database. The shot-scraper and sqlite-utils tools are a major help for this.
I also use the Mastodon API to get my favorites, toots, and boosts into a local database so I can do searches on the data. This was done on Twitter and was then extended to the Mastodon data. Again, sqlite-utils is an important tool for this.
Happy Birthday Datasette!
Thanks Simon!!
I use datasette on everything, most notably my flickr metadata SQLite DB, to make art.
Datasette lite on my 2019 flickr metadata is super helpful too: https://lite.datasette.io/?csv=https%3A%2F%2Fraw.githubusercontent.com%2Frtanglao%2Frt-flickr-sqlite-csv%2Fmain%2F2019-roland-flickr-metadata.csv
Even better datasette lite on all firefox support questions from 2021: https://lite.datasette.io/?url=https%3A%2F%2Fraw.githubusercontent.com%2Frtanglao%2Frt-kits-api3%2Fmain%2FYEARLY_CSV_FILES%2F2021-firefox-sumo-questions.db
Thanks again Simon! So great! What a gift to the world!!!!!!
Happy birthday to datasette and thank you Simon for your continued effort on this project!
I use datasette (python) as a fast search layer over github projects imported with https://github.com/dogsheep/github-to-sqlite, and use the JSON API it provides to serve sample data for Vega-Lite graphing workshop examples that don't require authentication/API keys. It's awesome to have full SQL API support, for both filtering and grouping, without needing to develop any custom API middleware.
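A rough sketch of how a Datasette JSON endpoint could feed a Vega-Lite chart, not the commenter's actual workshop code. It assumes a Datasette instance at a placeholder base URL serving a database named `github` with a `repos` table and a `language` column; those names are illustrative. The `?sql=` and `_shape=array` parameters are part of Datasette's JSON API, and `_shape=array` returns a plain JSON array that Vega-Lite can load directly from a URL.

```python
import json
from urllib.parse import urlencode


def grouped_count_url(base_url, database, table, column):
    """Build a Datasette JSON API URL that groups and counts rows by one column."""
    sql = (
        f"select [{column}] as category, count(*) as n "
        f"from [{table}] group by [{column}] order by n desc"
    )
    query = urlencode({"sql": sql, "_shape": "array"})
    return f"{base_url.rstrip('/')}/{database}.json?{query}"


def bar_chart_spec(data_url):
    """Return a Vega-Lite bar chart spec that loads its rows from data_url."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"url": data_url, "format": {"type": "json"}},
        "mark": "bar",
        "encoding": {
            "x": {"field": "category", "type": "nominal", "sort": "-y"},
            "y": {"field": "n", "type": "quantitative"},
        },
    }


# Placeholder host, database, table and column names (assumptions for illustration)
spec = bar_chart_spec(
    grouped_count_url("https://example.com", "github", "repos", "language")
)
spec_json = json.dumps(spec, indent=2)
```

The resulting `spec_json` can be pasted into the Vega-Lite editor or embedded in a page; the grouping happens in SQL on the Datasette side, so no custom middleware is involved.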
I've also enjoyed using it as a teaching tool for working with public datasets in civic data workshops and as a platform for making visualization plugins.
I'm especially excited about datasette-lite, as it will let people participate in future editions of this workshop without having to install anything to make use of their own tables :)
i wrote up a blog post of how i'm using it! https://bunkum.us/2022/11/20/mgdo-stack.html
A bit late to this, but I have made an app to publish air quality data in Bristol, UK. Next step is to see if I can make a streamlit app based on this to produce some nice charts.
Datasette is 5 years old today. To celebrate, I'm asking the community for birthday presents:
https://simonwillison.net/2022/Nov/13/datasette-birthday/