simonw / datasette

An open source multi-tool for exploring and publishing data
https://datasette.io
Apache License 2.0
9.21k stars 656 forks

Call for birthday presents: if you're using Datasette, let us know how you're using it here #1886

Open simonw opened 1 year ago

simonw commented 1 year ago

Datasette is 5 years old today. To celebrate, I'm asking the community for birthday presents:

https://simonwillison.net/2022/Nov/13/datasette-birthday/

To celebrate this open source project’s birthday, I’ve decided to try something new: I’m going to ask for birthday presents.

An aspect of Datasette’s marketing that I’ve so far neglected is social proof. I think it’s time to change that: I know people are using the software to do cool things, but this often happens behind closed doors.

For Datasette’s birthday, I’m looking for endorsements and case studies and just general demonstrations that show how people are using it to do cool stuff.

So: if you’ve used Datasette to solve a problem, and you’re willing to publicize it, please give us the gift of your endorsement!

[...]

Add a comment to this issue thread describing what you’re doing. Just a few sentences is fine—though a screenshot or even a link to a live instance would be even better

noslouch commented 1 year ago

I work at The Wall Street Journal as a computational journalist and serve as our self-appointed Datasette evangelist. They say that to a hammer everything looks like a nail, but the reality is newsrooms find themselves in a sea of nails!

I've only got a couple public projects that I can share, but happy to offer you a look at some of the internal projects.

More often than not the internal projects stay internal because the reporting doesn't lead anywhere or I can't convince an editor to greenlight it. But imho that's the beauty of datasette: a (relatively) painless mechanism to see if there's any there there.

eigenfoo commented 1 year ago

I'm a cryptic crossword enthusiast and have spent a lot of time scraping and parsing cryptic crossword clues from various blogs, forums and publications. The result is over half a million clues from cryptic crosswords over the past twelve years, including the clue, answer, puzzle date, puzzle name and a link to the original source. This is all hosted using Datasette, which has been a delight to use: https://cryptics.georgeho.org/

This dataset is a significant work of crossword archivism and scholarship, as acquiring historical crosswords and structuring their contents require focused effort and tedious cleaning that few are willing to do for such trivial data - for example, according to this 2004 selection guide, the Library of Congress explicitly does not collect crossword puzzles. Anecdotally, I know that many constructors/setters of cryptic crosswords use this dataset as a resource, and some even simply call it "the database" - this is probably one of the most impactful data projects I've worked on!

simonw commented 1 year ago

Tim Sherratt on Twitter: https://twitter.com/wragge/status/1591930345469153282

Where do I start? The #GLAMWorkbench now includes a number of examples where GLAM data is harvested, processed, and then made available for exploration via Datasette.

https://glam-workbench.net/

For example the GLAM Name Index Search brings together 10+ million entries from 240 indexes and provides an aggregated search using the Datasette search-all plugin:

https://glam-workbench.net/name-search/

Most recently I converted PDFs of the Tasmanian Postal Directories to a big Datasette instance: https://updates.timsherratt.org/2022/09/15/from-pdfs-to.html. The process is documented and reusable.

adipasquale commented 1 year ago

Hi @simonw and thanks for the great tools you're publishing, your dedication is inspiring!

I work for the French Ministry of Culture on a surveying tool for objects protected for their historical value. It is part of a program building modern public services called beta.gouv.fr.

In that context I'm using data published by the Ministry that I have ingested into datasette and published on a free Fly instance: https://collectif-objets-datasette.fly.dev. I have also ingested another data set with information about French cities on this instance so that I can perform joined queries.

The surveying tool synchronizes its data regularly from this datasette instance, and I also use it to perform queries when asked generic questions about the distribution of objects. (The data is not very accessible as it's undocumented and for internal usage mostly)

lucapette commented 1 year ago

Nothing spectacular yet but I think this falls under "cool/cute application of datasette": improving fakedata performance for fun. tl;dr I used datasette to visualize benchmarking data.

virtadpt commented 1 year ago

I use Datasette to analyze blocklists by using csv-to-sqlite to pull their contents into a database and Datasette to look around through them. I also use its REST API to query said database as part of filtering out garbage from domains found in those blocklists.
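As an illustration of the JSON API usage described above, here is a minimal sketch of building a Datasette query URL. The host, the `blocklists` database name, the `domains` table and its `domain` column are all hypothetical. The sketch relies on Datasette's documented `?sql=` and `_shape=array` query-string options and on named SQL parameters being passed as extra query-string keys.

```python
from urllib.parse import urlencode


def datasette_sql_url(base_url, database, sql, params=None):
    """Build a Datasette JSON API URL that runs a read-only SQL query."""
    # _shape=array asks Datasette for a plain JSON array of row objects.
    query = {"sql": sql, "_shape": "array"}
    # Named parameters (:name in the SQL) travel as extra query-string keys.
    query.update(params or {})
    return f"{base_url.rstrip('/')}/{database}.json?{urlencode(query)}"


# Hypothetical local instance, database and table names.
url = datasette_sql_url(
    "http://localhost:8001",
    "blocklists",
    "select count(*) as n from domains where domain = :domain",
    {"domain": "example.com"},
)
print(url)
```

Fetching that URL with any HTTP client should return the matching rows as JSON, which a filtering script can then inspect.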

eyeseast commented 1 year ago

This probably counts as a case study: https://github.com/eyeseast/spatial-data-cooking-show. Even has video.

Seriously, though, this workflow has become integral to my work with reporters and editors across USA TODAY Network. Very often, I get sent a folder of data in mixed formats, with a vague ask of how we should communicate some part of it to users. Datasette and its constellation of tools make it easy to get a quick look at that data, run exploratory queries, map it and ask questions to figure out what's important to show. And then I export a version of the data that's exactly what I need for display.

sachaj commented 1 year ago

Happy Birthday Datasette!

I am a librarian at the Université du Québec à Montréal (UQAM) and I've been using Datasette to publish excerpts of our library data. There are several use cases I'm working with as a proof of concept :

  1. New titles list : based on reports of recent acquisitions by subject, discipline, etc.
  2. List of all UQAM theses and dissertations : based on an extract of bibliographic records
  3. List of all publications by UQAM Authors : based on an extract of bibliographic records

See our prototype under construction here : https://datasette-bib.uqam.ca/ (some bits and pieces have been translated into French)

Datasette is amazing, there is so much potential here for libraries. Thanks to Simon and all the contributors for this outstanding effort. Also sqlite-utils deserves special mention as incredibly handy and useful.

jrdmb commented 1 year ago

Datasette usage comments for its 5th anniversary celebration:

I use Datasette and related tools for a Cosmology Researcher Talks database app project, which is described in the github Readme.

The app hosted on the Google Cloud Run service also uses other Datasette-related tools developed by Simon - datasette-render-markdown, csvs-to-sqlite, datasette-template-sql, and datasette-block-robots. This is one of two apps used for querying the talks database; each has its pros/cons as described in the github Readme.

At present, over 170 different sites that host cosmology talks are scraped to collect new talks for import into the sqlite database. The shot-scraper and sqlite-utils tools are a major help for this.

I also use the Mastodon API to get my favorites, toots, and boosts into a local database so I can do searches on the data. This was done on Twitter and was then extended to the Mastodon data. Again, sqlite-utils is an important tool for this.

rtanglao commented 1 year ago

Happy Birthday Datasette!

Thanks Simon!!

I use datasette on everything, most notably my flickr metadata SQLite DB, to make art.

Datasette lite on my 2019 flickr metadata is super helpful too: https://lite.datasette.io/?csv=https%3A%2F%2Fraw.githubusercontent.com%2Frtanglao%2Frt-flickr-sqlite-csv%2Fmain%2F2019-roland-flickr-metadata.csv

Even better datasette lite on all firefox support questions from 2021: https://lite.datasette.io/?url=https%3A%2F%2Fraw.githubusercontent.com%2Frtanglao%2Frt-kits-api3%2Fmain%2FYEARLY_CSV_FILES%2F2021-firefox-sumo-questions.db

Thanks again Simon! So great! What a gift to the world!!!!!!
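The two Datasette Lite links above follow a simple pattern: the address of a CSV file goes in a `csv=` parameter and the address of a SQLite database file goes in a `url=` parameter, percent-encoded in both cases. A minimal sketch of building such links follows; the helper name `lite_url` is hypothetical, not part of Datasette Lite.

```python
from urllib.parse import quote

LITE_BASE = "https://lite.datasette.io/"


def lite_url(source_url, kind="csv"):
    """Return a Datasette Lite link that loads a remote CSV or SQLite file."""
    # "csv" loads a CSV file; "url" loads a SQLite database file.
    if kind not in ("csv", "url"):
        raise ValueError("kind must be 'csv' or 'url'")
    # safe="" percent-encodes ':' and '/' as well, matching the links above.
    return f"{LITE_BASE}?{kind}={quote(source_url, safe='')}"


csv_link = lite_url(
    "https://raw.githubusercontent.com/rtanglao/rt-flickr-sqlite-csv/main/2019-roland-flickr-metadata.csv"
)
print(csv_link)
```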

hydrosquall commented 1 year ago

Happy birthday to datasette and thank you Simon for your continued effort on this project!

I use datasette (python) as a fast layer on top of search for github projects using https://github.com/dogsheep/github-to-sqlite , and use the JSON API it provides to serve sample data to make Vega-Lite graphing workshop examples that don't require authentication/API keys. It's awesome to have a full SQL API support working without needing to develop any custom API middleware for both filtering and grouping.

I've also enjoyed using it as a teaching tool for working with public datasets in civic data workshops and as a platform for making visualization plugins.

I'm especially excited about datasette-lite, as it will let people participate in future editions of this workshop without having to install anything to make use of their own tables :)
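To make the JSON-API-plus-Vega-Lite pattern described above concrete, here is a rough sketch of a chart specification whose data source is a Datasette SQL query. The host `example.com`, the `github` database, the `repos` table and its `language` column are assumptions for illustration, not details taken from the workshop materials.

```python
import json
from urllib.parse import urlencode


def vega_lite_bar_spec(data_url, x_field, y_field):
    """Return a Vega-Lite bar chart spec that loads rows from a JSON URL."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"url": data_url, "format": {"type": "json"}},
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


# Hypothetical query: grouping happens in SQL, so no custom API code is needed.
sql = "select language, count(*) as n from repos group by language"
# _shape=array makes Datasette return a plain array of row objects.
data_url = "https://example.com/github.json?" + urlencode(
    {"sql": sql, "_shape": "array"}
)
spec = vega_lite_bar_spec(data_url, "language", "n")
print(json.dumps(spec, indent=2))
```

A chart embedded on a different origin would likely need the Datasette instance to be started with its `--cors` option so the browser can fetch the data.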

fgregg commented 1 year ago

i wrote up a blog post of how i'm using it! https://bunkum.us/2022/11/20/mgdo-stack.html

stevecrawshaw commented 1 year ago

A bit late to this, but I have made an app to publish air quality data in Bristol, UK. The next step is to see if I can make a Streamlit app based on this to produce some nice charts.