simonw / til

Today I Learned
https://til.simonwillison.net
Apache License 2.0

Social image broken for latest entry #62

Closed abdusco closed 1 year ago

abdusco commented 1 year ago

Hey, I just noticed that this post has a broken image in my RSS reader, with an alt text of sqlite_multiple_indexes.md. This caught my attention, so I viewed the source and realized that the social image meta tags link to the source file:

https://github.com/simonw/til/blob/2e7920e9a55b93cf9164639d9aad23c98f96bed5/templates/pages/%7Btopic%7D/%7Bslug%7D.html#L20-L26

simonw commented 1 year ago

Thanks for the bug report!

Despite the .md extension, those URLs do actually return valid images. The path, including its .md extension, is the key for that row in the database. Here's what's supposed to happen:

https://til.simonwillison.net/-/media/screenshot/github-actions_cache-setup-py.md

Or in an image tag:

But it looks like the image URL for that entry is returning a 500 error:

https://til.simonwillison.net/-/media/screenshot/sqlite_multiple-indexes.md

I looked here: https://til.simonwillison.net/tils/til?path__exact=sqlite_multiple-indexes.md&_sort_desc=updated_utc - and it looks like that row has a zero-length binary string in the shot column.


There are actually 7 rows that have 0 bytes for their image right now: https://til.simonwillison.net/tils/til?_where=length(shot)%20==%200
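The same check can be reproduced against a local copy of the database. This is a minimal sketch, assuming a table named til with path and shot columns as in the queries in this thread; the in-memory database and its sample rows are made up for illustration.

```python
import sqlite3


def find_blank_shots(conn):
    """Return the paths of rows whose shot blob is zero bytes long."""
    rows = conn.execute(
        "select path from til where length(shot) = 0 order by path"
    ).fetchall()
    return [row[0] for row in rows]


# Illustrative data only: an in-memory stand-in for the real tils database.
conn = sqlite3.connect(":memory:")
conn.execute("create table til (path text primary key, shot blob)")
conn.executemany(
    "insert into til (path, shot) values (?, ?)",
    [
        ("sqlite_multiple-indexes.md", b""),
        ("github-actions_cache-setup-py.md", b"\x89PNG placeholder bytes"),
        ("html_datalist.md", None),
    ],
)

blank_paths = find_blank_shots(conn)
print(blank_paths)
```

Note that length(shot) is NULL when shot itself is NULL, so rows with no screenshot at all are not matched by this query; only genuinely empty blobs are.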

simonw commented 1 year ago

The 500 error is a bug in datasette-media: if the content is 0 bytes long, it attempts to return a non-existent file instead:

https://github.com/simonw/datasette-media/blob/d853fa9a71bcbc966f93316d4533a6832d38e3e8/datasette_media/__init__.py#L129-L150
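One common way a bug like this arises in Python is a truthiness check on the blob: an empty bytes value is falsy, so `if content:` treats a zero-length image the same as no image at all. The sketch below illustrates that pitfall only. It is not the datasette-media code, and the function names and arguments are hypothetical.

```python
def pick_source_buggy(content, filepath):
    """Truthiness check: b"" is falsy, so empty content falls through to filepath."""
    if content:
        return ("content", content)
    return ("filepath", filepath)


def pick_source_fixed(content, filepath):
    """Explicit None check: empty content is still handled as content."""
    if content is not None:
        return ("content", content)
    return ("filepath", filepath)


# A row with a 0 byte blob and no file path configured.
buggy = pick_source_buggy(b"", None)
fixed = pick_source_fixed(b"", None)
print(buggy)
print(fixed)
```

With the truthiness check, the empty blob is ignored and the code goes on to use a file path that was never set, which is the kind of situation that ends in a server error rather than an empty response.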

simonw commented 1 year ago

So the real question here is: why were those screenshots generated as 0 byte images?

simonw commented 1 year ago

I've modified the generate_screenshots.py script to also regenerate any shots that are blank for whatever reason.
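The decision this change describes could look roughly like the sketch below. It assumes the script stores a hash alongside each screenshot and skips rows whose hash is unchanged, which is consistent with the "Skipped ... with shot hash" lines in the workflow log quoted later in this thread. needs_new_shot and its parameters are hypothetical names, not taken from generate_screenshots.py.

```python
def needs_new_shot(stored_hash, current_hash, stored_shot):
    """Decide whether a screenshot should be (re)generated.

    stored_hash:  hash saved with the existing screenshot, or None
    current_hash: hash computed from the current page content
    stored_shot:  existing image bytes, or None
    """
    # The content changed (or was never captured): take a new screenshot.
    if stored_hash != current_hash:
        return True
    # Extra condition: a missing or 0 byte image counts as stale
    # even when the hash still matches.
    return not stored_shot


print(needs_new_shot("abc", "abc", b"\x89PNG placeholder"))
print(needs_new_shot("abc", "abc", b""))
```

Without the second condition, a row that was once saved with an empty image would be skipped forever, because its hash never changes.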

simonw commented 1 year ago

Before: https://til.simonwillison.net/tils?sql=select%0D%0A++%27https%3A%2F%2Ftil.simonwillison.net%2F%27+%7C%7C+topic+%7C%7C+%27%2F%27+%7C%7C+slug+as+url%2C%0D%0A++%27https%3A%2F%2Ftil.simonwillison.net%2F-%2Fmedia%2Fscreenshot%2F%27+%7C%7C+path+as+screenshot_url%2C%0D%0A++length%28shot%29+as+shot_length%0D%0A++from+til+where+length%28shot%29+%3D%3D+0+order+by+updated_utc+desc+limit+101

select
  'https://til.simonwillison.net/' || topic || '/' || slug as url,
  'https://til.simonwillison.net/-/media/screenshot/' || path as screenshot_url,
  length(shot) as shot_length
from til
where length(shot) == 0
order by updated_utc desc
limit 101

This query will return the "after" set once the fix has gone out:

select
  'https://til.simonwillison.net/' || topic || '/' || slug as url,
  'https://til.simonwillison.net/-/media/screenshot/' || path as screenshot_url,
  length(shot) as shot_length
from
  til
where
  path in (
    'github_github-pages.md',
    'googlecloud_gcloud-error-workaround.md',
    'github_github-code-search-api-uses.md',
    'gpt3_reformatting-text-with-copilot.md',
    'pytest_show-files-opened-by-tests.md',
    'spatialite_viewing-geopackage-data-with-spatialite-and-datasette.md',
    'sqlite_multiple-indexes.md'
  )

https://til.simonwillison.net/tils?sql=select%0D%0A++%27https%3A%2F%2Ftil.simonwillison.net%2F%27+%7C%7C+topic+%7C%7C+%27%2F%27+%7C%7C+slug+as+url%2C%0D%0A++%27https%3A%2F%2Ftil.simonwillison.net%2F-%2Fmedia%2Fscreenshot%2F%27+%7C%7C+path+as+screenshot_url%2C%0D%0A++length%28shot%29+as+shot_length%0D%0Afrom%0D%0A++til%0D%0Awhere%0D%0A++path+in+%28%0D%0A++++%22github_github-pages.md%22%2C%0D%0A++++%22googlecloud_gcloud-error-workaround.md%22%2C%0D%0A++++%22github_github-code-search-api-uses.md%22%2C%0D%0A++++%22gpt3_reformatting-text-with-copilot.md%22%2C%0D%0A++++%22pytest_show-files-opened-by-tests.md%22%2C%0D%0A++++%22spatialite_viewing-geopackage-data-with-spatialite-and-datasette.md%22%2C%0D%0A++++%22sqlite_multiple-indexes.md%22%0D%0A++%29

simonw commented 1 year ago

That should have fixed it, but it didn't - here's the bit of the workflow where it failed:

https://github.com/simonw/til/actions/runs/3688947164/jobs/6244340939

Got 0 byte PNG for github_github-pages.md shot hash e54cea9896f7a466d6e5703ce107b2c5
Skipped mastodon_custom-domain-mastodon.md with shot hash ce0d0a68f4f0d7c51a91218ff2164456
Skipped mastodon_export-timeline-to-sqlite.md with shot hash e32ea7e0b44d57dd308e7fe3b2c8756a
Skipped gpt3_open-api.md with shot hash aec4a0f370d664069b942c6865274af8
Skipped json_json-pointer.md with shot hash 60052a94919658a34dc4ec7c02a644cc
Skipped gpt3_writing-test-with-copilot.md with shot hash 1cac30332578aaad53d5cd0a182a4e9b
Skipped html_datalist.md with shot hash 02acf7f812bf857fc830eb7f6089a7f4
Skipped git_git-archive.md with shot hash 98b0f5b70ef13976621959e7f7f50c4b
Skipped mastodon_verifying-github-on-mastodon.md with shot hash afe6e349ca47854f146c8ef1e1f13d66
Skipped observable-plot_wider-tooltip-areas.md with shot hash 0871f9336adcebf0ec24da5f9373fed9
Skipped datasette_cli-tool-that-is-also-a-plugin.md with shot hash cc42d4afa9ad1cb25ebeedcba62b70bd
Skipped html_lazy-loading-images.md with shot hash 052959fdc6df4e0f75c906f7115bc847
Skipped github-actions_cache-setup-py.md with shot hash b49a840f36dcf80a29a6f71227e0a753
Skipped docker_pipenv-and-docker.md with shot hash 8ef776fcdce11807608c7892b27a5e3c
Got 0 byte PNG for googlecloud_gcloud-error-workaround.md shot hash 747e75e8be74e58e42482751143e4f90
Got 0 byte PNG for github_github-code-search-api-uses.md shot hash 269f456f68266291edb4a7e354fbd961
Got 0 byte PNG for gpt3_reformatting-text-with-copilot.md shot hash 387afeea2ed9b02c5a9fb8a10745fd3d
Got 0 byte PNG for pytest_show-files-opened-by-tests.md shot hash 80495a72ee192edf259b462489e0c933
Got 0 byte PNG for spatialite_viewing-geopackage-data-with-spatialite-and-datasette.md shot hash 02b4ec6523a8611d50e60121f2f8160c
Got 0 byte PNG for sqlite_multiple-indexes.md shot hash 42b8a982ad6ca28674ce61212dcce57d

It's the same 7 images again.
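A quick way to confirm which entries came back empty is to parse the log lines. This is a small sketch that assumes the "Got N byte PNG for PATH shot hash HASH" format shown above; the sample log mixes a few lines copied from the two runs quoted in this thread purely for illustration.

```python
import re

# Matches lines such as:
# Got 0 byte PNG for github_github-pages.md shot hash e54cea98...
LOG_LINE = re.compile(r"^Got (\d+) byte PNG for (\S+) shot hash ([0-9a-f]+)$")


def blank_shots_from_log(log_text):
    """Return the paths that the log reports as 0 byte PNGs."""
    paths = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and int(match.group(1)) == 0:
            paths.append(match.group(2))
    return paths


sample_log = """\
Got 0 byte PNG for github_github-pages.md shot hash e54cea9896f7a466d6e5703ce107b2c5
Skipped html_datalist.md with shot hash 02acf7f812bf857fc830eb7f6089a7f4
Got 59593 byte PNG for googlecloud_gcloud-error-workaround.md shot hash 747e75e8be74e58e42482751143e4f90
Got 0 byte PNG for sqlite_multiple-indexes.md shot hash 42b8a982ad6ca28674ce61212dcce57d
"""

blank = blank_shots_from_log(sample_log)
print(blank)
```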

The weird thing is that those screenshots generate just fine when I run that script on my laptop.

simonw commented 1 year ago

I'm going to switch to https://shot-scraper.datasette.io/ for screenshots and see if that helps.
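Calling shot-scraper from a Python script might look something like the sketch below. It is not the code from this repository: the function names, the 800x400 size, and the output path are assumptions for illustration. It uses the shot-scraper command with its -o, --width and --height options, and assumes shot-scraper is installed along with its browser (via shot-scraper install).

```python
import subprocess


def shot_scraper_command(url, output_path, width=800, height=400):
    """Build the shot-scraper command line for one screenshot."""
    return [
        "shot-scraper",
        url,
        "-o",
        output_path,
        "--width",
        str(width),
        "--height",
        str(height),
    ]


def take_shot(url, output_path):
    """Run shot-scraper and return the PNG bytes, refusing empty output."""
    subprocess.run(shot_scraper_command(url, output_path), check=True)
    with open(output_path, "rb") as fp:
        png = fp.read()
    if not png:
        raise ValueError(f"Got 0 byte PNG for {url}")
    return png


# Only builds and prints the command; take_shot() is what would actually run it.
command = shot_scraper_command(
    "https://til.simonwillison.net/sqlite/multiple-indexes", "shot.png"
)
print(" ".join(command))
```

Raising on an empty file means a bad capture fails the run loudly instead of being written to the database as a 0 byte image.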

simonw commented 1 year ago

Looks like shot-scraper worked in the GitHub Actions run:

Skipped python_pdb-interact.md with shot hash 3de66a55d705fdadd6fab5b7d1dc1ff0
Got 65373 byte PNG for github_github-pages.md shot hash e54cea9896f7a466d6e5703ce107b2c5
Skipped mastodon_custom-domain-mastodon.md with shot hash ce0d0a68f4f0d7c51a91218ff2164456
Skipped mastodon_export-timeline-to-sqlite.md with shot hash e32ea7e0b44d57dd308e7fe3b2c8756a
Skipped gpt3_open-api.md with shot hash aec4a0f370d664069b942c6865274af8
Skipped json_json-pointer.md with shot hash 60052a94919658a34dc4ec7c02a644cc
Skipped gpt3_writing-test-with-copilot.md with shot hash 1cac30332578aaad53d5cd0a182a4e9b
Skipped html_datalist.md with shot hash 02acf7f812bf857fc830eb7f6089a7f4
Skipped git_git-archive.md with shot hash 98b0f5b70ef13976621959e7f7f50c4b
Skipped mastodon_verifying-github-on-mastodon.md with shot hash afe6e349ca47854f146c8ef1e1f13d66
Skipped observable-plot_wider-tooltip-areas.md with shot hash 0871f9336adcebf0ec24da5f9373fed9
Skipped datasette_cli-tool-that-is-also-a-plugin.md with shot hash cc42d4afa9ad1cb25ebeedcba62b70bd
Skipped html_lazy-loading-images.md with shot hash 052959fdc6df4e0f75c906f7115bc847
Skipped github-actions_cache-setup-py.md with shot hash b49a840f36dcf80a29a6f71227e0a753
Skipped docker_pipenv-and-docker.md with shot hash 8ef776fcdce11807608c7892b27a5e3c
Got 59593 byte PNG for googlecloud_gcloud-error-workaround.md shot hash 747e75e8be74e58e42482751143e4f90
Got 81047 byte PNG for github_github-code-search-api-uses.md shot hash 269f456f68266291edb4a7e354fbd961
Got 71664 byte PNG for gpt3_reformatting-text-with-copilot.md shot hash 387afeea2ed9b02c5a9fb8a10745fd3d
Got 51927 byte PNG for pytest_show-files-opened-by-tests.md shot hash 80495a72ee192edf259b462489e0c933
Got 67890 byte PNG for spatialite_viewing-geopackage-data-with-spatialite-and-datasette.md shot hash 02b4ec6523a8611d50e60121f2f8160c
Got 103632 byte PNG for sqlite_multiple-indexes.md shot hash 42b8a982ad6ca28674ce61212dcce57d

simonw commented 1 year ago

That fixed it!

I ran this query to generate the list below:

select
  group_concat('https://til.simonwillison.net/' || topic || '/' || slug || '

![](https://til.simonwillison.net/-/media/screenshot/' || path || ')', '

') as screenshot_url
from
  til
where
  path in (
    'github_github-pages.md',
    'googlecloud_gcloud-error-workaround.md',
    'github_github-code-search-api-uses.md',
    'gpt3_reformatting-text-with-copilot.md',
    'pytest_show-files-opened-by-tests.md',
    'spatialite_viewing-geopackage-data-with-spatialite-and-datasette.md',
    'sqlite_multiple-indexes.md'
  )
  )

https://til.simonwillison.net/tils?sql=select%0D%0A++group_concat%28%27https%3A%2F%2Ftil.simonwillison.net%2F%27+%7C%7C+topic+%7C%7C+%27%2F%27+%7C%7C+slug+%7C%7C+%27%0D%0A%0D%0A%21%5B%5D%28https%3A%2F%2Ftil.simonwillison.net%2F-%2Fmedia%2Fscreenshot%2F%27+%7C%7C+path+%7C%7C+%27%29%27%2C+%27%0D%0A%0D%0A%27%29+as+screenshot_url%0D%0Afrom%0D%0A++til%0D%0Awhere%0D%0A++path+in+%28%0D%0A++++%22github_github-pages.md%22%2C%0D%0A++++%22googlecloud_gcloud-error-workaround.md%22%2C%0D%0A++++%22github_github-code-search-api-uses.md%22%2C%0D%0A++++%22gpt3_reformatting-text-with-copilot.md%22%2C%0D%0A++++%22pytest_show-files-opened-by-tests.md%22%2C%0D%0A++++%22spatialite_viewing-geopackage-data-with-spatialite-and-datasette.md%22%2C%0D%0A++++%22sqlite_multiple-indexes.md%22%0D%0A++%29
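The group_concat trick in that query can be tried locally with Python's sqlite3 module. This sketch assumes a til table with path, topic and slug columns and uses two made-up rows in an in-memory database. It writes the newlines as char(10) rather than embedding literal line breaks in the SQL string, which is a substitution for readability, not what the query above does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table til (path text, topic text, slug text)")
conn.executemany(
    "insert into til (path, topic, slug) values (?, ?, ?)",
    [
        ("github_github-pages.md", "github", "github-pages"),
        ("sqlite_multiple-indexes.md", "sqlite", "multiple-indexes"),
    ],
)

# Each row becomes "page URL, blank line, Markdown image"; rows are then
# joined with another blank line by group_concat's separator argument.
sql = """
select group_concat(
  'https://til.simonwillison.net/' || topic || '/' || slug
    || char(10) || char(10)
    || '![](https://til.simonwillison.net/-/media/screenshot/' || path || ')',
  char(10) || char(10)
) as screenshot_url
from til
where path in (?, ?)
"""

markdown = conn.execute(
    sql, ["github_github-pages.md", "sqlite_multiple-indexes.md"]
).fetchone()[0]
print(markdown)
```

SQLite does not guarantee the order in which group_concat joins rows unless the query arranges for one, so the entries may come out in any order.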

https://til.simonwillison.net/github/github-code-search-api-uses

https://til.simonwillison.net/github/github-pages

https://til.simonwillison.net/googlecloud/gcloud-error-workaround

https://til.simonwillison.net/gpt3/reformatting-text-with-copilot

https://til.simonwillison.net/pytest/show-files-opened-by-tests

https://til.simonwillison.net/spatialite/viewing-geopackage-data-with-spatialite-and-datasette

https://til.simonwillison.net/sqlite/multiple-indexes