simonw / til

Today I Learned
https://til.simonwillison.net
Apache License 2.0

Show related TILs at the bottom of each TIL #50

simonw closed this issue 2 years ago

simonw commented 2 years ago

This might be interesting. I could try to figure out related TILs using FTS (full-text search) queries.

simonw commented 2 years ago

I tried taking the text from https://til.simonwillison.net/docker/emulate-s390x-with-qemu, lowercasing it, stripping all non-alphanumeric characters from each word and joining the resulting words together using OR. I got this:

emulating OR a OR bigendian OR s390x OR with OR qemu OR i OR got OR a OR bug OR report OR concerning OR my OR sqlitefts4 OR project OR running OR on OR ppc64 OR and OR s390x OR architectures OR the OR s390x OR is OR an OR ibm OR mainframe OR architecture OR which OR i OR found OR glamorous OR the OR bug OR related OR to OR those OR machines OR being OR bigendian OR vs OR my OR software OR being OR tested OR on OR littleendian OR machines OR my OR first OR attempt OR at OR fixing OR it OR see OR this OR til OR turned OR out OR not OR to OR be OR correct OR i OR really OR needed OR a OR way OR to OR test OR agaist OR an OR emulated OR s390x OR machine OR with OR bigendian OR byte OR order OR i OR figured OR out OR how OR to OR do OR that OR using OR docker OR for OR mac OR and OR qemu OR multiarchqemuuserstaticregister OR this OR is OR the OR first OR command OR to OR run OR it OR does OR something OR magical OR to OR your OR docker OR installation OR docker OR run OR rm OR privileged OR multiarchqemuuserstaticregister OR reset OR the OR qemuuserstatic OR readme OR says OR multiarchqemuuserstatic OR and OR multiarchqemuuserstaticregister OR images OR execute OR the OR register OR script OR that OR registers OR below OR kind OR of OR procsysfsbinfmtmiscqemuarch OR files OR for OR all OR supported OR processors OR except OR the OR current OR one OR in OR it OR when OR running OR the OR container OR it OR continues OR the OR reset OR option OR is OR implemented OR at OR the OR register OR script OR that OR executes OR find OR procsysfsbinfmtmisc OR type OR f OR name OR qemu OR exec OR sh OR c OR echo OR 1 OR to OR remove OR binfmtmisc OR entry OR files OR before OR register OR the OR entry OR i OR dont OR understand OR what OR this OR means OR but OR running OR this OR command OR was OR essential OR for OR the OR next OR command OR to OR work OR multiarchubuntucores390xfocal OR having OR run OR that OR command OR the OR following OR command OR drops OR you OR into OR a OR shell OR in OR an OR emulated OR s390x OR machine OR running OR ubuntu OR focal OR docker OR run OR it OR multiarchubuntucores390xfocal OR binbash OR using OR focal OR gives OR you OR python OR 38 OR i OR previously OR tried OR s390xbionic OR but OR that OR gave OR me OR python OR 36 OR you OR dont OR actually OR get OR python OR until OR you OR install OR it OR like OR so OR aptget OR y OR update OR aptget OR y OR install OR python3 OR this OR will OR take OR a OR while OR i OR think OR its OR slower OR because OR the OR hardware OR is OR being OR emulated OR now OR you OR can OR check OR the OR python OR version OR and OR confirm OR that OR the OR byte OR order OR is OR bigendian OR like OR this OR rootea63e288ce49 OR python3 OR version OR python OR 3810 OR rootea63e288ce49 OR python3 OR c OR import OR sys OR printsysbyteorder OR big OR doing OR this OR in OR github OR actions OR i OR figured OR out OR the OR following OR recipe OR for OR running OR this OR in OR github OR actions OR in OR this OR example OR im OR cloning OR my OR sqlitefts4 OR repo OR and OR running OR the OR tests OR in OR it OR as OR well OR name OR qemu OR to OR run OR s390xfocal OR on OR push OR workflowdispatch OR jobs OR one OR runson OR ubuntulatest OR steps OR name OR setup OR multiarchqemuuserstatic OR run OR docker OR run OR rm OR privileged OR multiarchqemuuserstaticregister OR reset OR name OR ubuntucores390xfocal OR uses OR dockermultiarchubuntucores390xfocal OR with OR args OR bash OR c OR uname OR a OR lscpu OR grep OR endian OR aptget OR y OR update OR aptget OR y OR install OR python3 OR git OR python38venv OR python3 OR version OR python3 OR c OR import OR sys OR printsysbyteorder OR git OR clone OR httpsgithubcomsimonwsqlitefts4 OR cd OR sqlitefts4 OR python3 OR m OR venv OR venv OR source OR venvbinactivate OR pip OR install OR e OR test OR pytest
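Here's a rough Python sketch of that transformation. It's a guess reconstructed from the output above rather than the exact code that was used: build_or_query is a hypothetical name, and the rule it applies (split on whitespace, lowercase, keep only the characters a-z and 0-9, drop anything left empty) is inferred from words like bigendian and 3810 in that output.

```python
import re


def build_or_query(text: str) -> str:
    """Hypothetical helper: turn a TIL's text into an FTS "OR" query.

    Each whitespace-separated token is lowercased and stripped of every
    character outside a-z and 0-9, so "big-endian" becomes "bigendian"
    and "3.8.10" becomes "3810". Tokens left empty are dropped.
    """
    words = []
    for token in text.lower().split():
        cleaned = re.sub(r"[^a-z0-9]", "", token)
        if cleaned:  # skip tokens that were pure punctuation
            words.append(cleaned)
    return " OR ".join(words)


print(build_or_query("Emulating a big-endian s390x with QEMU"))
# emulating OR a OR bigendian OR s390x OR with OR qemu
```

Reducing every token to lowercase letters and digits should also keep the terms from being parsed as FTS5 query syntax: as I understand the FTS5 grammar, operators such as OR and NOT must be uppercase, and characters like quotes, colons and hyphens have special meaning or are not allowed in bare terms.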

Here's a SQL query that uses that string as the match expression:

select title, rank from til_fts where til_fts match
'emulating OR a OR bigendian OR s390x OR with OR qemu OR i OR got OR a OR bug OR report OR concerning OR my OR sqlitefts4 OR project OR running OR on OR ppc64 OR and OR s390x OR architectures OR the OR s390x OR is OR an OR ibm OR mainframe OR architecture OR which OR i OR found OR glamorous OR the OR bug OR related OR to OR those OR machines OR being OR bigendian OR vs OR my OR software OR being OR tested OR on OR littleendian OR machines OR my OR first OR attempt OR at OR fixing OR it OR see OR this OR til OR turned OR out OR not OR to OR be OR correct OR i OR really OR needed OR a OR way OR to OR test OR agaist OR an OR emulated OR s390x OR machine OR with OR bigendian OR byte OR order OR i OR figured OR out OR how OR to OR do OR that OR using OR docker OR for OR mac OR and OR qemu OR multiarchqemuuserstaticregister OR this OR is OR the OR first OR command OR to OR run OR it OR does OR something OR magical OR to OR your OR docker OR installation OR docker OR run OR rm OR privileged OR multiarchqemuuserstaticregister OR reset OR the OR qemuuserstatic OR readme OR says OR multiarchqemuuserstatic OR and OR multiarchqemuuserstaticregister OR images OR execute OR the OR register OR script OR that OR registers OR below OR kind OR of OR procsysfsbinfmtmiscqemuarch OR files OR for OR all OR supported OR processors OR except OR the OR current OR one OR in OR it OR when OR running OR the OR container OR it OR continues OR the OR reset OR option OR is OR implemented OR at OR the OR register OR script OR that OR executes OR find OR procsysfsbinfmtmisc OR type OR f OR name OR qemu OR exec OR sh OR c OR echo OR 1 OR to OR remove OR binfmtmisc OR entry OR files OR before OR register OR the OR entry OR i OR dont OR understand OR what OR this OR means OR but OR running OR this OR command OR was OR essential OR for OR the OR next OR command OR to OR work OR multiarchubuntucores390xfocal OR having OR run OR that OR command OR the OR following OR command OR drops OR you OR into OR a OR shell OR in OR an OR emulated OR s390x OR machine OR running OR ubuntu OR focal OR docker OR run OR it OR multiarchubuntucores390xfocal OR binbash OR using OR focal OR gives OR you OR python OR 38 OR i OR previously OR tried OR s390xbionic OR but OR that OR gave OR me OR python OR 36 OR you OR dont OR actually OR get OR python OR until OR you OR install OR it OR like OR so OR aptget OR y OR update OR aptget OR y OR install OR python3 OR this OR will OR take OR a OR while OR i OR think OR its OR slower OR because OR the OR hardware OR is OR being OR emulated OR now OR you OR can OR check OR the OR python OR version OR and OR confirm OR that OR the OR byte OR order OR is OR bigendian OR like OR this OR rootea63e288ce49 OR python3 OR version OR python OR 3810 OR rootea63e288ce49 OR python3 OR c OR import OR sys OR printsysbyteorder OR big OR doing OR this OR in OR github OR actions OR i OR figured OR out OR the OR following OR recipe OR for OR running OR this OR in OR github OR actions OR in OR this OR example OR im OR cloning OR my OR sqlitefts4 OR repo OR and OR running OR the OR tests OR in OR it OR as OR well OR name OR qemu OR to OR run OR s390xfocal OR on OR push OR workflowdispatch OR jobs OR one OR runson OR ubuntulatest OR steps OR name OR setup OR multiarchqemuuserstatic OR run OR docker OR run OR rm OR privileged OR multiarchqemuuserstaticregister OR reset OR name OR ubuntucores390xfocal OR uses OR dockermultiarchubuntucores390xfocal OR with OR args OR bash OR c OR uname OR a OR lscpu OR grep OR endian OR aptget OR y OR update OR aptget OR y OR install OR python3 OR git OR python38venv OR python3 OR version OR python3 OR c OR import OR sys OR printsysbyteorder OR git OR clone OR httpsgithubcomsimonwsqlitefts4 OR cd OR sqlitefts4 OR python3 OR m OR venv OR venv OR source OR venvbinactivate OR pip OR install OR e OR test OR pytest'
order by rank

And the result: https://til.simonwillison.net/tils?sql=select+title%2C+rank+from+til_fts+where+til_fts+match+%27emulating+OR+a+OR+bigendian+OR+s390x+OR+with+OR+qemu+OR+i+OR+got+OR+a+OR+bug+OR+report+OR+concerning+OR+my+OR+sqlitefts4+OR+project+OR+running+OR+on+OR+ppc64+OR+and+OR+s390x+OR+architectures+OR+the+OR+s390x+OR+is+OR+an+OR+ibm+OR+mainframe+OR+architecture+OR+which+OR+i+OR+found+OR+glamorous+OR+the+OR+bug+OR+related+OR+to+OR+those+OR+machines+OR+being+OR+bigendian+OR+vs+OR+my+OR+software+OR+being+OR+tested+OR+on+OR+littleendian+OR+machines+OR+my+OR+first+OR+attempt+OR+at+OR+fixing+OR+it+OR+see+OR+this+OR+til+OR+turned+OR+out+OR+not+OR+to+OR+be+OR+correct+OR+i+OR+really+OR+needed+OR+a+OR+way+OR+to+OR+test+OR+agaist+OR+an+OR+emulated+OR+s390x+OR+machine+OR+with+OR+bigendian+OR+byte+OR+order+OR+i+OR+figured+OR+out+OR+how+OR+to+OR+do+OR+that+OR+using+OR+docker+OR+for+OR+mac+OR+and+OR+qemu+OR+multiarchqemuuserstaticregister+OR+this+OR+is+OR+the+OR+first+OR+command+OR+to+OR+run+OR+it+OR+does+OR+something+OR+magical+OR+to+OR+your+OR+docker+OR+installation+OR+docker+OR+run+OR+rm+OR+privileged+OR+multiarchqemuuserstaticregister+OR+reset+OR+the+OR+qemuuserstatic+OR+readme+OR+says+OR+multiarchqemuuserstatic+OR+and+OR+multiarchqemuuserstaticregister+OR+images+OR+execute+OR+the+OR+register+OR+script+OR+that+OR+registers+OR+below+OR+kind+OR+of+OR+procsysfsbinfmtmiscqemuarch+OR+files+OR+for+OR+all+OR+supported+OR+processors+OR+except+OR+the+OR+current+OR+one+OR+in+OR+it+OR+when+OR+running+OR+the+OR+container+OR+it+OR+continues+OR+the+OR+reset+OR+option+OR+is+OR+implemented+OR+at+OR+the+OR+register+OR+script+OR+that+OR+executes+OR+find+OR+procsysfsbinfmtmisc+OR+type+OR+f+OR+name+OR+qemu+OR+exec+OR+sh+OR+c+OR+echo+OR+1+OR+to+OR+remove+OR+binfmtmisc+OR+entry+OR+files+OR+before+OR+register+OR+the+OR+entry+OR+i+OR+dont+OR+understand+OR+what+OR+this+OR+means+OR+but+OR+running+OR+this+OR+command+OR+was+OR+essential+OR+for+OR+the+OR+next+OR+command+OR+to+OR+work+OR+multiarchubuntucores390xfocal+OR+having+OR+run+OR+that+OR+command+OR+the+OR+following+OR+command+OR+drops+OR+you+OR+into+OR+a+OR+shell+OR+in+OR+an+OR+emulated+OR+s390x+OR+machine+OR+running+OR+ubuntu+OR+focal+OR+docker+OR+run+OR+it+OR+multiarchubuntucores390xfocal+OR+binbash+OR+using+OR+focal+OR+gives+OR+you+OR+python+OR+38+OR+i+OR+previously+OR+tried+OR+s390xbionic+OR+but+OR+that+OR+gave+OR+me+OR+python+OR+36+OR+you+OR+dont+OR+actually+OR+get+OR+python+OR+until+OR+you+OR+install+OR+it+OR+like+OR+so+OR+aptget+OR+y+OR+update+OR+aptget+OR+y+OR+install+OR+python3+OR+this+OR+will+OR+take+OR+a+OR+while+OR+i+OR+think+OR+its+OR+slower+OR+because+OR+the+OR+hardware+OR+is+OR+being+OR+emulated+OR+now+OR+you+OR+can+OR+check+OR+the+OR+python+OR+version+OR+and+OR+confirm+OR+that+OR+the+OR+byte+OR+order+OR+is+OR+bigendian+OR+like+OR+this+OR+rootea63e288ce49+OR+python3+OR+version+OR+python+OR+3810+OR+rootea63e288ce49+OR+python3+OR+c+OR+import+OR+sys+OR+printsysbyteorder+OR+big+OR+doing+OR+this+OR+in+OR+github+OR+actions+OR+i+OR+figured+OR+out+OR+the+OR+following+OR+recipe+OR+for+OR+running+OR+this+OR+in+OR+github+OR+actions+OR+in+OR+this+OR+example+OR+im+OR+cloning+OR+my+OR+sqlitefts4+OR+repo+OR+and+OR+running+OR+the+OR+tests+OR+in+OR+it+OR+as+OR+well+OR+name+OR+qemu+OR+to+OR+run+OR+s390xfocal+OR+on+OR+push+OR+workflowdispatch+OR+jobs+OR+one+OR+runson+OR+ubuntulatest+OR+steps+OR+name+OR+setup+OR+multiarchqemuuserstatic+OR+run+OR+docker+OR+run+OR+rm+OR+privileged+OR+multiarchqemuuserstaticregister+OR+reset+OR+name+OR+ubuntucores390xfocal+OR+uses+OR+dockermultiarchubuntucores390xfocal+OR+with+OR+args+OR+bash+OR+c+OR+uname+OR+a+OR+lscpu+OR+grep+OR+endian+OR+aptget+OR+y+OR+update+OR+aptget+OR+y+OR+install+OR+python3+OR+git+OR+python38venv+OR+python3+OR+version+OR+python3+OR+c+OR+import+OR+sys+OR+printsysbyteorder+OR+git+OR+clone+OR+httpsgithubcomsimonwsqlitefts4+OR+cd+OR+sqlitefts4+OR+python3+OR+m+OR+venv+OR+venv+OR+source+OR+venvbinactivate+OR+pip+OR+install+OR+e+OR+test+OR+pytest%27+order+by+rank

title | rank
Emulating a big-endian s390x with QEMU | -659.2115928763853
Using LD_PRELOAD to run any version of SQLite with Python | -181.95755317680786
Running Docker on an M1 Mac | -174.68368794751368
Installing packages from Debian unstable in a Docker image based on stable | -150.23715986888004
Testing things in Fedora using Docker | -150.11995408175295

The query took 422ms. The results actually look pretty good!
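Two things stand out in that result: the top hit is the source TIL itself, and the match expression is a very long string pasted into the SQL. Here's a minimal Python sketch, using the standard library sqlite3 module, that binds the expression as a parameter, excludes the source TIL by rowid and caps the list with a limit. It assumes your SQLite build includes FTS5 (the rank column used above is an FTS5 feature); the til_fts columns, rowids and sample rows are made up for illustration, and related_tils is a hypothetical helper.

```python
import sqlite3

# Hypothetical schema and data; assumes this SQLite build has FTS5.
db = sqlite3.connect(":memory:")
db.execute("create virtual table til_fts using fts5(title, body)")
db.executemany(
    "insert into til_fts (rowid, title, body) values (?, ?, ?)",
    [
        (1, "Emulating a big-endian s390x with QEMU", "qemu lets docker run an s390x image"),
        (2, "Running Docker on an M1 Mac", "docker desktop on apple silicon"),
        (3, "Testing things in Fedora using Docker", "docker fedora qemu"),
        (4, "Baking bread", "flour water yeast salt"),
    ],
)


def related_tils(db, til_rowid, match_query, limit=5):
    """Return (rowid, title, rank) for the best matches, skipping the TIL itself.

    rank is FTS5's bm25-based score: more negative means a better match,
    so ascending order puts the most relevant rows first.
    """
    return db.execute(
        """
        select rowid, title, rank from til_fts
        where til_fts match :query and rowid != :rowid
        order by rank
        limit :limit
        """,
        {"query": match_query, "rowid": til_rowid, "limit": limit},
    ).fetchall()


for rowid, title, rank in related_tils(db, 1, "qemu OR docker OR s390x"):
    print(f"{rank:.6f}  {title}")
```

Binding the expression as a parameter also means it never needs quote-escaping, however long the source TIL is.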

simonw commented 2 years ago

Since the query takes 422ms, I'd like to implement some kind of caching. Options:

  1. Run this as part of the build script
  2. In-memory cache - will get reset when the Cloud Run instance restarts
  3. In-SQLite cache - will get reset when Cloud Run is re-deployed

I'm leaning towards option 2 just because it's the simplest to implement.
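Option 2 could be as small as memoizing the lookup in process memory. A minimal sketch, assuming a hypothetical find_related function that wraps the slow FTS query: functools.lru_cache keeps results only for the lifetime of the Python process, which is exactly why this kind of cache is lost whenever the Cloud Run instance restarts.

```python
import functools

query_count = 0  # counts how many times the slow path actually runs


@functools.lru_cache(maxsize=1024)
def find_related(til_path: str) -> tuple:
    """Hypothetical stand-in for the ~422ms FTS query.

    Returns a tuple (immutable) because lru_cache hands the same object
    back to every caller.
    """
    global query_count
    query_count += 1
    # A real implementation would build the OR query and run it here.
    return (f"related-to:{til_path}",)


find_related("docker/emulate-s390x-with-qemu")
find_related("docker/emulate-s390x-with-qemu")  # served from the cache
print(query_count)  # 1
```

Each Cloud Run instance would warm its own copy of this cache, so with several instances running, the first request for a given TIL can still pay the full query cost on each of them.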

simonw commented 2 years ago

Notebook: https://observablehq.com/@simonw/extract-issue-numbers-from-pasted-text

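This version dedupes the words using new Set() in JavaScript. After deduping, it looks like the query runs in about 100ms, which is fast enough that I don't think I'll bother with any caching. Here's an example that finds TILs related to the one about running OCR against a PDF file with AWS Textract: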

https://til.simonwillison.net/tils?sql=select+title%2C+rowid%2C+rank+from+til_fts+where+til_fts+match+%27running+OR+ocr+OR+against+OR+a+OR+pdf+OR+file+OR+with+OR+aws+OR+textract+OR+is+OR+the+OR+api+OR+its+OR+very+OR+good+OR+ive+OR+fed+OR+it+OR+handwritten+OR+notes+OR+from+OR+1890s+OR+and+OR+read+OR+them+OR+better+OR+than+OR+i+OR+could+OR+can+OR+be+OR+run+OR+directly+OR+jpeg+OR+or+OR+png+OR+images+OR+up+OR+to+OR+5mb+OR+but+OR+if+OR+you+OR+want+OR+have+OR+first+OR+upload+OR+an+OR+s3+OR+bucket+OR+update+OR+30th+OR+june+OR+2022+OR+used+OR+what+OR+learned+OR+in+OR+this+OR+til+OR+build+OR+s3ocr+OR+command+OR+line+OR+utility+OR+for+OR+pdfs+OR+try+OR+out+OR+dont+OR+need+OR+use+OR+at+OR+all+OR+document+OR+they+OR+offer+OR+demo+OR+tool+OR+console+OR+httpsuswest1consoleawsamazoncomtextracthomeregionuswest1demo+OR+screenshot+OR+of+OR+interface+OR+showing+OR+uploaded+OR+image+OR+resulting+OR+text+OR+limits+OR+relevant+OR+files+OR+asynchronous+OR+operations+OR+10mb+OR+size+OR+limit+OR+tiff+OR+500mb+OR+3000+OR+pages+OR+maximum+OR+height+OR+width+OR+40+OR+inches+OR+2880+OR+points+OR+cannot+OR+password+OR+protected+OR+contain+OR+2000+OR+formatted+OR+uploading+OR+my+OR+s3credentials+OR+create+OR+credentials+OR+sfmshistory+OR+c+OR+created+OR+user+OR+s3readwritesfmshistory+OR+permissions+OR+boundary+OR+arnawsiamawspolicyamazons3fullaccess+OR+attached+OR+policy+OR+access+OR+key+OR+username+OR+accesskeyid+OR+akiawxfxaiozboqm4xuh+OR+status+OR+active+OR+secretaccesskey+OR+createdate+OR+20220628+OR+1755100000+OR+stored+OR+secret+OR+1password+OR+then+OR+transmit+OR+starting+OR+detection+OR+job+OR+async+OR+mode+OR+where+OR+get+OR+back+OR+id+OR+poll+OR+completion+OR+ask+OR+send+OR+notifications+OR+via+OR+sns+OR+queue+OR+too+OR+optional+OR+ignore+OR+entirely+OR+which+OR+did+OR+start+OR+provide+OR+name+OR+process+OR+import+OR+boto3+OR+boto3clienttextract+OR+response+OR+textractstartdocumenttextdetection+OR+documentlocation+OR+s3object+OR+meetings+OR+minutesminutes1946194919461004sfmsmeetingminutespdf+OR+jobid+OR+responsejobid+OR+polling+OR+that+OR+textractgetdocumenttextdetection+OR+call+OR+returns+OR+jobstatus+OR+inprogress+OR+still+OR+processing+OR+heres+OR+function+OR+wrote+OR+time+OR+def+OR+polluntildonejobid+OR+while+OR+true+OR+textractgetdocumenttextdetectionjobidjobid+OR+responsejobstatus+OR+return+OR+print+OR+end+OR+timesleep10+OR+usage+OR+given+OR+completionresponse+OR+polluntildoneresponsejobid+OR+take+OR+surprisingly+OR+long+OR+took+OR+seven+OR+minutes+OR+6+OR+page+OR+typewritten+OR+me+OR+ten+OR+56+OR+one+OR+was+OR+wondering+OR+how+OR+retrieve+OR+results+OR+getdocumenttextdetection+OR+documentation+OR+says+OR+value+OR+only+OR+valid+OR+7+OR+days+OR+fetching+OR+paginated+OR+gather+OR+blocks+OR+detected+OR+across+OR+multiple+OR+getallblocksjobid+OR+nexttoken+OR+none+OR+false+OR+kwargs+OR+kwargsnexttoken+OR+textractgetdocumenttextdetectionkwargs+OR+blocksextendresponseblocks+OR+responsegetnexttoken+OR+pagination+OR+trick+OR+instead+OR+come+OR+three+OR+types+OR+word+OR+do+OR+not+OR+any+OR+just+OR+indications+OR+lines+OR+words+OR+were+OR+on+OR+duplicate+OR+each+OR+other+OR+probably+OR+example+OR+block+OR+blocktype+OR+confidence+OR+904699478149414+OR+1+OR+geometry+OR+boundingbox+OR+000758015550673008+OR+0011477531865239143+OR+left+OR+09904273152351379+OR+top+OR+000909337680786848+OR+polygon+OR+x+OR+y+OR+09980074763298035+OR+00205709096044302+OR+6b04b8dfbec142d3bfff29f0edd38976+OR+relationships+OR+type+OR+child+OR+ids+OR+58890ca75ed54b14ad60475e5d0dd79e+OR+found+OR+joining+OR+together+OR+those+OR+n+OR+gave+OR+needed+OR+printnjoinblocktext+OR+blockblocktype+OR+truncated+OR+output+OR+organization+OR+meeting+OR+san+OR+francisco+OR+microscopical+OR+society+OR+october+OR+4+OR+1946+OR+wss+OR+held+OR+800+OR+pm+OR+auditorium+OR+department+OR+health+OR+101+OR+grove+OR+street+OR+chairman+OR+george+OR+herbert+OR+needham+OR+called+OR+audience+OR+sixty+OR+five+OR+persons+OR+order+OR+he+OR+told+OR+high+OR+aims+OR+ideals+OR+fine+OR+fellow+OR+ship+OR+enjoyed+OR+by+OR+original+OR+organized+OR+1870+OR+incor+OR+porated+OR+1872+OR+dissolved+OR+following+OR+fire+OR+1906+OR+related+OR+his+OR+efforts+OR+find+OR+surviving+OR+member+OR+finally+OR+resulted+OR+telegram+OR+greeting+OR+dr+OR+kaspar+OR+pischell+OR+ross+OR+cali+OR+fornia+OR+as+OR+follows+OR+best+OR+wishes+OR+reunion+OR+am+OR+sorry%27+order+by+rank+limit+10
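A Python equivalent of that dedupe step might look like this. JavaScript's Set keeps insertion order, and so does a dict in Python 3.7 and later, so dict.fromkeys keeps the first occurrence of each word in place. dedupe_words is a hypothetical helper, and the example assumes the words have already been lowercased and stripped of punctuation.

```python
from typing import List


def dedupe_words(words: List[str]) -> List[str]:
    """Hypothetical helper: drop repeated words, keeping first-seen order.

    dict keys are unique and preserve insertion order (Python 3.7+),
    mirroring what [...new Set(words)] does in JavaScript.
    """
    return list(dict.fromkeys(words))


words = "i got a bug report i got a fix".split()
query = " OR ".join(dedupe_words(words))
print(query)  # i OR got OR a OR bug OR report OR fix
```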

simonw commented 2 years ago

Wrote this up as a TIL: https://til.simonwillison.net/sqlite/related-content