Why not use `@tanstack/query` for the react hooks?

Sheraff commented 7 months ago

I'm working on making a "starter template" for offline-first PWAs and I was wondering if there was a reason to have gone full-custom for the query hooks? The @tanstack/query library is pretty mature and nice to use.

If I were to give the sales pitch, here's what I could say

it's framework-agnostic™ (I never tried anything other than the react implementation)
it's well typed
it's well known, people are likely to be already familiar with it
it provides a global cache manager for de-duping simultaneous queries / invalidating
it does a deep merge of its results to minimize re-renders
it provides a couple utilities (focus manager, online manager) that can be useful for "offline-conscious" apps
it has hooks more patterns (mutations, infinite / paginated queries, react suspense)
it has devtools
it has good documentation

On the other hand

it's quite a good amount of code (13.7kB gzipped), though for an app that already loads 2Mb of wasm it should be ok
I'm not exactly sure if it's as fast as your current implementation

I tried a quick implementation of useQuery, here's what that could look like:

import { useQuery, useQueryClient, hashKey } from "@tanstack/react-query"
import { useDB, type CtxAsync } from "@vlcn.io/react"
import { useEffect, useState, useRef } from "react"

type DBAsync = CtxAsync["db"]

type UpdateType =
    /** INSERT */
    | 18
    /** UPDATE */
    | 23
    /** DELETE */
    | 9

/**
 * Not really useful, this is just to increase the cache hit rate.
 */
function sqlToKey(sql: string) {
    return sql.replace(/\s+/g, " ")
}

/**
 * Rely on react-query's cacheManager to
 * - know which queries are active
 * - force invalidation of "currently in-use" queries
 *
 * Rely on vlcn RX to
 * - know which tables are used by a query
 * - know when to invalidate queries
 */
export function useCacheManager(dbName: string) {
    const [tables, setTables] = useState<string[]>([])
    const queryMap = useRef(
        new Map<
            string,
            {
                tables: string[]
                updateTypes: UpdateType[]
                queryKey: unknown[]
            }
        >(),
    )

    const client = useQueryClient()
    const ctx = useDB(dbName)

    useEffect(() => {
        /** not the cleanest implementation, could execute less code if it was outside of react */
        const cleanup = tables.map((table) =>
            ctx.rx.onRange([table], (updates) => {
                queryMap.current.forEach((query) => {
                    if (!query.tables.includes(table)) return
                    if (!updates.some((u) => query.updateTypes.includes(u))) return
                    client.invalidateQueries({ queryKey: query.queryKey, exact: true })
                })
            }),
        )
        return () => {
            for (const dispose of cleanup) {
                dispose()
            }
        }
    }, [tables])

    useEffect(() => {
        const cacheManager = client.getQueryCache()
        /** count how many queries are relying on each table */
        const tableCountMap = new Map<string, number>()

        const cacheUnsubscribe = cacheManager.subscribe((event) => {
            if (event.type === "observerAdded") {
                const key = hashKey(event.query.queryKey)
                if (queryMap.current.has(key)) return
                /** add to Map early, so we know if it has been deleted by `observerRemoved` before `usedTables` resolves */
                queryMap.current.set(key, {
                    updateTypes: event.query.queryKey[2],
                    queryKey: event.query.queryKey,
                    tables: [],
                })
                usedTables(ctx.db, event.query.queryKey[1]).then((usedTables) => {
                    const query = queryMap.current.get(key)
                    if (!query) return
                    queryMap.current.set(key, { ...query, tables: usedTables })
                    for (const table of usedTables) {
                        if (!tableCountMap.has(table)) {
                            tableCountMap.set(table, 1)
                            setTables((tables) => [...tables, table])
                        } else {
                            tableCountMap.set(table, tableCountMap.get(table)! + 1)
                        }
                    }
                })
            } else if (event.type === "observerRemoved") {
                const key = hashKey(event.query.queryKey)
                const query = queryMap.current.get(key)
                if (!query) return
                queryMap.current.delete(key)
                for (const table of query.tables) {
                    if (!tableCountMap.has(table)) continue
                    const count = tableCountMap.get(table)!
                    if (count === 1) {
                        tableCountMap.delete(table)
                        setTables((tables) => tables.filter((t) => t !== table))
                    } else {
                        tableCountMap.set(table, count - 1)
                    }
                }
            }
        })
        return cacheUnsubscribe
    }, [])
}

let queryId = 0

export function useDbQuery<
    TQueryFnData = unknown,
    // TError = DefaultError, // TODO
    TData = TQueryFnData[],
>({
    dbName,
    query,
    select,
    bindings = [],
    updateTypes = [18, 23, 9],
}: {
    dbName: string
    query: string
    select?: (data: TQueryFnData[]) => TData
    bindings?: ReadonlyArray<string>
    updateTypes?: Array<UpdateType>
}) {
    const ctx = useDB(dbName)
    const queryKey = [dbName, sqlToKey(query), updateTypes, bindings]

    return useQuery({
        queryKey,
        queryFn: async () => {
            const statement = await ctx.db.prepare(query)
            statement.bind(bindings)
            const [releaser, transaction] = await ctx.db.imperativeTx()
            const transactionId = queryId++
            transaction.exec(/*sql*/ `SAVEPOINT use_query_${transactionId};`)
            statement.raw(false)
            try {
                const data = (await statement.all(transaction)) as TQueryFnData[]
                transaction.exec(/*sql*/ `RELEASE use_query_${transactionId};`).then(releaser, releaser)
                return data
            } catch (e) {
                transaction.exec(/*sql*/ `ROLLBACK TO use_query_${transactionId};`).then(releaser, releaser)
                throw e
            }
        },
        select,
    })
}

async function usedTables(db: DBAsync, query: string): Promise<string[]> {
    const sanitized = query.replaceAll("'", "''")
    const rows = await db.execA(/*sql*/ `
        SELECT tbl_name
        FROM tables_used('${sanitized}') AS u
        JOIN sqlite_master ON sqlite_master.name = u.name
        WHERE u.schema = 'main';
    `)
    return rows.map((r) => r[0])
}

If you're curious, here's the "starter template" I'm talking about https://github.com/Sheraff/root

tantaman commented 7 months ago

I had thought about it but never took the time to look into it.

The one thing I was worried about was that it looks like @tanstack/query requires manual invalidations of queries.

Looks like you solved that problem in useCacheManager though :)

Some ways you could improve your current implementation:

db.prepare returns a prepared statement that you must finalize or else you'll leak memory. I don't see where you finalize it currently.
db.prepare is actually somewhat expensive and can take 50-75% of a query's total time. You should re-use prepared statements. Your queryFn re-prepares them each time.
usedTables exists as a prepared statement on the db class now -- https://github.com/vlcn-io/js/blob/main/packages/crsqlite-wasm/src/DB.ts#L67-L73 so you can just grab a reference to that rather than preparing it youself. It is a rather expensive query in terms of time it takes to prepare. Quick to run after that.

Something that might make all this easier for you is that I've been looking into automatically caching prepared statements in the DB class itself so users don't have to worry about it. This is a common thing done in many language bindings for SQLite.

it's quite a good amount of code (13.7kB gzipped), though for an app that already loads 2Mb of wasm it should be ok

WASM size and JS size are not exactly equivalent. WASM, since it is already byte code and supports streaming compilation, can actually start executing as soon as the final byte finishes downloading. JS, on the other hand, takes quite a while for low-end devices to parse, compile and start.

E.g., 1MB of JS can take 2 seconds to compile and start after it is downloaded. 1MB of WASM starts as soon as download completes.

Old but useful article: https://medium.com/reloading/javascript-start-up-performance-69200f43b201

I wonder if @tanstack/query is actually that large after tree-shaking?

Sheraff commented 7 months ago

Thanks for your response

db.prepare returns a prepared statement that you must finalize or else you'll leak memory.

good catch, will fix this

You should re-use prepared statements. Your queryFn re-prepares them each time.

the queryFn is only re-executed if the query needs to re-run, so at least it's not on every render. But fair point. An in-memory cache would be quite easy to make, and maybe I can listen to react-query's cache manager to purge the statement when the data itself is purged from the cache (this duration is configurable when instantiating react-query, and it's only purged if the query isn't currently in use).

usedTables exists as a prepared statement on the db class now

Yoooo that's dope! I'm still on v0.15 for now, wasn't sure whether 0.16 was stable.

I wonder if @tanstack/query is actually that large after tree-shaking?

I'm not sure I'm testing this right, but I just tried running a Vite build on my repo (the only thing I do with react-query is what is pasted above) with the rollup-plugin-visualizer plugin, and the output is about the same as what is reported on bundlephobia. So that "13.7kB gzipped" is probably about right.

tantaman commented 7 months ago

The SQLite team is considering pushing the statement cache down into SQLite itself -- https://sqlite.org/forum/forumpost/d5939a8557a7a85f

I really hope they do this.

wasn't sure whether 0.16 was stable.

it is as of 0.16.2-next. I need to update some docs before making it the official release.

Notable changes are:

Primary key columns must be tagged as NOT NULL
site_id is now required so fetching local changes requires a comparison on site_id != crsql_site_id() rather than site_id IS NOT NULL

Sheraff commented 6 months ago

Hey I was trying to make a more production-ready version of this "react-query adapter" above, and I had a question : is there a cost to keep listening for changes (rx.onRange([table], ...) on a table that is not changing? Or is it just the slight memory overhead (listener closure) + an entry in a hashmap somewhere (listener registry)?

tantaman commented 6 months ago

Minimal overhead. Just the extra memory as you mention.

If all listeners were removed entirely there's potentially a speedup since we can reduce the number of times we have to cross the WASM <-> JS bridge which can be expensive.

To that end, it'd be an improvement if we gathered updates in WASM and only called into JS once on commit of the transaction. Related: https://github.com/vlcn-io/js/issues/48

kimmobrunfeldt commented 2 months ago

@Sheraff I was setting up dependencies for a project and also ended up with using @tanstack/react-query setup.

I didn't yet understand why the hook needs to be so complex? I'm quite sure I'm missing something. For example the comment lists how to know which queries are active and how to invalidate those, but why is this required?

/**
 * Rely on react-query's cacheManager to
 * - know which queries are active
 * - force invalidation of "currently in-use" queries
 *
 * Rely on vlcn RX to
 * - know which tables are used by a query
 * - know when to invalidate queries
 */

My first (probably naive) thought was to simply do:

import { useQuery } from '@tanstack/react-query'
import { useDB } from '@vlcn.io/react'
import sql from 'sql-template-tag'

export function Component() {
  const ctx = useDB('dbName')

  const threshold = 10
  const sqlQuery = sql`SELECT * FROM my_table WHERE some_value > ${threshold}`
  const {
    data: rows,
    isLoading,
    isError,
  } = useQuery({
    queryKey: ['sqlQuery', sqlQuery.sql, sqlQuery.values],
    queryFn: async () => {
      // Not sure how to represent ctx as a dependency for the query
      const rows = await ctx.db.execO<{ id: string; name: string }>(
        sqlQuery.sql,
        sqlQuery.values
      )
      return rows
    },
  })

  if (isLoading) return <div>Loading...</div>
  if (isError) return <div>Error</div>

  if (!rows) throw new Error('No data') // should not happen
  return (
    <ul>
      {rows.map((row) => (
        <li key={row.id}>{row.name}</li>
      ))}
    </ul>
  )
}

Sheraff commented 2 months ago

@kimmobrunfeldt 99% of the complexity is for reactive queries. What I wanted to make (and what vlcn.io/react provides out of the box) is queries that will re-render the component when their value changes — either from a local mutation, or from a mutation that happened on a remote copy of the DB and that was synced through the CRDT mechanism.

What you wrote here is completely fine, but there is no reactivity, and if you want the query to re-execute and re-render, you'll have to explicitly call queryClient.incalidateQueries({ queryKey: ['sqlQuery', sqlQuery.sql, sqlQuery.values] }) with the proper SQL query, at the proper time.

Also, even though it's not really the subject, to answer your remark "Not sure how to represent ctx as a dependency for the query": you can add ctx.db.db to your query key, AFAIK it's a unique number representing your DB.

kimmobrunfeldt commented 2 months ago

@Sheraff thanks for the quick reply. Got it, now everything makes sense :+1:

Also, even though it's not really the subject, to answer your remark "Not sure how to represent ctx as a dependency for the query": you can add ctx.db.db to your query key, AFAIK it's a unique number representing your DB.

👏 thank you!

kimmobrunfeldt commented 2 months ago

By ctx.db.db I assume you meant ctx.db.siteid

Sheraff commented 2 months ago

By ctx.db.db I assume you meant ctx.db.siteid

I meant ctx.db.db but maybe siteid does the job just as well.

kimmobrunfeldt commented 2 months ago

Ok right, the .db wasn't exposed via types

vlcn-io / js

Why not use `@tanstack/query` for the react hooks? #46