pgcentralfoundation / pgrx

Build Postgres Extensions with Rust!
Other
3.64k stars 248 forks source link

`cargo pgrx schema` output isn't repeatable #1866

Closed eeeebbbbrrrr closed 1 month ago

eeeebbbbrrrr commented 1 month ago

On a large extension (ParadeDB's pg_search extension), I've found that consecutive runs of cargo pgrx schema, with no code changes, produce differently-ordered schemas. The graph relationship is still correct, but the edges don't seem to emit in the same order every time.

(I've made sure the few extension_sql!() blocks pg_search has all have proper requires = [...] arrays, just to make sure the graph gods aren't making different decisions for those, which might then upset the whole graph.)

I swear at one point in history the schema generator did produce consistent results.

I've been looking through the code, along with diffs in the commit history, and I don't see anything that would introduce randomness. We make sure to sort the list of sql entities and process them in order when building the graph, and we do a toposort to walk the graph to generate the sql. It's at that point the nodes come out in different orders each time.

@workingjubilee you got any ideas? I'm happy to fix this (assuming we can?), but I'm kinda stumped on where else to look.

Ultimately, I'd like to be able to do:

$ diff old-schema.sql new-schema.sql > diff.sql

And have a reasonable diff generated, or no diff at all if in fact there were no changes to the schema structure between "old" and "new".

workingjubilee commented 1 month ago

Just for reference, what I mentioned in discussion: "toposort" does not by itself imply consistency in ordering, because a topological sort can produce many different orderings that are still topologically ordered.

eeeebbbbrrrr commented 1 month ago

The switch to tarjan_scc I did in #1867 appears to emit the "strongly connected components" (as it calls them) in a consistent order since it's DFS. And within each block of those, the ordering appears to be that of the graph insertion order (or Ord/Eq properties, I can't actually tell).

I understand there's no prescribed order per se, I'm just looking for a consistent ordering from run to run. Seems like #1867 does that.

workingjubilee commented 1 month ago

I think the tarjan_scc ordering still can also vary in theory, it just in-practice won't because its additional ordering constraints lock it down enough. This decision can be revisited again if this becomes a practical concern.

eeeebbbbrrrr commented 1 month ago

Yeah that’s the part that’s not super clear to me. Wikipedia says traversal can start at any arbitrary node but it appears the petgraph impl chooses the same one each time.

I don’t know if that’s the root node or what. So we probably are a bit dependent on implementation details, but this appears to be exactly what I was after — for now.