pramsey / pgsql-ogr-fdw

PostgreSQL foreign data wrapper for OGR
MIT License

Expose ogr_fdw_info functions via SQL #260

Open rcoup opened 1 month ago

rcoup commented 1 month ago

I've found it useful to be able to introspect OGR datasources as the PostgreSQL server sees them, via SQL equivalents of the ogr_fdw_info commands.

  1. Add a function to list available OGR layers from an FDW server:

    SELECT ogr_fdw_layers('myserver');
     ogr_fdw_layers
    ----------------
     Cities
     Countries
    
    (2 rows)
  2. Add a function to get the CREATE FOREIGN TABLE SQL for a particular OGR layer, the same as IMPORT FOREIGN SCHEMA uses. This can help where OGR reports column types incorrectly, or where the definition is mostly correct but some columns aren't needed.

    ogr_fdw_table_sql(server_name, ogr_layer_name, table_name=NULL, launder_column_names=TRUE, launder_table_name=TRUE)

    SELECT ogr_fdw_table_sql('myserver', 'pt_two');
            ogr_fdw_table_sql
    ---------------------------------
    CREATE FOREIGN TABLE "pt_two" (
      fid integer,
      "geom" geometry(Point, 4326),
      "name" varchar,
      "age" integer,
      "height" real,
      "birthdate" date
    ) SERVER "myserver"
    OPTIONS (layer 'pt_two');

I'm not sure whether you'd prefer the functions to take OIDs (e.g. SELECT ogr_fdw_layers(myserver);) instead of names. Suggestions for improving the naming or docs are also very welcome.

pramsey commented 1 month ago

I think this is useful but I'm +0 on the API.

Design goal is basically to provide the affordances of the existing software, but removing the need for locally installed software, yes?

rcoup commented 1 month ago

Design goal is basically to provide the affordances of the existing software, but removing the need for locally installed software, yes?

Yes, so you don't need ogr-fdw built+installed on both the client and the PG server. And also to fix random schema issues by getting the SQL since ogr_fdw_info doesn't support launder options.

I think this is useful but I'm +0 on the API.

👍 Happy to iterate.

  • Pretty sure that Oid/Regclass is what we want if we're referencing a database object, otherwise for strangely named things you end up wrestling at cross purposes with escape sequences.

Yeah, that's what I was leaning towards.

  • Need some way to play with connection strings to find one that works.
  • How crazy would it be to just have ogr_fdw_info(server text, layer text default null) returns text, as exact analogues to the CLI?

I feel like it might get confusingly overloaded?

ogr_fdw_info(datasource text) returns text; -- CREATE SERVER... 
ogr_fdw_info(server oid, layer text) returns text; -- CREATE FOREIGN TABLE...

How about:

ogr_fdw_info_server(ogr_datasource text, [...options]) returns text; -- CREATE SERVER...
ogr_fdw_info_table(server oid, layer text, [...options]) returns text; -- CREATE FOREIGN TABLE...

ogr_fdw_info_layers(ogr_datasource text) returns setof(record); -- layer list for connection string
ogr_fdw_info_layers(server oid) returns setof(record); -- layer list for existing FDW server
  • I feel like a missing key function is ogr_read(server text, layer text default null) returns setof(record) but you don't have to do that one, I'm just putting it here for completeness.

I guess there are some potential security questions here too if we're executing from OGR datasource strings? Calling server-side functions is potentially a different class of permissions than doing CREATE SERVER/CREATE FOREIGN TABLE, and GDAL is making remote network requests.

pramsey commented 1 month ago

How about:

ogr_fdw_info_server(ogr_datasource text, [...options]) returns text; -- CREATE SERVER...
ogr_fdw_info_table(server oid, layer text, [...options]) returns text; -- CREATE FOREIGN TABLE...

ogr_fdw_info_layers(ogr_datasource text) returns setof(record); -- layer list for connection string
ogr_fdw_info_layers(server oid) returns setof(record); -- layer list for existing FDW server

This latter won't work so great, I don't think, as the text form will shadow the oid form (since most people access the oid type by feeding in a text string and letting the auto-cast do its magic).

  • I feel like a missing key function is ogr_read(server text, layer text default null) returns setof(record) but you don't have to do that one, I'm just putting it here for completeness.

I guess there are some potential security questions here too if we're executing from OGR datasource strings? Calling server-side functions is potentially a different class of permissions than doing CREATE SERVER/CREATE FOREIGN TABLE, and GDAL is making remote network requests.

Yeah, I hadn't thought about the extent to which the FDW API was providing some free security cover. I mean, we could just make the ogr_read be a superuser-only function; people can grant security definer on it if they want to swim naked. (BTW, you don't have to write ogr_read, that's on my personal todo/wish list.)
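One way the "superuser-only by default" idea could be expressed in plain SQL is sketched below. This is only an illustration under assumptions: ogr_read(text, text) is the function proposed in this thread and does not exist yet, and gis_admin is a hypothetical role name. The extension could equally enforce the restriction inside the C function instead.

```sql
-- Assumption: ogr_read(text, text) is the proposed (not yet written) function,
-- created by a superuser as part of the extension.

-- Remove the default EXECUTE privilege that every role gets via PUBLIC.
-- After this, only the function owner and superusers can call it.
REVOKE EXECUTE ON FUNCTION ogr_read(text, text) FROM PUBLIC;

-- An administrator who accepts the risk can open it up to a specific role.
-- gis_admin is a hypothetical role used only for illustration.
GRANT EXECUTE ON FUNCTION ogr_read(text, text) TO gis_admin;
```

The same pair of statements would apply to any of the other proposed functions that make remote network requests.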

rcoup commented 1 month ago

ogr_fdw_info_layers(ogr_datasource text) returns setof(record); -- layer list for connection string
ogr_fdw_info_layers(server oid) returns setof(record); -- layer list for existing FDW server

This latter won't work so great, I don't think, as the text form will shadow the oid form (since most people access the oid type by feeding in a text string and letting the auto-cast do its magic).

I kinda assumed it'd be the difference between:

SELECT ogr_fdw_info("myserver"); -- oid
SELECT ogr_fdw_info(myserver); -- oid
SELECT ogr_fdw_info('WFS:https://wfs.example.com/wfs'); -- text

But maybe there's some special TIL typing coercion that happens by default?

I mean, SELECT lower('foo') and SELECT lower("foo")/lower(foo) are fundamentally different, that's SQL 101?

(I'll accept "it's still too confusing, don't do it" 😁 )

I mean, we could just make the ogr_read be a superuser-only function

ok. Maybe ogr_fdw_info() & ogr_fdw_layers() too? They're still doing remote network requests.
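The text-versus-oid question can be tried out with two stand-in functions. This is a rough sketch, not the proposed implementation: demo_info_layers is a hypothetical name, and myserver is assumed to be an existing foreign server. As far as I know there is no reg* alias type for foreign servers, and a bare or double-quoted myserver in an expression is parsed as a column reference, so the sketch looks the OID up in pg_foreign_server instead.

```sql
-- Hypothetical stand-ins for the two proposed overloads.
CREATE FUNCTION demo_info_layers(ogr_datasource text) RETURNS text
  LANGUAGE sql AS $$ SELECT 'text overload' $$;

CREATE FUNCTION demo_info_layers(server oid) RETURNS text
  LANGUAGE sql AS $$ SELECT 'oid overload' $$;

-- An untyped string literal is resolved to the text overload.
SELECT demo_info_layers('WFS:https://wfs.example.com/wfs');

-- Passing an actual oid value selects the oid overload.
-- Assumes a foreign server named myserver exists.
SELECT demo_info_layers(s.oid)
FROM pg_foreign_server AS s
WHERE s.srvname = 'myserver';
```

If looking up the OID by hand feels clumsy, that may be an argument for taking the server name as text and resolving it inside the function.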

pramsey commented 1 month ago

(I'll accept "it's still too confusing, don't do it" 😁 )

Nope I just have my head up my ass. Ignore me and proceed.

robe2 commented 2 weeks ago

How about:

ogr_fdw_info_layers(ogr_datasource text) returns setof(record); -- layer list for connection string
ogr_fdw_info_layers(server oid) returns setof(record); -- layer list for existing FDW server

What is the expected output of ogr_fdw_info_layers? If it's simply a set of layer names, I think the output being SETOF text makes more sense.

rcoup commented 2 weeks ago

I've been somewhat bogged down in other stuff, but I'll get back to finishing this soon hopefully :-)

If it's simply a set of layer names, I think output being a SETOF text makes more sense.

TIL SETOF text for returning multiple rows of a single column; my previous understanding was you needed SETOF record for anything returning multiple rows.

So, yes it does 👍

robe2 commented 11 hours ago

SETOF text or SETOF anytype will return multiple records and will just use the function name as the column name if no alias is provided.

The main difference is SETOF record or RETURNS TABLE would allow multiple columns per row but you have to deal with that messy business of using RETURNS TABLE (defining the columns) or using OUT params, which is kinda pointless if you are only returning one column.

Take for example the PostGIS function ST_SubDivide: it returns a SETOF geometry and you use it like any other table function.

CREATE OR REPLACE FUNCTION st_subdivide(
    geom geometry,
    maxvertices integer DEFAULT 256,
    gridsize double precision DEFAULT '-1.0'::numeric)
    RETURNS SETOF geometry 
    LANGUAGE 'c'
    COST 5000
    IMMUTABLE STRICT PARALLEL SAFE 
    ROWS 1000
    AS '$libdir/postgis-3', 'ST_Subdivide';

But that said, I don't think there is any performance penalty doing it your way, just that you have a slightly longer definition.
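To make the comparison concrete, here is a small sketch of the two return styles for a single-column result. The function names and the hard-coded layer names are made up for illustration; a real implementation would read the layers from OGR.

```sql
-- Single column, no column definition needed.
CREATE FUNCTION demo_layers_setof() RETURNS SETOF text
  LANGUAGE sql AS $$ SELECT unnest(ARRAY['Cities', 'Countries']) $$;

-- Same rows, but the output column has to be declared.
CREATE FUNCTION demo_layers_table() RETURNS TABLE (layer_name text)
  LANGUAGE sql AS $$ SELECT unnest(ARRAY['Cities', 'Countries']) $$;

-- Output column takes the function name: demo_layers_setof
SELECT * FROM demo_layers_setof();

-- Output column takes the alias: layer
SELECT * FROM demo_layers_setof() AS layer;

-- Output column takes the declared name: layer_name
SELECT * FROM demo_layers_table();
```

RETURNS TABLE starts to pay off once a second column is wanted, for example a geometry type next to the layer name.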