pgsql-io / multicorn2

http://multicorn2.org
Other
83 stars 17 forks source link

PG14: UPDATE acts inconsistently between execute() & update() invocations #48

Closed mfenniak closed 4 months ago

mfenniak commented 4 months ago

One of the test suite cases in multicorn2 attempts to perform an update using the filesystem FDW:

https://github.com/pgsql-io/multicorn2/blob/ea5b3f0bd750c3446dfadccfa9c952da0960d4f6/test-common/multicorn_testfilesystem.include#L170

Under PG13, that results in the following interactions with the FDW:

.execute([size = large, color = cyan], {'size', 'color', 'filename', 'ext', 'name', 'data'})
.update(
    'cyan/large/triangle.jpg',
    {
        'color': 'magenta',
        'size': 'large',
        'name': 'triangle',
        'ext': 'jpg',
        'filename': 'cyan/large/triangle.jpg',
        'data': 'Im a triangle'
    })

Note that the data field is requested in the execute method, and then it is provided to the update method, even though the UPDATE statement never touched the data field. This is somewhat inefficient -- the operation required reading all fields of the FDW table, and then provided all fields of the table to the modification method.

Under PG14, that results in the following interactions with the FDW:

.execute([size = large, color = cyan], {'filename', 'color', 'size'})
.update(
    'cyan/large/triangle.jpg',
    {
        'color': 'magenta',
        'size': 'large',
        'name': 'triangle',
        'ext': 'jpg',
        'filename': 'cyan/large/triangle.jpg',
        'data': None
    })

(note that the FS FDW does not return "data" from execute() if it wasn't requested in the column list -- which is different from "ext" which is always returned by this FDW -- and that behavior, which is totally sane, triggers this bug)

On the plus side, PG14 has only requested the fields that are required from the execute method -- size and color because of the where clause, and filename because of the returning clause (this may be absent if the patch #47 isn't applied; my work is tangled together a little). However, the problem is that provided values that weren't being modified, and weren't queried, to update -- including setting "data" to None. The FS FDW won't query the contents of a file unless that specific column is requested in execute, and that column wasn't requested, so it wasn't queried, and then multicorn provided None to the update.

I think there are two pathways to solving this...

1) Make multicorn do the same thing in PG14 that it did in PG13; query all the columns on an UPDATE, and provide all the columns to the update method. This has the benefit of being backwards compatible for all the existing FDW providers. But the downside of being inefficient.

2) Make multicorn provide a limited set of columns to the update method, indicating only the columns that need to be changed, and allowing the query to only return the identifying row data, the data required to meet quals, and the data required to meet returning. This seems very likely to be a breaking change in multicorn's API.

Or, implement both option 1 and 2, and allow a configuration setting on the server to switch between a "legacy" mode and an "efficient" mode.

This PR implements the first option, as it is easy and backwards compatible. Combined with #47, this change passes the entire test suite under PG14 (w/ one adjusted test output file included). The second option could be taken up as enhancement work in the future to improve efficiency, with a careful mind to the backwards compatibility implications.