ossc-db / pg_hint_plan

Extension adding support for optimizer hints in PostgreSQL
Other
715 stars 103 forks source link

pg_hint_plan and plugin_debugger don't work together #116

Closed phlaluna closed 1 year ago

phlaluna commented 1 year ago

Hi,

I did several tests and the Debbuger Plugin does not work when the shared_preload_libraries has "plugin_debugger,pg_hint_plan". When shared_preload_libraries has only "plugin_debugger" or with other libraries like "auto_explain,plugin_debugger,pg_stat_statements,pg_repack", it's work.

I can create function and create extension without any error. But when I use Debug from PGAdmin or Dbeaver it doesn't return the debug result.

Works: shared_preload_libraries="auto_explain,plugin_debugger,pg_stat_statements,pg_repack" return-debug-without-pg-hint

Problem: shared_preload_libraries="plugin_debugger,pg_hint_plan" pgadmin-with-pg-hint-no-parameters pgadmin-with-pg-hint-no-result pgadmin-with-pg-hint-no-stack

Script to create test function:

CREATE SCHEMA test;
DROP function if exists test.somefunc(var integer);
CREATE FUNCTION test.somefunc(var integer) RETURNS integer AS $$
DECLARE
   quantity integer := 30+var;
BEGIN
   RAISE NOTICE 'Quantity here is %', quantity;      --30
   quantity := 50;
   --
   --
   DECLARE
      quantity integer := 80;
   BEGIN
      RAISE NOTICE 'Quantity here is %', quantity;   --80
   END;
   RAISE NOTICE 'Quantity here is %', quantity;      --50
   RETURN quantity;
END;
$$ LANGUAGE plpgsql;
CREATE EXTENSION pldbgapi;

I opened an issue on postgresql and they replied the plugin debugger uses dbg API, pg_hint_plan uses dbg API too. Unfortunately this API is not designed to be used by two active extensions in one time.

I would like to leave an issue to make it possible to use both without problems.

rjuju commented 1 year ago

As a reference, a similar issue was fixed in plpgsql_check with this commit (which seems straightforward)

michaelpq commented 1 year ago

As a reference, a similar issue was fixed in plpgsql_check with this commit (which seems straightforward)

Yes, I have been thinking about this one a bit, and using a TRY/CATCH block to make sure that we correctly trigger the previous callbacks on the stack may be the best way forward, even if it complicates slightly the stack. @rjuju Do you think that this is something we should backpatch? My feeling would be yes, though I'd welcome extra opinions.

michaelpq commented 1 year ago

By the way, another way to address this issue would be to backpatch 7c6d950 to the older stable branches so as we don't use anymore the plpgsql callbacks to track the level of nesting. Any thoughts or opinions about that?

michaelpq commented 1 year ago

By the way, another way to address this issue would be to backpatch 7c6d950 to the older stable branches so as we don't use anymore the plpgsql callbacks to track the level of nesting. Any thoughts or opinions about that?

@MasahikoSawada @MasaoFujii @yamatattsu @rjuju @horiguti , any thoughts?

MasaoFujii commented 1 year ago

@michaelpq

any thoughts?

I'm not very familiar with this issue, but as a user of pg_hint_plan, I am concerned about any changes that might affect the behavior of the old stable version. I would like to avoid any unexpected disruptions to production systems that are currently working well.

That said, if this change is necessary to fix the bug, I suggest adding a new custom GUC parameter such as pg_hint_plan.enable_plpgsql_nesting_level. The default value should be set to "on", and this parameter should not change the behavior of pg_hint_plan. Instead, it would prevent pg_hint_plan and plugin_debugger from working together. If users want to use them together, they can simply disable this GUC parameter. This way, we can avoid changing the behavior of pg_hint_plan and still address the issue at hand.

michaelpq commented 1 year ago

I'm not very familiar with this issue, but as a user of pg_hint_plan, I am concerned about any changes that might affect the behavior of the old stable version. I would like to avoid any unexpected disruptions to production systems that are currently working well.

Well, the internal counter in charge of tracking the level of nesting for plpgsql calls is already broken, so we need to do something about that in all stable branches anyway . For example, just use a mix of ROLLBACK and/or COMMIT in internal functions and it is possible to get a hint related to an entirely different query than the one expected, with the hint coming from either one of the internal functions or just the outer-most query. So that's already quite broken.

That said, if this change is necessary to fix the bug, I suggest adding a new custom GUC parameter such as pg_hint_plan.enable_plpgsql_nesting_level. The default value should be set to "on", and this parameter should not change the behavior of pg_hint_plan. Instead, it would prevent pg_hint_plan and plugin_debugger from working together. If users want to use them together, they can simply disable this GUC parameter. This way, we can avoid changing the behavior of pg_hint_plan and still address the issue at hand.

Adding an extra GUC does not really sound wise to me when it comes to ignore or not ignore other module's callbacks, especially knowing that it would be a switch for users to turn on or off what's actually a bug. This just adds technical debt because we would now need to keep maintaining a broken behavior on top of what would be correct ad vitam eternam.

michaelpq commented 1 year ago

By the way, another way to address this issue would be to backpatch 7c6d950 to the older stable branches so as we don't use anymore the plpgsql callbacks to track the level of nesting. Any thoughts or opinions about that?

So, coming back to this one, we don't have a clear consensus about what to do in ~15, and I certainly don't find acceptable the approach of using more GUCs to hide or just enforce broken behaviors. On PG16 and master, the removal of the plpgsql plugin functions guarantees that pg_hint_plan used in combination with anything that wants to recurse into the other hooks like pldebugger would work.

Perhaps this is enough in the long term anyway, as per my argument of upthread that the recursion level calculation could get incorrect. It would be simple to get a fix done in ~15 by making sure that the recursion happens, but I also get the point of Fujii-san that this may introduce new behaviors in older branches that some may not want.

For all these reasons, I am just closing this issue.