problem dumping aggregate

reedstrm commented 6 years ago

I have a custom aggregate for fulltext search vectors, built on top of tsvector_concat defiend as so:

CREATE AGGREGATE tsvector_agg(tsvector) (
    SFUNC = tsvector_concat,
    STYPE = tsvector,
    INITCOND = ''
);

Attempting to dump it w/ dbtoyaml gives this traceback:

$ dbtoyaml pyrseas_test
Traceback (most recent call last):
  File "/home/reedstrm/src/rewrite/bin/dbtoyaml", line 11, in <module>
    load_entry_point('Pyrseas==0.8.dev0', 'console_scripts', 'dbtoyaml')()
  File "/home/reedstrm/src/rewrite/local/lib/python2.7/site-packages/pyrseas/dbtoyaml.py", line 49, in main
    dbmap = db.to_map()
  File "/home/reedstrm/src/rewrite/local/lib/python2.7/site-packages/pyrseas/database.py", line 483, in to_map
    dbmap.update(self.db.schemas.to_map(self.db, opts))
  File "/home/reedstrm/src/rewrite/local/lib/python2.7/site-packages/pyrseas/dbobject/schema.py", line 342, in to_map
    schemas.update(self[sch].to_map(db, self, opts))
  File "/home/reedstrm/src/rewrite/local/lib/python2.7/site-packages/pyrseas/dbobject/schema.py", line 121, in to_map
    schobjs.append((obj, obj.to_map(db, no_owner, no_privs)))
  File "/home/reedstrm/src/rewrite/local/lib/python2.7/site-packages/pyrseas/dbobject/function.py", line 371, in to_map
    dct = self._base_map(db, no_owner, no_privs)
  File "/home/reedstrm/src/rewrite/local/lib/python2.7/site-packages/pyrseas/dbobject/__init__.py", line 351, in _base_map
    deps -= self.get_implied_deps(db)
  File "/home/reedstrm/src/rewrite/local/lib/python2.7/site-packages/pyrseas/dbobject/function.py", line 404, in get_implied_deps
    deps.add(db.functions[sch, fnc, args])
KeyError: ('public', 'tsvector_concat', u'tsvector, tsvector')

This is tip-of-master code:

$ dbtoyaml --version
dbtoyaml 0.8.dev0
(master $)$ git describe --tags
v0.7.2-269-g42220a2

reedstrm commented 6 years ago

seems related to #169

jmafc commented 6 years ago

I'm pretty sure it's unrelated to #169. The problem with your aggregate is that it depends on an SFUNC that we don't retrieve because, shall we say, we don't find it to be "of interest" (although we should). If you run the query at https://github.com/perseas/Pyrseas/blob/42220a2c92199fb0391b37b96c26cd11b5bb6627/pyrseas/dbobject/function.py#L126 on psql you'll see it returns no rows (it may return some rows in your db but not the tsvector_concat). If you remove the "nspname != 'pg_catalog' AND " from the WHERE clause then it will show up, but so will 2500 other pg_catalog functions.

I thought that perhaps the work done by @dvarrazzo on dependency tracking would catch issues like this but apparently that's not the case. Whenever someone specializes (in the OO sense) on a pg_catalog object, we're liable to miss some links between objects. In your specific case perhaps we can modify the query above to fetch tsvector_concat (or whichever other pg_catalog function is used as an SFUNC or similar aggregate auxiliary) by modifying the pg_depend subquery in the query above.

dvarrazzo commented 6 years ago

The problem seems to me that pg_aggregate has fields defined as regproc instead of oid.

piro=# \d pg_aggregate
     Table "pg_catalog.pg_aggregate"
      Column      |   Type   | Modifiers 
------------------+----------+-----------
 aggfnoid         | regproc  | not null
 aggkind          | "char"   | not null
 aggnumdirectargs | smallint | not null
 aggtransfn       | regproc  | not null
 aggfinalfn       | regproc  | not null
...

well, not so much of a problem, but an inconsistency: for instance relations are always referred as oid where they could be defined as regclass:

piro=# \d pg_index
        Table "pg_catalog.pg_index"
     Column     |     Type     | Modifiers 
----------------+--------------+-----------
 indexrelid     | oid          | not null
 indrelid       | oid          | not null
...

So when you read pg_index you normally obtain numbers, when you read pg_aggregate you normally obtain strings. Unfortunately these strings are mangled according to the current search_path. See how much it sucks:

piro=# select aggfnoid, aggtransfn from pg_aggregate where aggfnoid::text ~ 'tsvector_agg';
   aggfnoid   |   aggtransfn    
--------------+-----------------
 tsvector_agg | tsvector_concat
(1 row)

piro=# set search_path to piro;
SET

piro=# select aggfnoid, aggtransfn from pg_aggregate where aggfnoid::text ~ 'tsvector_agg';
      aggfnoid       |   aggtransfn    
---------------------+-----------------
 public.tsvector_agg | tsvector_concat
(1 row)

I remember working in pyrseas that disambiguating objects in different schemas was already tricky, and disambiguating functions was even more because you can have the same func with same name and schema but different signature.

I think the solution for aggregates is to use numeric oids (casting the regproc fields to oid) instead of what returned natively querying pg_aggregate. Namespaces and signatures stop becoming issues:

piro=# select aggfnoid::oid, aggtransfn::oid from pg_aggregate where aggfnoid::text ~ 'tsvector_agg';
 aggfnoid | aggtransfn 
----------+------------
  1145986 |       3625
(1 row)

piro=# select nspname, proname from pg_proc p join pg_namespace n on p.pronamespace = n.oid where  p.oid in (1145986,3625);
  nspname   |     proname     
------------+-----------------
 pg_catalog | tsvector_concat
 public     | tsvector_agg

Can't remember if there is support in pyrseas to refer to functions or other objects by oid?

Parsing nulls also sucks big time...

https://github.com/perseas/Pyrseas/blob/42220a2c92199fb0391b37b96c26cd11b5bb6627/pyrseas/dbobject/function.py#L389

That's not even null, it's 0::oid. Why, what they were even thinking? :\

dvarrazzo commented 6 years ago

Whenever someone specializes (in the OO sense) on a pg_catalog object, we're liable to miss some links between objects.

In this case this is not the problem, Aggregate.get_implied_deps() seems to get it right (although looking at the number of fields in pg_attribute we may be missing others?). The problem is the ambiguity between public and pg_catalog in the representation of the function as a regproc, with the default search_path. If we solved the function right we would notice it is a builtin and not consider it a dependency.

https://github.com/perseas/Pyrseas/blob/42220a2c92199fb0391b37b96c26cd11b5bb6627/pyrseas/dbobject/function.py#L577-L598

(see also how much it sucks needing to generate a signature to solve the depending functions whereas using the oid it would not be a concern...)

reedstrm commented 6 years ago

Yup, I just noticed when poking at this in pdb that the schema assigned is wrong - public rather than pg_catalog , because self.sfunc is the string tsvector_concat, with no schema qual. Ah, I see, that's the default return string conversion for regproc. So the magic regproc type is lossy when using its string output function. It works properly if you join on it, though, or cast to oid

jmafc commented 6 years ago

Daniele, with all due respect but I think you're chasing the rabbit down the wrong hole. At the point where Ross gets the error, we're trying to find the function tsvector_concat in db.functions but if you run the CREATE AGGREGATE on an empty database and then run dbtoyaml, at that point db.functions is empty and why is it empty? Because the query that fetches functions doesn't fetch functions defined in the pg_catalog schema.

Re: the '-' in parsing regprocs (and other things) is because PG catalogs define most attributes as NOT NULL (very appropriate IMHO) so you can't "store NULLs" in them (in quotes because it's illogical to store nothing).

The problem with disambiguating schemas is due to a rather unfortunate initial design decision: doing a set search_path to public, pg_catalog right after we connect. At one point I considered changing that to include onlypg_catalog but it would be a major refactoring.

OIDs can't be used because we have to produce an external (YAML) dump and accept that back as input. See #119 for additional details. I hope we can eventually get rid of internal use of OIDs and rely solely on external object keys.

dvarrazzo commented 6 years ago

Daniele, with all due respect but I think you're chasing the rabbit down the wrong hole. At the point where Ross gets the error, we're trying to find the function tsvector_concat in db.functions but if you run the CREATE AGGREGATE on an empty database and then run dbtoyaml, at that point db.functions is empty and why is it empty? Because the query that fetches functions doesn't fetch functions defined in the pg_catalog schema.

Yes but the error is:

KeyError: ('public', 'tsvector_concat', u'tsvector, tsvector')

regardless of why is wrong, this looks wrong to me:

piro=# \sf tsvector_concat 
CREATE OR REPLACE FUNCTION pg_catalog.tsvector_concat(tsvector, tsvector)

Re: the '-' in parsing regprocs (and other things) is because PG catalogs define most attributes as NOT NULL (very appropriate IMHO) so you can't "store NULLs" in them (in quotes because it's illogical to store nothing).

I would find more appropriate to define a lack of existence with a NULL instead of with InvalidOid. However this is an opinion and not material for this bug.

The problem with disambiguating schemas is due to a rather unfortunate initial design decision: doing a set search_path to public, pg_catalog right after we connect. At one point I considered changing that to include onlypg_catalog but it would be a major refactoring.

If you go down this road I suggest instead to drop everything from search_path and not special-case public: it would make your life way easier.

OIDs can't be used because we have to produce an external (YAML) dump and accept that back as input. See #119 for additional details. I hope we can eventually get rid of internal use of OIDs and rely solely on external object keys.

I don't say to store the oids, I know it is not possible. This is only to create dependency edges in the graph path. It would be way easier if oids were used everywhere - still IMO and still not necessary to fix this bug.

jmafc commented 6 years ago

Yes but the error is:

KeyError: ('public', 'tsvector_concat', u'tsvector, tsvector')

regardless of why is wrong, this looks wrong to me:

piro=# \sf tsvector_concat CREATE OR REPLACE FUNCTION pg_catalog.tsvector_concat(tsvector, tsvector)

I'm not sure what looks wrong to you: something about the way psql displays the function? the fact that we're searching for public.tsvector_concat instead of pg_catalog.tsvector_concat? If it's the latter, it's because a few lines above we called split_schema_obj(self.sfunc). Even if we had supplied self.schema as a second argument to that call, it would not have made a difference. If what looks wrong to you is that the query result is tsvector_concat instead of pg_catalog.tsvector_concat that's an issue for PGDG. We'll just have to assume that, for the time being, unqualified functions (or types like tsvector) are in pg_catalog (or public until we change the search_path).

If you go down this road I suggest instead to drop everything from search_path and not special-case public: it would make your life way easier.

Special-casing public is definitely desirable, but perhaps keeping pg_catalog would make sense since then something like tsvector_concat can be assumed to be pg_catalog.tsvector_concat.

jmafc commented 6 years ago

pg_depend has no row for tsvector_concat after the aggregate is created (it only has PIN dependency). So fetching from pg_depend (or modifying existing pg_depend queries) will not help to solve this issue.

jmafc commented 6 years ago

@reedstrm Please take a look at https://pyrseas.wordpress.com/2018/09/12/the-future-of-pyrseas-revisited/ . Solving this issue is dependent on issue #185 and I have no estimate of when that will worked on.

perseas / Pyrseas

problem dumping aggregate #175