jcflack opened 2 years ago
Hi @jcflack,
I found the "PL/Java refactoring for support of other JVM and polyglot languages" project while browsing PostgreSQL GSoC projects for this year, and I think it sounds really interesting. I wanted to check and see if PL/Java is still under active development - the last commit was pushed last year - and if the GSoC project would still be relevant.
Thanks!
Hello @jcflack, I found this while going through Postgres' GSoC 2023 projects list. I consider myself adept with Java 11+ and a bit of Scala 2.12. Though I haven't really used Postgres in a professional capacity or otherwise, I'm interested in this project, as I'm primarily a Systems person at heart with an inclination towards Compilers, Runtimes, PL Theory, etc. I am not familiar with JNI as such and need to brush up on C, but would first start with building the project and going through the codebase. Any suggestions from your end? Looking forward to contributing something tangible!
Warm Regards, Divyaank
Hello @HoussemNasri and @divyaankt,
Thank you for your interest in the project! Yes, the GSoC project is still relevant. It didn't attract any applicants last year, and so I used the time to step a bit away from PL/Java and focus on some other things for the summer, which turned into the summer and fall (and part of winter). But I am more than happy to return to PL/Java.
What a wonderful piece of technical literature.
I'm wondering if a tool that compiles non-Java JVM code to Java (under the hood) and then uses PL/Java could be a candidate solution, since I am guessing this wouldn't disrupt our JDBC centrality.
I have a bunch of other questions, but I'm just trying to grok the current sentiment going forward so I can work towards it.
The SlotTester.test method that was hastily provided earlier, as a way of testing this new API, was rather limiting. It could only accept a String query with no parameters, causing flashbacks to the bad old days of SQL thrown together with string concatenation.

In the current state of this branch, the new API is incomplete and read-only, and the old legacy JDBC implementation is still around, so the obvious interim solution is to bridge the two: a JDBC Statement or PreparedStatement can be used for issuing a query, and now SlotTester.unwrapAsPortal can present the JDBC ResultSet as a Portal object, from which the results can be retrieved using the new API.
The basic method for fetching a value from a TupleTableSlot is get(Attribute att, Adapter adp), and it is naturally overloaded and generic so that get with an As<T,?> adapter returns a T, get with an AsInt<?> adapter returns an int, and so on.
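In that item-at-a-time style, retrieving two columns might look like the hedged sketch below. The variable names and adapter instances (TEXT_INSTANCE, INT4_INSTANCE) are invented for illustration; only get and the adapter type hierarchy come from the description above.

```java
// Hedged sketch, not code from this PR. Assumes slot is a TupleTableSlot,
// desc is its TupleDescriptor, and TEXT_INSTANCE / INT4_INSTANCE are
// existing As<String,?> / AsInt<?> adapter instances.
Attribute nameAtt  = desc.get(0);
Attribute countAtt = desc.get(1);

String name  = slot.get(nameAtt,  TEXT_INSTANCE); // As<String,?> -> String
int    count = slot.get(countAtt, INT4_INSTANCE); // AsInt<?>     -> int
```

Every such call repeats the per-item checks described next, which is part of what motivates the multiple-columns-at-once API.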
That early approach in the new API, for the purpose of early signs of life, was JDBC-like in requiring a method call for each item to retrieve. Beyond the tedium of developing code in that style, it also forecloses many opportunities for efficient implementation, requiring any needed checks on user input (does this attribute belong to this result? is this adapter for a suitable type? is the native memory region still valid? and so on) to be repeated for every single-item fetch. It may also be advantageous to arrange the order of fetches with some knowledge of how the tuple deforming is done; here, again, the implementation has no flexibility when the user code picks at it one item at a time.
The idea was always to supply a multiple-columns-at-once API, which is introduced here.
TargetList, and its subinterface Projection, are used for selecting the attributes of interest. Projection, as in the algebraic usage, does not allow more than one mention of any attribute. The original Projection is the full TupleDescriptor for a result, and another Projection derived from an existing one must have a subset of its attributes, possibly reordered.

TargetList, the superinterface, relaxes the nonrepetition condition; a TargetList is allowed to mention the same attribute more than once. That should sound less efficient than simply mentioning it once and letting the user Java code copy the fetched value around, but there may be cases where it is useful. One would be when the Java code wants different representations of one PG value, produced by different Adapters. Another can be when the Java representation is a one-use-only class like SQLXML.
After shaping a Projection or TargetList to suit just what the Java code wants to retrieve, the TargetList can be applied over a List of TupleTableSlots at a time, using adapters selected for the desired Java types, and a lambda with corresponding parameters, whose types are inferred from the adapters. Functional interfaces of some likely lengths are provided, and can be curried to fit a TargetList with any number and types of columns.
An example illustrates the usage.
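As a hedged sketch of the shape this takes (the method names project and applyOver here are assumptions based on the description above, not confirmed signatures from the PR), applying a two-column TargetList with a lambda might look like:

```java
// Hedged sketch only. Assumes desc is the result's TupleDescriptor,
// slots is a List<TupleTableSlot>, and TEXT_INSTANCE / INT4_INSTANCE
// are adapters for the two columns. project(...) and applyOver(...)
// are illustrative names, not the PR's actual API.
Projection p = desc.project("name", "count");

p.applyOver(slots, TEXT_INSTANCE, INT4_INSTANCE,
    (String name, int count) -> {
        // Parameter types are inferred from the adapters; the checks
        // on attributes and adapters happen once, not per item.
        System.out.println(name + ": " + count);
    });
```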
The work in this pull request to this point generally still calls the old heavyweight item-at-a-time get methods on TupleTableSlot under the hood, leaving the optimizations suggested above for future work, but it clears the way for those, and the old per-item get methods in their current form should eventually be deprecated.
I had thought to continue ticking more of the other open-items boxes before doing the dispatcher, but for a change of scenery, here is the new dispatcher.
The first brand-new PL/Java-based procedural language is Glot64. It will probably never grow to rival Python or JavaScript in popularity, either because it can't do anything but write messages to standard output, or because you write your functions/procedures in base 64 :). So, here is a Glot64 function that writes Hello, world! on the server's standard output when called:
CREATE OR REPLACE FUNCTION javatest.hello()
RETURNS void
LANGUAGE glot64
AS 'SGVsbG8sIHdvcmxkIQo=';
The impatient may see Hello, world! immediately, using an inline code block:
DO LANGUAGE glot64 'SGVsbG8sIHdvcmxkIQo=';
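The function body in these examples is ordinary base 64; a quick check in plain Java (independent of PL/Java) shows what a Glot64 source string decodes to:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Glot64Decode {
    public static void main(String[] args) {
        // Decode the Glot64 function body used in the examples above.
        byte[] decoded = Base64.getDecoder().decode("SGVsbG8sIHdvcmxkIQo=");
        // Prints the message, trailing newline included:
        System.out.print(new String(decoded, StandardCharsets.UTF_8));
    }
}
```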
The output won't be visible at all if the server's standard output is going to /dev/null or the like. But a test instance run in PL/Java's test harness, for example, will have its standard output going to the terminal.

In addition to the base-64-decoded source string, you will see other output from the glot64 language handler, which is really the point, for a demonstration example. The base-64 string is just for fun.
Glot64, like any PL/Java-based language, needs a language handler: namely, a class that implements the PLJavaBasedLanguage interface. Various methods on that interface are used for validating functions/procedures, compiling, specializing, and calling functions/procedures, and executing inline blocks (for a language that supports those).
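A hedged outline of what such a handler class might look like. Only the interface name PLJavaBasedLanguage comes from this pull request; the method names validate and execute below are invented for the sketch and are not the interface's confirmed signatures.

```java
// Illustrative sketch only: a Glot64-like handler that decodes its
// base-64 source and writes it to standard output when invoked.
// Real handler methods cover validating, compiling, specializing,
// calling, and inline execution; names here are placeholders.
public class Glot64 implements PLJavaBasedLanguage {
    // Hypothetical: invoked via the validator function at CREATE FUNCTION.
    public void validate(String source) {
        java.util.Base64.getDecoder().decode(source); // reject bad base 64
    }

    // Hypothetical: invoked via sqlj.pljavaDispatchRoutine on each call.
    public void execute(String source) {
        byte[] text = java.util.Base64.getDecoder().decode(source);
        System.out.write(text, 0, text.length);
        System.out.flush();
    }
}
```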
After installing a jar containing the class that implements the language, use the name of that class to declare a validator function, using the language pljavahandler:
CREATE OR REPLACE FUNCTION javatest.glot64_validator(oid)
RETURNS void
LANGUAGE pljavahandler
AS 'org.postgresql.pljava.example.polyglot.Glot64'; -- class name
followed by CREATE LANGUAGE using that new function as the validator, along with PL/Java's existing routine and inline dispatcher functions as the other two handlers:
CREATE LANGUAGE glot64
HANDLER sqlj.pljavaDispatchRoutine
INLINE sqlj.pljavaDispatchInline
VALIDATOR javatest.glot64_validator;
Bear in mind that the very first still-unticked "open items" box at the top of this pull request is still:
The to-PostgreSQL direction for Adapter, TupleTableSlot, and Datum.Accessor.
and that's why no PL/Java-based function or procedure can return any results yet. That will be done by storing the result value (or values) into the Call.result() TupleTableSlot, and the store direction doesn't work yet. So that's why Glot64 is limited to writing messages on standard output.
On the other hand, fetching from a TupleTableSlot is indeed working already, so a language handler can fetch values from the Call.arguments() TupleTableSlot using whatever Adapter is appropriate to each argument's type. The Glot64 language ignores passed arguments, but that's not a necessary limitation.
Also, of course, all the other unticked boxes in that open-items list are still unticked, so plenty of work remains. But the dispatcher is here, and the PLJavaBasedLanguage interface, enough to begin experimenting with the development of language handlers for languages of interest.
As a work-in-progress pull request, this is not expected to be imminently merged, but is here to document the objectives and progress of the ongoing work.
Why needed
A great advantage promised by a PL based on the JVM is the large ecosystem of languages other than Java that can be supported on the same infrastructure, whether through the Java Scripting (JSR 223) API, or through the polyglot facilities of GraalVM, or simply via separate compilation to the class file format and loading as jars.
However, PL/Java, with its origins in 2004 predating most of those developments, has architectural limitations that stand in the way.
JDBC
One of the limitations is the centrality of the JDBC API. To be sure, it is a standard in the Java world for access to a database, and for PL/Java to conform to ISO SQL/JRT, the JDBC API must be available. But it is not necessarily a preferred or natural database API for other JVM or GraalVM languages, and its design goal is to abstract away from the specifics of an underlying database, which ends up complicating or even preventing access to advanced PostgreSQL capabilities that could be prime drivers for running server-side code in the first place.
The problem is not that JDBC is an available API in PL/Java, but that it is the fundamental API in PL/Java, with its tentacles reaching right into the native C language portion of PL/Java's implementation. That has made alternative interface options impractical, and multiplied the maintenance burden of even simple tasks like adding support for new datatype mappings or fixing simple bugs. There are significant portions of JDBC 4 that remain unimplemented in PL/Java.
Experience building an implementation of ISO SQL/XML XMLQUERY showed that certain requirements of the spec were simply unsatisfiable atop JDBC, either because of inherent JDBC limitations or limits in PL/Java's implementation of it. An example of each kind:

- The INTERVAL data type cannot be mapped as SQL/XML requires, because the only ResultSetMetaData methods JDBC defines for access to a type modifier are precision and scale, which apply to numeric values; the API defines no standard way to learn what the modifier of an INTERVAL says about whether months or days are present.
- The DECIMAL type cannot be mapped as SQL/XML requires; for that case, the fault is not with JDBC (which defines the precision and scale methods), but with their incomplete implementation in PL/Java.

Those cases also illustrate that mapping some PostgreSQL data types to those of another language can be complex. An arbitrary PostgreSQL INTERVAL is representable as neither a java.time.Period nor a java.time.Duration alone (though a pair of the two can be used, a type that PGJDBC-NG offers). One or the other can suffice if the type modifier is known and limits the fields present. A PostgreSQL NUMERIC value has not-a-number and signed infinity values that some candidate language-library type might not, and an internal precision that its text representation does not reveal, which might need to be preserved for a mathematically demanding task. The details of converting it to another language's similar type need to be knowable or controllable by an application.

It is a goal of this work to give PL/Java an API that does not obscure or abstract from PostgreSQL details, but makes them accessible in a natural Java idiom, and that such a "natural PostgreSQL" API should be adequate to allow building a JDBC layer in pure Java above it. (The work of building such a JDBC layer is not in the scope of this pull request.)
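The INTERVAL point can be made concrete with plain java.time, independent of PL/Java: a Period holds the calendar fields (years, months, days) and a Duration the exact-time fields, and neither alone can carry an interval such as 1 month 3 days 04:05:06.

```java
import java.time.Duration;
import java.time.Period;

public class IntervalPair {
    public static void main(String[] args) {
        // PostgreSQL INTERVAL '1 month 3 days 04:05:06' split into its
        // calendar part and its exact-time part:
        Period calendar = Period.of(0, 1, 3);  // 1 month, 3 days
        Duration time = Duration.ofHours(4).plusMinutes(5).plusSeconds(6);

        // Period cannot hold hours; Duration cannot hold months, whose
        // length varies. Only the pair captures the whole interval.
        System.out.println(calendar + " + " + time); // P1M3D + PT4H5M6S
    }
}
```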
Parameter and return-value mapping
PL/Java uses a simple, Java-centric approach where a Java method is declared naturally, giving ordinary Java types for its parameters and return, and the mappings from these to the PostgreSQL parameter and return types are chosen by PL/Java and applied transparently (and much of that happens deep in PL/Java's C code).

While convenient, that approach isn't easily adapted to other JVM languages that may offer other selections of types. Even for Java, it stands in the way of doing certain things possible in PostgreSQL, like declaring VARIADIC "any" functions.

In a modernized API, it needs to be possible to declare a function whose parameter represents the PostgreSQL FunctionCallInfo, so that the parameters and their types can be examined and converted in Java. That will make it possible to write language handlers in Java, whether for other JVM languages or for the existing PL/Java calling conventions that at present are tangled in C.

Elements of new API
Identification of data types
A PostgreSQL-specific API must be able to refer unambiguously to any type known to the database, so it cannot rely on any fixed set of generic types such as JDBCType. To interoperate with a JDBC layer, though, the identifier for types should implement JDBC's SQLType interface. The API should support retrieving enough metadata about the type for a JDBC layer implemented above it to be able to report complete ResultSetMetaData information.

The new class serving this purpose is RegType.

As RegType implements the java.sql.SQLType interface, an aliasing issue arises for a JDBC layer. Such a layer should accept JDBCType.VARCHAR as an alias for RegType.VARCHAR, for example. JDBC itself has no methods that return an SQLType instance, so the question of whether it should return the generic JDBC type or the true RegType does not arise. A PL/Java-specific API is needed for retrieving the type identifier in any case. The details of which JDBC types are considered aliases of which RegTypes will naturally belong in a JDBC API layer. At the level of this underlying API, a RegType is what identifies a PostgreSQL type.

While RegType includes convenience final fields for a number of common types, those by no means limit the RegTypes available. There is a RegType that can be obtained for every type known to the database, whether built in, extension-supplied, or user-defined.

Other PostgreSQL catalog objects and key abstractions
RegType is one among the types of PostgreSQL catalog objects modeled in the org.postgresql.pljava.model package. Along with a number of catalog object types, the package also contains:

- TupleDescriptor and TupleTableSlot, the key abstractions for fetching and storing database values. TupleTableSlot in PostgreSQL is already a useful abstraction over a few different representations; in PL/Java it is further abstracted, and can present with the same API other collections of typed, possibly named, items, such as arrays, the arguments in a function call, etc.
- MemoryContext and ResourceOwner, both subtypes of Lifespan, usable to guard Java objects that have native state whose validity is bounded in time
- CharsetEncoding
Mapping PostgreSQL data types to what a PL supports
The Adapter class

A mapping between a PostgreSQL data type and a suitable PL data type is an instance of the Adapter class, and more specifically of the reference-returning Adapter.As<T,U> or one of the primitive-returning Adapter.AsInt<U>, Adapter.AsFloat<U>, and so on (one for each Java primitive type). The Java type produced is T for the As case, and implicit in the class name for the AsFoo cases.

The basic method for fetching a value from a TupleTableSlot is get(Attribute att, Adapter adp), and it is naturally overloaded and generic so that get with an As<T,?> adapter returns a T, get with an AsInt<?> adapter returns an int, and so on. (But see this later comment below for a better API than this item-at-a-time stuff.) (The U type parameter of an adapter plays a role when adapters are combined by composition, as discussed below, and is otherwise usually uninteresting to client code, which may wildcard it, as seen above.)

A manager class for adapters
Natural use of this idiom presumes there will be some adapter-manager API that allows client code to request an adapter for some PostgreSQL type by specifying a Java witness class Class<T> or some form of super type token, and returns the adapter with the expected compile-time parameterized type. That manager hasn't been built yet, but the requirements are straightforward and no thorny bits are foreseen. (Within the org.postgresql.pljava.internal module itself, things are simpler; no manager is needed, and code refers directly to static final INSTANCE fields of existing adapters.)

Extensibility
PL/Java has historically supported user-defined types implemented in Java, a special class of data types whose Java representations must implement a certain JDBC interface and import and export values through a matching JDBC API. In contrast, PL/Java's first-class PostgreSQL data type support—the mappings it supplies between PostgreSQL and ordinary Java types that don't involve the specialized JDBC user-defined type APIs—has been hardcoded in C using Java Native Interface (JNI) calls, and is not straightforward to extend. That's a pain point for several situations:

- After the introduction of the improved java.time package, developers wishing to have PL/Java map PostgreSQL's date and time types to the improved Java types instead of the older java.sql ones had to open issues requesting that ability and wait for a PL/Java release to include it.
- One user may want an array type mapped to a Java List, another to a multi-dimensioned Java array, another to a matrix class from a scientific computation library. The choices multiply when considering the data types not only of Java but of other JVM languages. C coding and rebuilding of PL/Java should not be needed to tailor these mappings.

Adapters implementable in pure Java
With this PR, code external to PL/Java's implementation can supply adapters, built against the service-provider API exposed in org.postgresql.pljava.adt.spi.

Leaf adapters
A "leaf" adapter is one that directly knows the PostgreSQL datum format of its data type, and maps that to a suitable PL type. Only a leaf adapter gets access to PostgreSQL datums, which it should not leak to other code. Code that defines leaf adapters must be granted a permission in pljava.policy.

Composing adapters
A composing, or non-leaf, adapter is one meant to be composed over another adapter. An example would be an adapter that composes over an adapter returning type T (possibly null) to form an adapter returning Optional<T>. With a selection of common composing adapters (there aren't any in this pull request, yet), it isn't necessary to provide leaf adapters covering all the ways application code might want data to be presented. No special permission is needed to create a composing adapter.

Java's generic types are erased to raw types at runtime, but the Java compiler saves the parameter information for runtime access through Java reflection. As adapters are composed, the Adapter class tracks the type relationships so that, for example, an Adapter<Optional<T>,T> composed over an Adapter<String,Void> is known to produce Optional<String>. It is that information that will allow an adapter manager to satisfy a request to map a given PostgreSQL type to some PL type, by finding and composing available adapters.
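That reflective bookkeeping rests on a standard Java mechanism: a subclass of a generic class has its actual type arguments recorded in the class file, where reflection can read them back. A self-contained demo of just that mechanism (this is not PL/Java's actual Adapter code, only an illustration of the underlying technique):

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Optional;

public class TypeCaptureDemo {
    // A generic superclass whose subclasses' type arguments the
    // compiler records and reflection can recover.
    static abstract class Token<T> {
        Type captured() {
            return ((ParameterizedType) getClass().getGenericSuperclass())
                .getActualTypeArguments()[0];
        }
    }

    public static void main(String[] args) {
        // The anonymous subclass, not the variable, carries the type info.
        Token<Optional<String>> t = new Token<Optional<String>>() {};
        System.out.println(t.captured().getTypeName());
        // prints: java.util.Optional<java.lang.String>
    }
}
```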
Contract-based adapters
For a PostgreSQL data type that doesn't have one obvious best mapping to a PL type (perhaps because there are multiple choices with different advantages, or because there is no suitable type in the PL's base library, and any application will want the type mapped to something in a chosen third-party library), a contract-based adapter may be best. An Adapter.Contract is a functional interface with parameters that define the semantically important components of the PostgreSQL type, and a generic return type, so an implementation can return any desired representation for the type.

A contract-based adapter is a leaf adapter class with a constructor that accepts a Contract, producing an adapter between the PostgreSQL type and whatever PL type the contract maps it to. The adapter encapsulates the internal details of how a PostgreSQL datum encodes the value, and the contract exposes the semantic details needed to faithfully map the type. Contracts for many existing PostgreSQL types are provided in the org.postgresql.pljava.adt package.

ArrayAdapter
The one supplied ArrayAdapter is contract-based. While a Contract.Array has a single abstract method, and therefore could serve as a functional interface, in practice it is not directly implementable by a lambda; there must be a subclass or subinterface (possibly anonymous) whose type parameterization the Java compiler can record. (A lambda may then be used to instantiate that.) An instance of ArrayAdapter is constructed by supplying an adapter for the array's element type along with an array contract targeting some kind of collection of the mapped type. As with a composing adapter, the Adapter class substitutes the element adapter's target Java type through the type parameters of the array contract, to arrive at the actual parameterized type of the resulting array or collection.

PostgreSQL arrays can be multidimensional, and are regular (not "jagged"; all sub-arrays at a given dimension match in size). They can have null elements, which are tracked in a bitmap, offering a simple way to save some space for arrays that are sparse; there are no other, more specialized sparse-array provisions.
Array indices need not be 0- or 1-based; the base index as well as the index range can be given independently for each dimension. PostgreSQL creates 1-based arrays by default. This information is stored with the array value, not with the array type, so a column declared with an array type could conceivably have values of different cardinalities or even dimensionalities.
The adapter is contract-based because there are many ways application code could want a PostgreSQL array to be presented: as a List or single Java array (flattening multiple dimensions, if present, to one, and disregarding the base index), as a Java array-of-arrays, as a JDBC Array object (which does not officially contemplate more than one array dimension, but PostgreSQL's JDBC drivers have used it to represent multidimensioned arrays), as the matrix type offered by some scientific computation library, and so on.

For now, one predefined contract is supplied, AsFlatList, and a static method, nullsIncludedCopy, that can be used (via method reference) as one implementation of that contract.

Java array-of-arrays
While perhaps not an extremely efficient way to represent multidimensional arrays, the Java array-of-arrays approach is familiar, and benefits from a bit of dedicated support for it in Adapter. Therefore, if you have an Adapter a that renders a PostgreSQL type Foo as Java type Bar, you can use, for example, a.a2().build() to obtain an Adapter from the PostgreSQL array type Foo[] to the Java type Bar[][], requiring the PostgreSQL array to have two dimensions, allowing each value to have different sizes along those dimensions, but disregarding the PostgreSQL array's start indices (all Java arrays start at 0).

Because PostgreSQL stores the dimension information with each value and does not enforce it for a column as a whole, it could be possible for a column of array values to include values with other numbers of dimensions, which an adapter constructed this way will reject. On the other hand, the sizes along each dimension are also allowed by PostgreSQL to vary from one value to the next, and this adapter accommodates that, as long as the number of dimensions doesn't change.
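As a hedged sketch (only a2() and build() come from the description above; the adapter variable and the hypothetical helper that obtains it are invented):

```java
// Sketch only: suppose textAdapter is an existing leaf adapter rendering
// PostgreSQL text as java.lang.String, obtained however adapters are
// obtained (the adapter-manager API is not built yet).
Adapter.As<String, ?> textAdapter = obtainTextAdapter(); // hypothetical

// Maps PostgreSQL text[] (two-dimensional) to Java String[][]; values
// with other dimensionalities are rejected, while per-value differences
// in the sizes along each dimension are accommodated.
var twoDim = textAdapter.a2().build();
```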
The existing contract-based ArrayAdapter is used behind the scenes, but build() takes care of generating the contract. Examples are provided.

Adapter maintainability
Providing pure-Java adapters that know the internal layouts of PostgreSQL data types, without relying on JNI calls and the PostgreSQL native support routines, entails a parallel-implementation maintenance responsibility roughly comparable to that of PostgreSQL client drivers that support binary send and receive. (The risk is slightly higher because the backend internal layouts are less committed than the send/receive representations. Because they are used for data on disk, though, historically they have not changed often or capriciously.)
The engineering judgment is that the resulting burden will be manageable, and the benefits in clarity and maintainability of the pure-Java implementations, compared to the brittle legacy Java+C+JNI approach, will predominate. The process of developing clear contracts for PostgreSQL types already has led to discovery of one bug (#390) that could be fixed in the legacy conversions.
For the adapters supplied in the org.postgresql.pljava.internal module, it is possible to use ModelConstants.java / ModelConstants.c to ensure that key constants (offsets, flags, etc.) stay synchronized with their counterparts in the PostgreSQL C code.

Adapter is a class in the API module, with the express intent that other adapters can be developed, and found by the adapter manager through a ServiceLoader API, without being internal to PL/Java. Those might not have the same opportunity for build-time checking against PostgreSQL header files, and will have to rely more heavily on regression tests for key data values, much as binary-supporting client drivers must. The same can be true even for PL/Java internal adapters for a few PostgreSQL data types whose C implementations are so strongly encapsulated (numeric comes to mind) that necessary layouts and constants do not appear in .h files.

Known open items
In no well-defined order ....
- The to-PostgreSQL direction for Adapter, TupleTableSlot, and Datum.Accessor. These all have API and implementation for getting PostgreSQL values and presenting them in Java. Now the other direction is needed.
- TupleTableSlot classes currently found as preliminary scaffolding in TupleTableSlot.java:
  - SPITupleTable ...
  - CatCList ...
  - Tuplestore? ...
  - ... TupleTableSlot stays subquadratic
- The grants methods of RegRole and the unimplemented unary one of CatalogObject.AccessControlled. (Needs the CatCList support, for pg_auth_members searches.)
- A NullableDatum flavor of TupleTableSlot. One of the last prerequisites to enable pure-Java language-handler implementations, to which the function arguments will appear as a TupleTableSlot.
- isSubtype with the rules from Java Language Specification 4.10. (At present it is a stub that only checks erased subtyping, enough to get things initially going.)
- ... isSubtype.)
- ... org.postgresql.pljava.adt).
- TextAdapter does not yet support the type modifiers for CHAR and VARCHAR. It needs a contract-based flavor that does.
- ArrayAdapter (or Contract.Array) should supply at least one convenience method, taking a dimsAndBounds array parameter and generating an indexing function (a MethodHandle?) that has nDims integer parameters and returns an integer flat index. Other related operations? An index enumerator, etc.?
- As<Optional<T>,T>
- As<T,T> that returns null for null and values unchanged
- CatalogObject invalidation. RegClass and RegType are already invalidated selectively; probably RegProcedure should be also. PostgreSQL has a limited number of callback slots, so it would be antisocial to grab them for all the supported classes: less critical ones just depend on the global switchpoint; come up with a good story for invalidating those. Also for how TupleDescriptor should behave upon invalidation of its RegClass. See commit comments for 5adf2c8.
- DualState behavior of TupleTableSlot.
- VarlenaWrapper. Goal: DatumUtils.mapVarlena doing more in Java, less in C.
  - VarlenaWrapper's functionality moved to DatumImpl
  - ... Datum.Input to VarlenaWrapper to use it.
  - ... VarlenaWrapper; currently the behavior is hardcoded for top-transaction lifespan, lazy detoasting, appropriate for SQLXML, which was the first VarlenaWrapper client.

And then