Open radist-nt opened 4 years ago
Hi,
I notice in your example code that the function is annotated Parallel.SAFE.
Did you also encounter the segfault at any time with the default setting of Parallel.UNSAFE?
As mentioned in the release notes, the user guide section on parallel query, and the wiki page on parallel query,
Although RESTRICTED and SAFE Java functions work in simple tests, there has been no exhaustive audit of the code to ensure that PL/Java’s internal workings never violate the behavior constraints on such functions. The support should be considered experimental, and could be a fruitful area for beta testing.
and
there may still be cases where a forbidden operation results from the internal workings of PL/Java itself. This has not been seen in testing (simple parallel queries with RESTRICTED or SAFE PL/Java functions work fine), but to rule out the possibility would require a careful audit of PL/Java's code. Until then, it would be prudent for any application involving parallel query with RESTRICTED or SAFE PL/Java functions to be first tested in a non-production environment.
I should probably make the Javadoc comments link to the user guide section to make those notes more likely to be seen.
Out of curiosity, what considerations led to marking the function Parallel.SAFE? I think use cases seeing a performance benefit would be rare, given that every worker process participating in a parallel query with a Java function marked SAFE would have to start its own JVM.
If the segfault is not reproducible with the default Parallel.UNSAFE, then we should probably just add the details of your situation to the notes section of the parallel-query wiki page.
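For anyone who wants to retest with the default setting, here is a minimal sketch of what the declaration could look like. It is not the code from this report: the class name, method name, and the regex-based tag stripping are hypothetical stand-ins (the original function used the OWASP Java HTML Sanitizer), and the PL/Java annotation is shown only as a comment so the sketch compiles without the pljava-api jar.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the reporter's function; not the code from this issue.
final class SanitizeSketch {
    // Crude tag stripper, used only to keep the sketch self-contained.
    // The real function relied on the OWASP Java HTML Sanitizer instead.
    private static final Pattern TAGS = Pattern.compile("<[^>]*>");

    // When deployed with PL/Java, the method would carry the annotation
    //   @Function(parallel = Function.Parallel.UNSAFE)
    // from org.postgresql.pljava.annotation. UNSAFE is the default, so a
    // bare @Function should have the same effect.
    static String sanitizePlainText(String input) {
        if (input == null) {
            return null; // pass a null argument straight through
        }
        return TAGS.matcher(input).replaceAll("").trim();
    }

    public static void main(String[] args) {
        System.out.println(sanitizePlainText("<b>hello</b> world"));
    }
}
```

With the function declared this way, PostgreSQL should not run it in parallel workers, which would help separate a parallel-query problem from some other cause.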
Currently our setup is "PostgreSQL 9.6.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit". Last year the setup was 9.6.11 or 9.6.12 (I don't remember exactly). AFAIK, PostgreSQL 9.6 uses parallel execution quite rarely (I've never seen parallel nodes in query plans). Also, 100% of sanitize_plain_text calls were made from other functions declared as parallel unsafe, and 99% of them were calls from pl/pgsql code or SQL with a very small amount of data.
Unfortunately, I can't switch back to the pl/java code to test the function with Parallel.UNSAFE, so I could not establish whether it was a Parallel.UNSAFE issue.
Hmm.
Would you be able to attach the entire hserr file that you pasted a portion of in the first comment?
If PostgreSQL came from Red Hat packages, I can probably obtain the corresponding versions and debuginfo. It sounds as if PL/Java was locally built. Do you still have the libpljava-so-*.so file that was in use at the time?
Here are the hserr files that are left: hs_err.log.zip. PL/Java was built locally (the latest build commit was 78ef01b6e0b4b according to the local repo state). I'm not sure whether PostgreSQL came from packages... I'll ask the DBA about the PostgreSQL build and the libpljava-so-*.so file.
Sorry, didn't find the libpljava-so-*.so file.
I am doubtful that I can do much with this. There is only one frame in the stack trace, which is PostgreSQL's own pg_detoast_datum routine. Clearly, it was passed a null pointer. The absence of any calling stack frames leaves no practical way to determine where it was called from.
The call site might not even be within PL/Java. That doesn't mean I question that PL/Java is involved, but there might have been a null value returned at some point that is now being passed to pg_detoast_datum from elsewhere in PostgreSQL code. Based on the information available here, without a test case that can reproduce it, I may be at a dead end.
The lack of caller stack frames probably indicates that PostgreSQL was built without the -fno-omit-frame-pointer compiler option, so it wasn't possible to trace the caller frames back. In a PostgreSQL server built with that option, the hs_err file might give more useful information.
Hi. About a year ago we used pl/java in production (it was version 1.5.1-BETA2, or maybe even 1.5.3-SNAPSHOT), but encountered periodic database restarts due to fatal errors in pl/java-related code. We have since migrated our code to pl/python (it works slower and requires superuser, but there are no more fatal errors). Still, this information may be useful for project development.
The most frequently used pl/java function is text sanitizing with com.googlecode.owasp-java-html-sanitizer:owasp-java-html-sanitizer. The main part of the pl/java function code is: