Closed bylexus closed 1 year ago
Hello @bylexus
Thanks for reporting this issue. Does the execution terminate successfully if you remove -x=/path/to/lob/folder
from your command? If yes, can you successfully execute the command after re-adding -x=/path/to/lob/folder
to it and making sure that -s=/path/to/output.siard
includes a relative reference to the parent directory of the output SIARD file (for instance, by replacing -s=./output.siard
for -s=output.siard
in the command)?
Hi @walkerjan,
Unfortunately, no:
For me it seems more like a general non-known data type used? In our test database, it stops working after that line:
...
Routine: "public"."gen_random_bytes"
Routine: "public"."gen_salt"
Routine: "public"."gen_salt"
Routine: "public"."gin_extract_query_trgm"
java.lang.NullPointerException:
The gin_extract_query_trgm is a Postgresql function from the Trigram extension (pg_trgm), maybe this could cause the exception?
I also tried to increase the log (in etc/logging.properties), with no luck (not more output, even if I set all the log levels to "ALL").
What kind of data would help you investigate this error further?
Hello @bylexus
Thanks for providing more information on the issue. Yes, I agree that the log is not particularly informative in your case. However, it seems to me that you have correctly located the problem on your own: As far as I can see, SIARD Suite currently doesn't support data types that stem from PostgreSQL extensions/modules. Can you avoid the exception by removing all extension/module-specific data types?
Hi @walkerjan,
Thanks for your answer. Indeed if I skip / drop the postgresql specific extensions (in our case: pg_trgm
, unaccent
), the export works as expected.
This would cause a bit of additional work in automating the siard archive generation, but is a feasible work-around for us.
thanks for your help, regards alex
OK, my euphorism was too early :-) next problem arrives with own-defined data types (like enums): also here, we get a NullPointerException.
We extensively use own-defined types in the DB like for example like this:
CREATE TYPE public.brief_vorlage_typ AS ENUM ('XML', 'Word', 'Email');
... and use it in regular table definitions:
CREATE TABLE IF NOT EXISTS public.brief_vorlage
(
-- .....
brief_vorlage_typ brief_vorlage_typ,
-- ......
)
It seems that SIARD cannot handle such types?
That makes it very hard / impossible to dump a DB with own-defined types. Or do you have any hints here?
Hello @bylexus
SIARD Suite should be capable of handling such custom-defined types. (Using the GUI, I just verified this by successfully creating a SIARD archive from a PostgreSQL database containing a table with a column of your custom-defined type brief_vorlage_typ
.) Do some of your custom type definitions still rely on extension-/module-specific features?
No, I removed all custom extensions. It seems that something else is bothering the tool. I try to figure out what it may be, and let you know. Perhaps I can also provide a table dump, but I have to prepare that first.
It seems that in my particular case, my example table issues some kind of troubles, but only in the data extraction phase: The table analyzation (schema gathering) seems to be ok, but as soon as the actual data export is startet, SIARD crashes.
I created a sample database with a single table in it (brief_vorlage
), including its data. I attach an SQL dump (PostgreSQL 12) for further analysis.
You can restore the database with the following postgres commands:
CREATE DATABASE siardtest
WITH
OWNER = yourdbuser
ENCODING = 'UTF8'
LC_COLLATE = 'de_CH.UTF-8'
LC_CTYPE = 'de_CH.UTF-8'
TABLESPACE = pg_default
CONNECTION LIMIT = -1
IS_TEMPLATE = False;
psql -1 -U yourdbuser -h yourdbhost -d siardtest -f siardtest.sql
java -cp lib/siardcmd.jar ch.admin.bar.siard2.cmd.SiardFromDb \
-o \
-j=jdbc:postgresql://yourdbhost:5432/siardtest \
-u=yourdbuser -p=xxxxxpasswordxxxxx \
-s=siardtest.siard
It crashes then after starting the export:
...
Connected to jdbc:postgresql://127.0.0.1:5432/siardtest
Meta Data
Table: "public"."brief_vorlage"
Type: "public"."anrede_form_typ"
Type: "public"."brief_vorlage_typ"
Type: "public"."ausgabe_typ"
Primary Data
Table: "public"."brief_vorlage"
java.lang.NullPointerException:
Maybe you can investigate into that matter? I would really appreciate it. If you need any more information, please let me know.
Hello @bylexus
Thanks for providing this well-documented testcase. As far as I can see, the problem is with column brief_beilagen_ids
of array type integer[]
. Please let me know whether your execution terminates successfully after deleting this column. If so, we will have to further investigate why SIARD Suite crashes by handling columns of array type integer[]
.
Hello @walkerjan
Indeed, removing the integer[]
column resolves the problem.
As I remember during some previous tests, SIARD is able to extract array value types. So I investigated a bit further, and found the following:
SIARD cannot handle NULL values within an array column:
create table testtable
(
id serial primary key not null,
numbers integer[]
);
-- Works:
INSERT INTO testtable (numbers)
VALUES
('{1,2,3}'),
('{4}'),
;
-- Inserting a NULL value (which is absolutely fine and correct for postgresql) causes SIARD to crash:
INSERT INTO testtable (numbers)
VALUES
(NULL)
;
So maybe this could be fixed in SIARD, as NULL values on non-array-types are working fine? Or do we have to clean our data to convert null array values to empty array values first?
thanks alex
Hello @bylexus
Thanks a lot for figuring this out. I have provided a fix for this so that NULL-values will be allowed in columns of array type. Unfortunately, you have to convert NULL-values to empty arrays until the fix will be released.
Hi @walkerjan Great, thank you, I can live with that. We are in the early beginning of organizing archival records, so this surely has time to fix.
A fix to this issue will be included in the soon to be released next version of the siard suite application. Will be fixed with https://github.com/sfa-siard/JdbcPostgres/pull/4
Hello @mburri
Thanks for fixing this issue. It may be that this is a problem in the GUI version, too: We created a SIARD V2 archive with another tool, containing (empty) integer array columns (null values). When opening this archive in the SIARD GUI tool (File -> Open), I get the following message:
java.io.ioException: Invalid SIARD meta data!
Console log:
SCHWERWIEGEND: XML validation for null failed at line 9621 and column 49 with message cvc-pattern-valid: Wert '<UNSUPPORTED DATATYPE>' ist nicht Facet-gxFCltig in Bezug auf Muster 'INTEGER|INT|SMALLINT|BIGINT|(NUMERIC|DECIMAL|DEC)(\s*\(\s*[1-9]\d*\s*(,\s*\d+\s*)?\))?|REAL|DOUBLE PRECISION|FLOAT(\s*\(\s*[1-9]\d*\s*\))?|(CHARACTER|CHAR)(\s*\(\s*[1-9]\d*\s*\))?|(CHARACTER\s+VARYING|CHAR\s+VARYING|VARCHAR)(\s*\(\s*[1-9]\d*\s*\))?|(CHARACTER\s+LARGE\s+OBJECT|CLOB)(\s*\(\s*[1-9]\d*(\s*(K|M|G))?\s*\))?|(NATIONAL\s+CHARACTER|NATIONAL\s+CHAR|NCHAR)(\s*\(\s*[1-9]\d*\s*\))?|(NATIONAL\s+CHARACTER\s+VARYING|NATIONAL\s+CHAR\s+VARYING|NCHAR VARYING)(\s*\(\s*[1-9]\d*\s*\))?|(NATIONAL\s+CHARACTER\s+LARGE\s+OBJECT|NCHAR\s+LARGE\s+OBJECT|NCLOB)(\s*\(\s*[1-9]\d*(\s*(K|M|G))?\s*\))?|XML|BINARY(\s*\(\s*[1-9]\d*\s*\))?|(BINARY\s+VARYING|VARBINARY)(\s*\(\s*[1-9]\d*\s*\))?|(BINARY\s+LARGE\s+OBJECT|BLOB)(\s*\(\s*[1-9]\d*(\s*(K|M|G))?\s*\))?|DATE|(TIME|TIME\s+WITH\s+TIME\s+ZONE)(\s*\(\s*[1-9]\d*\s*\))?|(TIMESTAMP|TIMESTAMP\s+WITH\s+TIME\s+ZONE)(\s*\(\s*(0|([1-9]\d*))\s*\))?|INTERVAL\s+(((YEAR|MONTH|DAY|HOUR|MINUTE)(\s*\(\s*[1-9]\d*\s*\))?(\s+TO\s+(MONTH|DAY|HOUR|MINUTE|SECOND)(\s*\(\s*[1-9]\d*\s*\))?)?)|(SECOND(\s*\(\s*[1-9]\d*\s*(,\s*\d+\s*)?\))?))|BOOLEAN' fxFCr Typ 'predefinedTypeType'.
... more content omitted ...
Maybe this is caused by the same root cause?
Hi,
We try to create a SIARD from an existing PostgreSQL database using SiardFromDb, which ends in a Java Nullpointer Exception. Here is what we tried:
System:
Command ran
We tried the following command:
The command starts, and after some output, it stops with a NullPointerException:
... and that's about it.
We also tried other databases, with the same effect.
Any hints?
Thanks alex