pgspider / sqlite_fdw

SQLite Foreign Data Wrapper for PostgreSQL
Other
218 stars 37 forks source link

Add initial SpatiaLite ↔ PostGIS support #96

Open mkgrgis opened 8 months ago

mkgrgis commented 8 months ago

PostGIS + SpatiaLite

No need to add here PostGIS header files, because SpatiaLite has functions for EWKB data transformation.

In this PR:

mkgrgis commented 4 months ago

@t-kataym . I am ready to discussing about this PR and testing environment setup. Look like the problems is near Ubuntu libspatialite-dev. Can pgspider team help me with CI testing in this PR?

son-phamngoc commented 3 months ago

Hello @mkgrgis, thank you for your effort.

In my understanding, if we want to enable PostGIS to be able to use its data types, we need to CREATE EXTENSION PostGIS. However, in your test file (postgis.sql), you do not create it. You just create a data type which is bytea and has the same name as PostGIS.

--Testcase 01:
CREATE DOMAIN geometry AS bytea;
--Testcase 02:
CREATE DOMAIN geography AS bytea;

Is there any reason for this creation? Why didn't you enable PostGIS by creating extension PostGIS? I also do not see any step to install PostGIS in your PR.

FYI, I tried to enable PostGIS by creating extension, executed your test file and got several error. I think it does not work well with PostGIS extension.

CREATE EXTENSION postgis;
CREATE DOMAIN raster AS bytea;
--Testcase 31: ERR - raster
INSERT INTO "types_PostGIS" ( "i", gm, gg, r, t ) VALUES (1, decode('0101000020e6100000fd5aa846f9733e406c054d4bacd74d40', 'hex'),  decode('0101000020e6100000fd5aa846f9733e406c054d4bacd74d40', 'hex'),  decode('1223456890', 'hex'), '{"genus": "Rhododendron", "taxon": "Rhododendron ledebourii", "natural": "shrub", "genus:ru": "Рододендрон", "taxon:ru": "Рододендрон Ледебура", "source:taxon": "board"}');
ERROR:  Transformation between GEOS/PostGIS has failed for a value with PostgreSQL data type "public.geometry"
HINT:  In column "gm" incorrect value in 28 bytes

--Testcase 37: OK
SELECT "i", gm, gg, t FROM "types_PostGIS";
ERROR:  Unknown geometry type: 1107488 - Invalid type

--Testcase 38: OK
INSERT INTO "types_PostGIS" ( "i", gm, gg, t ) VALUES (2, decode('0101000020e6100000bf72ce99fe763e40ed4960730ed84d40', 'hex'),  decode('0101000020e6100000bf72ce99fe763e40ed4960730ed84d40', 'hex'), '{"genus": "Rhododendron", "taxon": "Rhododendron ledebourii"}');
ERROR:  Transformation between GEOS/PostGIS has failed for a value with PostgreSQL data type "public.geometry"
HINT:  In column "gm" incorrect value in 28 bytes
mkgrgis commented 3 months ago

Hello, @son-phamngoc ! Thanks for code and tests reading!

In my understanding, if we want to enable PostGIS to be able to use its data types, we need to CREATE EXTENSION PostGIS.

No. My idea of initial experimental geospatial data types support based on bytea representation of PostGIS objects. This means only this blob-affinity values should have sense in PostGIS, should be well-formed. But we don't need to test this.

I also do not see any step to install PostGIS in your PR.

There are some reasons:

  1. PostGIS based on GEOS binary or bytea formats. We shouldn't test some PostGIS functions. We can only once manually ensure created binary object is well formed by GEOS and use this result image for tests of SpatiaLite outputs. In my case DBeaver shows normal PostGIS content for a column later changed to PostGIS geometry type.

Is there any reason for this creation? Why didn't you enable PostGIS by creating extension PostGIS?

  1. PostGIS is not a part of testing environment. We can do imitation of existing PostGIS - some special data types with domain form and listed names. Our FDW based only on BLOB format and column type name. No PostGIS functions are used, no needed for tests. All tested functions belongs to SpatiaLite.
  2. There is no raster support. Any error in case of showed TC 31 is normal. If you want try this PR code with PostGIS in normal style, please comment creating of domains with special names, use automatically created internal PostGIS data types instead. For our FDW this should have the same effect.
  3. This PR doesn't implement detailed malformed data diagnostics.
son-phamngoc commented 3 months ago

@mkgrgis Thanks for your answering.

If you want try this PR code with PostGIS in normal style, please comment creating of domains with special names, use automatically created internal PostGIS data types instead. For our FDW this should have the same effect.

I did. I commented out the SQL to create domain, create PostGIS extension, and then got the error in https://github.com/pgspider/sqlite_fdw/pull/96#issuecomment-2160079630 I understood that the result of TC31 is normal, but how about TC37 and TC38? They should be OK, but they returned error instead. Could you explain more to me?

CREATE EXTENSION postgis;
-- --Testcase 01:
-- CREATE DOMAIN geometry AS bytea;
-- --Testcase 02:
-- CREATE DOMAIN geography AS bytea;
-- --Testcase 03:
-- CREATE DOMAIN addbandarg AS bytea;
-- --Testcase 04:
-- CREATE DOMAIN box2d AS bytea;
-- --Testcase 05:
-- CREATE DOMAIN box3d AS bytea;
-- --Testcase 06:
-- CREATE DOMAIN geometry_dump AS bytea;
-- --Testcase 07:
-- CREATE DOMAIN geomval AS bytea;
-- --Testcase 08:
-- CREATE DOMAIN getfaceedges_returntype AS bytea;
-- --Testcase 09:
-- CREATE DOMAIN rastbandarg AS bytea;
--Testcase 10:
CREATE DOMAIN raster AS bytea;
-- --Testcase 11:
-- CREATE DOMAIN reclassarg AS bytea;
-- --Testcase 12:
-- CREATE DOMAIN summarystats AS bytea;
-- --Testcase 13:
-- CREATE DOMAIN topoelement AS bytea;
-- --Testcase 14:
-- CREATE DOMAIN topoelementarray AS bytea;
-- --Testcase 15:
-- CREATE DOMAIN topogeometry AS bytea;
-- --Testcase 16:
-- CREATE DOMAIN unionarg AS bytea;
-- --Testcase 17:
-- CREATE DOMAIN validatetopology_returntype AS bytea;
...
--Testcase 37: OK
SELECT "i", gm, gg, t FROM "types_PostGIS";
ERROR:  Unknown geometry type: 1107488 - Invalid type
--Testcase 38: OK
INSERT INTO "types_PostGIS" ( "i", gm, gg, t ) VALUES (2, decode('0101000020e6100000bf72ce99fe763e40ed4960730ed84d40', 'hex'),  decode('0101000020e6100000bf72ce99fe763e40ed4960730ed84d40', 'hex'), '{"genus": "Rhododendron", "taxon": "Rhododendron ledebourii"}');
ERROR:  Transformation between GEOS/PostGIS has failed for a value with PostgreSQL data type "public.geometry"
HINT:  In column "gm" incorrect value in 28 bytes
mkgrgis commented 3 months ago

I understood that the result of TC31 is error, but how about TC37 and TC38? They should be OK, but they returned error instead.

Yes. What about SpatiaLite versions on your system, @son-phamngoc ? How did you get the files? I use

son-phamngoc commented 3 months ago

What about SpatiaLite versions on your system, @son-phamngoc ? How did you get the files?

@mkgrgis I used libspatialite-devel version 5.0.0-1. I installed it by command dnf install libspatialite-devel. This one, I think: https://rhel.pkgs.org/8/epel-x86_64/libspatialite-devel-5.0.0-1.el8.x86_64.rpm.html FYI, about OS, I'm using Rocky Linux 8.8.

mkgrgis commented 3 months ago

FYI, about OS, I'm using Rocky Linux 8.8.

This should be normal environment. libspatialite-devel is also normal package. I am testing on Debian 12 with such result for the newest PostgreSQL version 16.0 output.txt

Your 5.0.0-1 SpatiaLite doesn't critically differs with my 5.0.1-3. I'll test with DBeaver visualisation and update this message, but no problems expected.

mkgrgis commented 3 months ago

Look like some problem is in listed_datatype for different contexts. In my case filter of data type names works good, in your case, @son-phamngoc , maybe not.

son-phamngoc commented 3 months ago

@mkgrgis Did you try the same as me: CREATE EXTENSION PostGIS and comment out all domains with special names?

I think I found the problem.
My environment uses PostGIS 3. When printing the tuple, functions of PostGIS is called, and error occurred.
image According to your description in README, this PR only works with PostGIS 2. Is that correct? Could you tell me which functions is called when printtup with PostGIS 2? In my understanding, it should also call functions of PostGIS.

mkgrgis commented 3 months ago

@son-phamngoc , I think such PostGIS functions as geometry_in or geometry(bytea) and geography_in or geography(bytea) can be used in our case for input. geometry_out and geography_out functions are used for text output as usual. All of this functions stored in the schema of PostGIS(CREATE EXTENSION postgis SCHEMA...). Main idea of my PR is using gaiaFromSpatiaLiteBlobWkbExgaiaToSpatiaLiteBlobWkbEx2 from _/libspatialite-5.1.0/src/gaiageo/ggwkb.c in https://www.gaia-gis.it/gaia-sins/libspatialite-sources/libspatialite-5.1.0.tar.gz Why I thinks we shouldn't test PostGIS functions? Because after my PR I have normal bytea data for PostGIS. Maybe I have forgotten some SQL conversion marker or context? In case of value::bytea there were no PostGIS 3.0 problems in my manual tests.

son-phamngoc commented 3 months ago

This FDW works with PostGIS 2+ and confirmed with SpatiaLite 5.1.

@mkgrgis Could you tell me why this PR only works with PostGIS 2+? Is there any reason that it cannot work with PostGIS 3+?

mkgrgis commented 3 months ago

Could you tell me why this PR only works with PostGIS 2+? Is there any reason that it cannot work with PostGIS 3+?

@son-phamngoc, this PR should works also with PostGIS 3+. In PostGIS 2 was implemented current stable internal storage format EWKB. This storage format is used in all of PostGIS 2+ functions including all existed PostGIS 3+ versions.

Now I am trying to test context of using of the format. As data source I use https://osmcode.org/osmium-tool/ as osmium export -f pg https://www.openstreetmap.org/api/0.6/node/8906857478; this command gives me 0101000020E6100000FD5AA846F9733E406C054D4BACD74D40 {"genus":"Rhododendron","genus:ru":"Рододендрон","natural":"shrub","source:taxon":"board","taxon":"Rhododendron ledebourii","taxon:ru":"Рододендрон Ледебура"} In this listing 0101000020E6100000FD5AA846F9733E406C054D4BACD74D40 is my test data. select st_astext('0101000020E6100000FD5AA846F9733E406C054D4BACD74D40'::geometry) WKT shows POINT(30.4530224 59.6849455), no problems here. select '0101000020E6100000FD5AA846F9733E406C054D4BACD74D40'::geometry rh; is also decodable изображение I am really confused where is the problem.

mkgrgis commented 3 months ago

@son-phamngoc , some additional context

PostGIS for testing

SpatiaLite tests

For Ubuntu/Debian there is libsqlite3-mod-spatialite package, for RHEL/Rocky no additional package is needed after libspatialite, see SO question.

In my case SQLite listing with values from my PostGIS/SpatiaLite tests is also correct:

SQLite version 3.40.1 2022-12-28 14:03:47
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> select load_extension('mod_spatialite');

sqlite> select ST_AsText(X'0001e6100000bf72ce99fe763e40ed4960730ed84d40bf72ce99fe763e40ed4960730ed84d407c01000000bf72ce99fe763e40ed4960730ed84d40fe');
POINT(30.464822 59.687941)

PostGIS shows some point near, POINT(30.4530224 59.6849455). You can decide about precision yourself. PostGIS select st_geomfromewkb(decode ('0101000020e6100000bf72ce99fe763e40ed4960730ed84d40', 'hex')) shows the same point as Spatialite above, POINT(30.464822 59.687941).

son-phamngoc commented 3 months ago

Why I thinks we shouldn't test PostGIS functions? Because after my PR I have normal bytea data for PostGIS. Maybe I have forgotten some SQL conversion marker or context? In case of value::bytea there were no PostGIS 3.0 problems in my manual tests.

@mkgrgis I think if you cast data to bytea, bytea_out function will be called to print data, so possibly, no problem occurs. But in my case, I do not cast data to bytea. I also did not define DOMAIN geometry AS bytea (As you can see in https://github.com/pgspider/sqlite_fdw/pull/96#issuecomment-2165178812 ). I use the original geometry data type of PostGIS, so when SELECT, the function of PostGIS is called to print. As you can see in my image in https://github.com/pgspider/sqlite_fdw/pull/96#issuecomment-2165178812, function LWGEOM_out is called.

Could you try exactly the same as my example in https://github.com/pgspider/sqlite_fdw/pull/96#issuecomment-2165178812 and give me the result in your PC (I think a screenshot or attachment of postgis.out result file is good for me)?

Possibly, your inserted data is correct, but sqlite_fdw cannot display geometry data. If so, we need to fix more.

@son-phamngoc, this PR should works also with PostGIS 3+. In PostGIS 2 was implemented current stable internal storage format EWKB. This storage format is used in all of PostGIS 2+ functions including all existed PostGIS 3+ versions.

Do you mean the following description in your README is not correct? and this PR can also work with PostGIS 3+?

Also this foreign data wrapper (FDW) can connect PostgreSQL with [PostGIS](https://www.postgis.net/) to [SpatiaLite](https://www.gaia-gis.it/fossil/libspatialite/index) SQLite database file. This FDW works with PostGIS 2+ and confirmed with SpatiaLite 5.1.
mkgrgis commented 3 months ago

@mkgrgis I think if you cast data to bytea, bytea_out function will be called to print data, so possibly, no problem occurs.

Yes. Now I have understood this is not enough. We should also test PostGIS output for the values.

But in my case, I do not cast data to bytea. <...> I use the original geometry data type of PostGIS, so when SELECT, the function of PostGIS is called to print. As you can see in my image in #96 (comment), function LWGEOM_out is called.

Could you try exactly the same as my example in #96 (comment) and give me the result in your PC (I think a screenshot or attachment of postgis.out result file is good for me)?

Yes. On my real (not testing) system with installed experimental code and PostGIS and postgis_raster the script postgis_extract.txt gives postgis_extract.out.txt after something like cat 'postgis_extract.sql' | psql -e -d "GIS";

The main point is parse error - invalid geometry. Maybe I doesn't deparse SQLite blob properly for PostGIS geometry and geography input functions.

Possibly, your inserted data is correct, but sqlite_fdw cannot display geometry data. If so, we need to fix more.

Look like you are right after our diagnostics.

Do you mean the following description in your README is not correct? and this PR can also work with PostGIS 3+?

Also this foreign data wrapper (FDW) can connect PostgreSQL with [PostGIS](https://www.postgis.net/) to [SpatiaLite](https://www.gaia-gis.it/fossil/libspatialite/index) SQLite database file. This FDW works with PostGIS 2+ and confirmed with SpatiaLite 5.1.

Yes. This is correct description because of common semantic version rules. Some examplees:

son-phamngoc commented 3 months ago

@mkgrgis Thanks for your understanding.

Yes. Now I have understood this is not enough. We should also test PostGIS output for the values.

Yes. On my real (not testing) system with installed experimental code and PostGIS and postgis_raster the script postgis_extract.txt gives postgis_extract.out.txt after something like cat 'postgis_extract.sql' | psql -e -d "GIS"; The main point is parse error - invalid geometry. Maybe I doesn't deparse SQLite blob properly for PostGIS geometry and geography input functions.

Could you investigate and fix this PR, considering above problem?

Yes. This is correct description because of common semantic version rules. Some examplees:

2.x means 2.1, 2.0.4, 2.7, 2.9.83 but not 3.0.2, but 2+ means all of listed version numbers. PostgreSQL 11+ means 12.14, 17.0, 11.4 and 16.1, but not 10.16.

I understood.

mkgrgis commented 3 months ago

Could you investigate and fix this PR, considering above problem?

Yes, @son-phamngoc . Also I need help with testing environment. How can I enable PostGIS in testing databases? Just unpack PostGIS source to contrib and compile after sudo apt-get install libproj-dev libgeos-dev libxml2-dev gettext libjson-c-dev libgdal-dev libsfcgal-dev libprotobuf-c-dev protobuf-c-compiler -y ? Will this enough?

mkgrgis commented 3 months ago

@son-phamngoc , after current fixed CI and CREATE EXTENSION postgis; in testing postgis.sql I have

ERROR:  extension "postgis" is not available
DETAIL:  Could not open extension control file "/home/runner/work/sqlite_fdw/sqlite_fdw/workdir/postgresql-16.0/tmp_install/usr/local/pgsql/share/extension/postgis.control": No such file or directory.

Have you trying to get available PostGIS on testing PostgreSQL installation?

mkgrgis commented 3 months ago

@son-phamngoc , currently the main problem in CI is invisible postgres.h for unpacked to ./contrib/postgis-VER PostGIS code. I think after this problem solving I'll can CREATE EXTENSION postgis; in sqlite_fdw test scripts. Now -I or includedir was not useful in some tested configurations. Maybe I didn't use it properly.

son-phamngoc commented 3 months ago

Have you trying to get available PostGIS on testing PostgreSQL installation?

Please add the following code to Makefile of sqlite_fdw. After that, you can execute make check with PostGIS.

check: temp-install

temp-install: EXTRA_INSTALL+=contrib/postgis

checkprep: EXTRA_INSTALL+=contrib/postgis

The reason is make check command copies the necessary source code to tmp_install folder to execute test. This test requires contrib/postgis, so you have to specify it in Makefile.

mkgrgis commented 3 months ago

Please add the following code to Makefile of sqlite_fdw. After that, you can execute make check with PostGIS.

Thanks, @son-phamngoc , but this is not enough. Please read the scripts and error. In every PostgreSQL source code directory there is configured PostGIS source code in contrib/postgis now, but something still is wrong.

mkgrgis commented 3 months ago

@son-phamngoc , current code works with both PostGIS and bytea data in PostgreSQL. Please note, SpatiaLite GIS blob doesn't accept any data without SRID. Now I am writing diagnostics and error about missing SRID in EWKB. Deparsing for = operator also is enabled.

mkgrgis commented 3 months ago

@son-phamngoc , now this PR contains universal code for both PostGIS data types and bytea domains and big test datasets. There is no problems with PostGIS compilation, but there are still big problems with make install to PostgreSQL version temporary installation. Many my commands like following was not effective

ti=$(realpath ../../tmp_install);
    BINDIR="$ti/usr/local/pgsql/bin" \
    DOCDIR="$ti/usr/local/pgsql/share/doc" \
    HTMLDIR="$ti/usr/local/pgsql/share/doc" \
    INCLUDEDIR="$ti/usr/local/pgsql/include" \
    PKGINCLUDEDIR="$ti/usr/local/pgsql/include" \
    LIBDIR="$ti/usr/local/pgsql/lib" \
    PKGLIBDIR="$ti/usr/local/pgsql/share/lib" \
    LOCALEDIR="$ti/usr/local/pgsql/share/locale" \
    MANDIR="$ti/usr/local/pgsql/share/man" \
    SHAREDIR="$ti/usr/local/pgsql/share" \
    DESTDIR="$ti" \
    make install

./configure --prefix "$ti" also look like not effective. Could you please help with this problem during CI for PostGIS IO functions testing?

son-phamngoc commented 3 months ago

@mkgrgis Thank you for your support. My work is so busy, so I could not check this immediately. Please wait for a few days. Sorry about that.

mkgrgis commented 3 months ago

Please wait for a few days

@son-phamngoc , no problems, I am also studying make install configurations and anyway construction of CI solution will not fast.

mkgrgis commented 2 months ago

@son-phamngoc , any perspectives to see PostGIS installation? I have found this GNU coding standards fragment, but not from PostGIS configure script:

# Installation directory options.
# These are left unexpanded so users can "make install exec_prefix=/foo"
# and all the variables that are supposed to be based on exec_prefix
# by default will actually change.
# Use braces instead of parens because sh, perl, etc. also accept them.
# (The list follows the same order as the GNU Coding Standards.)
bindir='${exec_prefix}/bin'
sbindir='${exec_prefix}/sbin'
libexecdir='${exec_prefix}/libexec'
datarootdir='${prefix}/share'
datadir='${datarootdir}'
sysconfdir='${prefix}/etc'
sharedstatedir='${prefix}/com'
localstatedir='${prefix}/var'
includedir='${prefix}/include'
oldincludedir='/usr/include'
docdir='${datarootdir}/doc/${PACKAGE_TARNAME}'
infodir='${datarootdir}/info'
htmldir='${docdir}'
dvidir='${docdir}'
pdfdir='${docdir}'
psdir='${docdir}'
libdir='${exec_prefix}/lib'
localedir='${datarootdir}/locale'
mandir='${datarootdir}/man'

Look like in case of PostGIS there is additional directories configuration for placing inside of PostgreSQL directories, but I still cannot pass to PostGIS needed directories.

son-phamngoc commented 2 months ago

@mkgrgis Sorry for late response.

I would like to share some information. For this PR, it has a new feature which requires additional libraries to be installed. Therefore, we would like to disable this feature by default. If a new compiler flag is set, this feature is enabled.

Example:

make ENABLE_GIS=1

According to that, the GitHub CI testing should be modified. Currently, for 1 version of PostgreSQL, I'm going to test the following combination.

No PostgreSQL SQLite FDW SQLite database Remark
1 default default default Normal case, require testing
2 Postgis is enabled ENABLE_GIS rtree is enabled postgis test file will pass

So, for 5 versions of PostgreSQL, there will be 10 combinations to be tested.

It is required to update the main branch of sqlite_fdw to support this GitHub CI testing. This change will be applied for any new PR which supports new compiler option (it is disabled by default) in the future. After main branch of sqlite_fdw is updated, you can execute rebase for this PR and execute test again.

Could you temporally put this PR on hold, and wait for my update on main branch first? Thank you for your support.

mkgrgis commented 2 months ago
make ENABLE_GIS=1

Thanks, @son-phamngoc ! This is compile mode I want and planned. Unfortunately I know to few to transfer this flag to .c files.

According to that, the GitHub CI testing should be modified.

I am agree, this will be usefull

So, for 5 versions of PostgreSQL, there will be 10 combinations to be tested.

I am ready to see and adopt provided PostGIS tests to all of 10 cases.

It is required to update the main branch of sqlite_fdw to support this GitHub CI testing. This change will be applied for any new PR which supports new compiler option (it is disabled by default) in the future. After main branch of sqlite_fdw is updated, you can execute rebase for this PR and execute test again.

Thanks, this will be good for me, I will ready to rebase.

Could you temporally put this PR on hold, and wait for my update on main branch first?

No problems, @son-phamngoc !

Now I have MAC address support in this branch , how do you think, @son-phamngoc, will it be reasonable for pgspider team to review possible PR from this branch before this GIS PR? There is no compiler or testing environment changes in MAC address branch.

son-phamngoc commented 2 months ago

@mkgrgis Thanks for your understanding.

Now I have MAC address support in this branch , how do you think, @son-phamngoc, will it be reasonable for pgspider team to review possible PR from this branch before this GIS PR? There is no compiler or testing environment changes in MAC address branch.

I see that you created a new PR for MAC address support. Please ask @t-kataym if you are ready and want to start reviewing.

mkgrgis commented 2 months ago

Please ask @t-kataym if you are ready and want to start reviewing.

Thanks, @son-phamngoc ! I have done. Also I have already tested branch where changes also doesn't intersect changes in this PR. How do you think, @son-phamngoc, will it reasonable to open that PR before this PR if this GIS PR have priority for me?

mkgrgis commented 1 month ago

@t-kataym , @son-phamngoc this PR with full CI testing support is ready to review, thanks to @MinhLA1410 in https://github.com/pgspider/sqlite_fdw/pull/103. @son-phamngoc, please continue discuss GIS and formal data transformations. Now this PR contains sample GIS data sets for Mars and Earth.

mkgrgis commented 1 month ago

Description of the PR was edited to actual.

t-kataym commented 2 weeks ago

@mkgrgis I have merged https://github.com/pgspider/sqlite_fdw/pull/103 which supports the use of PostGIS on CI.

mkgrgis commented 1 week ago

Thanks to pgspider team, @t-kataym ! Thanks to you and @MinhLA1410 ! This PR was updated and adopted to CI scripts. I am ready to continue review.

t-kataym commented 1 week ago

@mkgrgis Thank you for updating.
@son-phamngoc Could you review it?

mkgrgis commented 1 day ago

All of problems except to postgis test resolved or commented, @son-phamngoc . Please continue review. postgis test will be normalized some time later.