pgspider / influxdb_fdw

InfluxDB Foreign Data Wrapper for PostgreSQL.
Other
58 stars 14 forks source link

InfluxDB v1 retention policy support #51

Open AntoniJakubiak opened 7 months ago

AntoniJakubiak commented 7 months ago

Does influxdb_fdw support retention policies?

It seems, that it reads only from the default retention policy on InfluxDB v1.

weiting1lee commented 7 months ago

Dear @AntoniJakubiak , Thank you for your question about retention policies in InfluxDB FDW. Currently, InfluxDB FDW does not support custom retention policies for InfluxDB v1 due to limitations in the cxx_client library's GetV1() function, which connects to the database without specifying a retention policy. We're open to suggestions and would appreciate any thoughts you might have on this or other features.

AntoniJakubiak commented 7 months ago

Hi,

I am sure, that the retention policies could be implemented without the change in influx cxx client library.

Important queries are:

SELECT * FROM rp.measurement;
SHOW FIELD KEYS ON db FROM rp.measurement;
SHOW TAG KEYS ON db FROM rp.measurement;

Basic implementation idea starts here:

static struct InfluxDBFdwOption valid_options[] =
{
///...  
    {"table", ForeignTableRelationId},
        {"retention_policy", ForeignTableRelationId}, // new line
};

This will allow you to define another options for foreign tables, for example:

influx=# \det+ "30d".cpu ;
                                         List of foreign tables
 Schema | Table |    Server    |                       FDW options                        | Description 
--------+-------+--------------+----------------------------------------------------------+-------------
 30d    | cpu   | influxdb_svr | ("table" 'cpu', retention_policy '30d', tags 'cpu,host') | 

Reading retention_policy name from options could look like:

influxdb_get_table_name(Relation rel)
{
// 
    foreach(lc, table->options)
    {
        DefElem    *def = (DefElem *) lfirst(lc);

        if (strcmp(def->defname, "table") == 0)
            rptn.table_name = defGetString(def);

// new lines + read retenion_policy name from options
        if (strcmp(def->defname, "retention_policy") == 0)
            rptn.retention_policy = defGetString(def);
    }

Having options, there is a need to modify measurement name. It's quite simply:

static void
influxdb_deparse_relation(StringInfo buf, Relation rel)
{
  ...  = influxdb_get_table_name(rel);
    appendStringInfo(buf, "%s.%s", retention_policy, table_name);
}

There is also a need to update import schema.

/*
 * Import a foreign schema
 */
static List *
influxdbImportForeignSchema(ImportForeignSchemaStmt *stmt,
                            Oid serverOid)
{
// ...
#ifndef CXX_CLIENT
// ...
#else
    options->svr_retention_policy = stmt->remote_schema;  // new line
    server = GetForeignServer(serverOid);
    mapping = GetUserMapping(GetUserId(), server->serverid);
    ret = InfluxDBSchemaInfo(mapping, options);
#endif

// ... 
            appendStringInfo(&buf, "\n) SERVER %s\nOPTIONS (table ",
                             quote_identifier(stmt->server_name));

            influxdb_deparse_string_literal(&buf, info[table_idx].measurement);

// 2 new lines
            appendStringInfo(&buf, ", retention_policy ");
            influxdb_deparse_string_literal(&buf, stmt->remote_schema);
}

/*
 * InfluxDBSchemaInfo
 *      Returns information of table if success
 */
extern "C" struct InfluxDBSchemaInfo_return
InfluxDBSchemaInfo(UserMapping *user, influxdb_opt *opts)
{
// ...
    char           *db = opts->svr_database;
    char           *rp = opts->svr_retention_policy; // new line

                /*
                 * Get tagkeys for each foreign table
                 */
                query = "SHOW TAG KEYS ON " + influxdb_quote_identifier(std::string(db)) + " FROM " + influxdb_quote_identifier(std::string(rp), tables_info[table_idx].measurement); // changed line
// ...
                /*
                 * Get fieldkeys for each foreign table
                 */
                query = "SHOW FIELD KEYS ON " + influxdb_quote_identifier(std::string(db)) + " FROM " + influxdb_quote_identifier(std::string(rp), tables_info[table_idx].measurement); // changed line

// ...
}

Kind regards Antoni Jakubiak

weiting1lee commented 7 months ago

@AntoniJakubiak Thank you for your suggestion regarding the FDW modification. I appreciate your effort to improve its functionality.

I noticed that the proposed modification seems tailored for the Go client, which supports only InfluxDB v1. However, this modification might not extend the retention policy support to users of the cxx-client, as GetV1 in the cxx-client does not include a retention policy parameter. Here's the link of the code for reference: link You can easily find out the differance between GetV1 and GetV2.

Given this, the modification might only be feasible for Go client users. Could we explore ways to also accommodate cxx-client users? Your input is highly valued, and I look forward to any further suggestions you might have.

AntoniJakubiak commented 7 months ago

Hi,

I need a permission from my company to contribute to open-source, so I cannot submit a patch right now. However, I will apply for it.

This modification works with InfluxDB CXX client. I have not tested it with Go client.

Kind regards Antoni Jakubiak

antonijakubiak-samsung commented 6 months ago

Hello,

I am sending a patch for #50 and #51

0001-niffler-to-pgspider.patch.gz

If you are interested, I can join Your project, create a branch and pull request.

Kind regards Antoni Jakubiak

mkgrgis commented 6 months ago

Hello, @antonijakubiak-samsung ! Please look at my PRs in https://github.com/pgspider/sqlite_fdw/commits/master/ In pgspider project usually there are only 20-25% of C/C++ code, 60-70% of tests and 10-15% of discussing about code. If you can provide tests for your changes this will be great and you'll have a 3-4 month perspective of merge. Did I scare your work?

antonijakubiak-samsung commented 6 months ago

Hi @mkgrgis.

I am not scared. Test code is obvious.

What is the contribution work-flow, e.g.: repository fork or feature branch?

Best Antoni Jakubiak

mkgrgis commented 6 months ago

What is the contribution work-flow, e.g.: repository fork or feature branch?

Fork of repository -> creating a branch for new feature -> one full initial commit (code changes + documentation changes + tests) -> PR for pgspider team -> review ... other commits by review ...

Testing scripts for getting multi-versional PostgreSQL C environment is here.

antonijakubiak-samsung commented 6 months ago

Hi @mkgrgis

Sadly, compiling and testing your project is complicated for me. I think, it could depend on my system. Do you have more detailed instruction for compiling and testing?

Instead of writing the instruction we could use Dockerfile.

For example, here is a Dockerfile which I am using to compile InfluxDB FDW.

FROM postgres:15.6-bookworm
RUN apt-get update
RUN apt-get install -y postgresql-server-dev-15
RUN apt-get install -y bash vim git
RUN apt-get install -y build-essential cmake
RUN apt-get install -y libcurl4-openssl-dev 
ARG INSTALL_DIR=/src
RUN mkdir $INSTALL_DIR
RUN \
       cd $INSTALL_DIR; \
       git clone https://github.com/pgspider/influxdb-cxx.git; \
       cd influxdb-cxx; \
       mkdir build; \
       cd build; \
        cmake .. -DINFLUXCXX_WITH_BOOST=OFF -DINFLUXCXX_TESTING=OFF; \
       make; \
       make install
RUN \
       cd $INSTALL_DIR; \
       git clone https://github.com/pgspider/influxdb_fdw.git; \
       cd influxdb_fdw; \
       make USE_PGXS=1 with_llvm=no CXX_CLIENT=1; \
       make install USE_PGXS=1 with_llvm=no CXX_CLIENT=1
RUN ldconfig

Then:

docker build -t influxdb_fdw .
docker run --name influxdb_fdw -e POSTGRES_PASSWORD=secret --publish 5431:5432 -ti influxdb_fdw
psql -h localhost -p 5431 -U postgres -c 'create extension influxdb_fdw'

Could you teach me how to compile and test influxdb_fdw, or could you provide a Dockerfile?

Kind regards Antoni Jakubiak

mkgrgis commented 6 months ago

Do you have more detailed instruction for compiling and testing?

Yes. There is special common testing environment scripts https://github.com/pgspider/fdw_testing_scripts for all of FDWs of pgspider project. This scripts allows to get multiversional PostgreSQL source code environment. Preconditions was described at this lines of a script. Hence you should read https://wiki.postgresql.org/wiki/Compile_and_Install_from_source_code and get InfluxDB environment for tests. I think you doesn't need docker for compiling or for testing environment, but docker is used for InfluxDB testing installation themself, see https://github.com/pgspider/influxdb_fdw/blob/master/init.sh

Also you can add GitHub auto testing by copying https://github.com/pgspider/sqlite_fdw/tree/master/GitHubActions and adopting this scripts to InfluxDB instead of SQLite.

Could you teach me how to compile and test influxdb_fdw, or could you provide a Dockerfile?

I hope this will enough for you, @antonijakubiak-samsung Any questions will be appreciate.

antonijakubiak-samsung commented 6 months ago

Hi,

I am trying to run the system tests.

I have the following results:

# +++ regress check in contrib/influxdb_fdw +++
# using temp instance on port 61696 with PID 12670
ok 1         - 16.0/aggregate                            105 ms
ok 2         - 16.0/influxdb_fdw                        1791 ms
not ok 3     - 16.0/selectfunc                           853 ms
ok 4         - 16.0/extra/join                       1220769 ms
ok 5         - 16.0/extra/limit                         1180 ms
ok 6         - 16.0/extra/aggregates                 1234906 ms
ok 7         - 16.0/extra/insert                        5318 ms
ok 8         - 16.0/extra/prepare                        305 ms
not ok 9     - 16.0/extra/select_having                50383 ms
ok 10        - 16.0/extra/select                         694 ms
ok 11        - 16.0/extra/influxdb_fdw_post           123801 ms
ok 12        - 16.0/schemaless/aggregate                 109 ms
ok 13        - 16.0/schemaless/influxdb_fdw             1151 ms
not ok 14    - 16.0/schemaless/selectfunc                942 ms
ok 15        - 16.0/schemaless/schemaless                110 ms
ok 16        - 16.0/schemaless/extra/join            1151438 ms
ok 17        - 16.0/schemaless/extra/limit              2065 ms
ok 18        - 16.0/schemaless/extra/aggregates      1287630 ms
ok 19        - 16.0/schemaless/extra/prepare             309 ms
not ok 20    - 16.0/schemaless/extra/select_having     40954 ms
ok 21        - 16.0/schemaless/extra/insert            12898 ms
ok 22        - 16.0/schemaless/extra/select              707 ms
ok 23        - 16.0/schemaless/extra/influxdb_fdw_post   168791 ms
not ok 24    - 16.0/schemaless/add_fields                719 ms
not ok 25    - 16.0/schemaless/add_tags                  925 ms
not ok 26    - 16.0/schemaless/add_multi_key            1166 ms

16_0_regr.diff.gz

It seems, that the test results depends on the environment settings. For example regr.diff: PostgreSQL sort order depends on the locale:

diff -U3 /workspace/PostgreSQL source/REL_16_0/postgresql/contrib/influxdb_fdw/expected/16.0/extra/select_having.out /workspace/PostgreSQL source/REL_16_0/postgresql/contrib/influxdb_fdw/results/16.0/extra/select_having.out
--- "/workspace/PostgreSQL source/REL_16_0/postgresql/contrib/influxdb_fdw/expected/16.0/extra/select_having.out"       2024-05-08 09:24:28.204016490 +0000
+++ "/workspace/PostgreSQL source/REL_16_0/postgresql/contrib/influxdb_fdw/results/16.0/extra/select_having.out"        2024-05-08 10:07:03.882337549 +0000
@@ -46,8 +46,8 @@
        GROUP BY b, c HAVING b = 3 ORDER BY b, c;
  b |    c
 ---+----------
- 3 | bbbb    
  3 | BBBB
+ 3 | bbbb  

The sort order depends on the locale settings:

root@87f204cadebb:/workspace/PostgreSQL source# echo -e "bbbb\nBBBB" | LC_ALL=C sort
BBBB
bbbb
root@87f204cadebb:/workspace/PostgreSQL source# echo -e "bbbb\nBBBB" | LC_ALL=en_US.utf8 sort
bbbb
BBBB

It means, that I should run the get_environment.sh script with some environment settings. Could you help me, to create such environment?

  1. how about execution time, is it normal on a modern PC?
  2. what is the default locale?
  3. which golang version you are using for influxdb_fdw compilation?
  4. on which InfluxDB you are testing?
  5. do you have any special InfluxDB configuration.

In order to reduce environment dependencies, we could create a containerized build environment. In such containerized environment, we could get a clean Linux distribution, get dependencies, build and test the software in 100% repeatable form. I'm sending example docker files which could be executed with docker compose up --build my-docker.tar.gz.

Kind regards Antoni Jakubiak

mkgrgis commented 6 months ago

Hello, @antonijakubiak-samsung !

  • how about execution time, is it normal on a modern PC?

Tests for most of pgspider FDWs is not optimized for fast executing. Testing time for sqlite_fdw was longer than 20 min before moving SQLite DBs in /tmp mounted as tmpfs. There is 1..1.5 min for all versions after this change. Unfortunately I have no InfluxDB or influxdb_fdw experience...

  • what is the default locale?

Look like this is C locale as for other PostgreSQL-like tests. Please note, most of pgspider FDWs yet not support PostgreSQL text encodings. Look like, text sorting and transforming additionally depends on system locale. The support for encodings in sqlite_fdw was added in https://github.com/pgspider/sqlite_fdw/commit/90dad0a3ae7b8ffd129149088e094276c439ed61 , but the tests was detached to separate file later. I think you can copy 70-80% of the test and adopt other for InfluxDB FDW context. About SQL_ASCII + en_US in your diff I doesn't understand a conceptual source of problem. For SQLite FDW no en_US locale/collation is recommended or necessary, see https://github.com/pgspider/sqlite_fdw/blob/master/GitHubActions/install_locales.sh Maybe there is not normal FDW deparsing logic with InfluxDB?

  • which golang version you are using for influxdb_fdw compilation?

This is a question to pgspider team. @t-kataym _, could you please ask some of authors of influxdbfdw about this?

  • on which InfluxDB you are testing?

There is 3 supported InfluxDB versions, but README doesn't describe details of the testing environment

  • do you have any special InfluxDB configuration.

This is also a question to pgspider team. I know nothing more about than docker file content.

mkgrgis commented 6 months ago

@weiting1lee , maybe you can help about previous comment ?

weiting1lee commented 6 months ago

Thank you, @mkgrgis

@AntoniJakubiak, I greatly appreciate your efforts in testing and your suggestion regarding the testing environment. Using a containerized environment is a great idea, and we are also considering employing GitHub Actions for testing, similar to what sqlite_fdw uses, to facilitate easier testing for all developers.

Which golang version are you using for influxdb_fdw compilation?

For support up to PostgreSQL 16.0, we are using Golang 1.20.4.

On which InfluxDB are you testing?

In our environment, we use InfluxDB v1: 1.8.10 and InfluxDB v2: 2.2.0.

Do you have any special InfluxDB configuration?

There are no special configurations in our setup. If you continue to encounter difficulties with this setup, please do share more details so we can assist you better. Looking forward to your contributions and further insights!