spring-attic / jdbc

Apache License 2.0
11 stars 18 forks source link

Improve incremental update support for JDBC source #2

Open sabbyanandan opened 7 years ago

sabbyanandan commented 7 years ago

From @sabbyanandan on May 6, 2016 17:13

From @ctatineni on February 8, 2016 19:41

Hello,

just want to check if the current JDBC module supports Incremental data updates, for example, if i am reading the data from MySQL table , i want to make sure i only read the data that is added from the last read, either using offset columns or any other approach, also does it support if i am reading from multiple tables? does the same support available for NoSQL data from Mongo or Cassandra?

Copied from original issue: spring-cloud/spring-cloud-stream-modules#181

Copied from original issue: spring-cloud/spring-cloud-stream-app-starters#76

sabbyanandan commented 7 years ago

From @artembilan on February 8, 2016 22:27

Hello, @ctatineni !

Yes it does. The JDBC Source has options like:

We don't have such a CassandraMessageSource implementation yet. (Feel free to fill out an appropriate JIRA ticket.) Therefore we don't have concrete plan about Cassandra Source.

Yes, we have MongoDbMessageSource, but I see it doesn't have findAndModify ability... Feel free to raise an appropriate JIRA as well.

sabbyanandan commented 7 years ago

From @ctatineni on February 9, 2016 19:48

Thanks, just to be clear,this is my understanding, if i use the existing JDBC module and provide a query, it will incrementally read the data from database, for example if it is trying to read from database every 5 seconds, if it reads 1 to 500 records during first read and next time it reads from 500. Let me know if my assumption is correct.

sabbyanandan commented 7 years ago

@ctatineni: See below few stream definitions from Spring XD, but in essence these would apply for Spring Cloud Stream or Spring Cloud Data Flow model, too.

case 1: read from a table continuously (once every 5s)

stream create foo --definition "jdbc --fixedDelay=5 --split=1 --url=jdbc:hsqldb:hsql://localhost:9101/mydb --query='select * from testfoo' | log" --deploy

case 2: incremental reads (once every 5s)

stream create foo --definition "jdbc --fixedDelay=5 --split=1 --url=jdbc:hsqldb:hsql://localhost:9101/mydb --query='select * from testfoo where tag = 0' --update='update testfoo set tag=1 where fooid in (:fooid)'|log" --deploy
sabbyanandan commented 7 years ago

From @artembilan on February 9, 2016 20:15

No, it isn't so smart. :smile: Although it seems for me I got it. You would like to do some paging without update:

SELECT * FROM
(
    SELECT a.*, rownum r__
    FROM
    (
        SELECT * FROM my_table WHERE 
    ) a
    WHERE rownum <= :pageNumber * :pageSize)
)
WHERE r__ > ((:pageNumber-1) * :pageSize)

where those :pageNumber and :pageSize query parameters should be updated after each poll.

I can confirm that it is possible with the setSelectSqlParameterSource(SqlParameterSource sqlQueryParameterSource) for the JdbcPollingChannelAdapter, but it is a feature request for the SCS JDBC Source module. And we can consider to implement it here though.

sabbyanandan commented 7 years ago

From @ctatineni on February 11, 2016 15:32

Thanks for the clarification, i will use this approach for now, is there any plan to add this feature to existing SCS JDBC source?

sabbyanandan commented 7 years ago

From @artembilan on February 11, 2016 16:9

I think we will have them eventually.

It's a bit tricky to provide the paging SELECT wrappers for different vendors (that one above is for Oracle), but I think we can come up with something, since Hibernate can do that :smile:

CC @trisberg