singer-io / tap-kustomer

GNU Affero General Public License v3.0

Pagination Issue? Conversations data eternally extracted. [Stitch ETL] #12

Open dRuEFFECT opened 3 years ago

dRuEFFECT commented 3 years ago

I'm using Stitch for ETL, and we recently onboarded with Kustomer. For the first week or so, data came through Stitch with this Kustomer integration just fine, but then the extraction of conversation data appeared to get stuck at a specific timestamp (2020-10-10T18:02:27.976Z). I set up a new Kustomer integration in Stitch with a new API key, and on the very first extraction attempt the logs show conversation data being collected up to the same bookmarked timestamp and getting stuck there. Judging by the page numbers in the logs, the extraction seems to keep paginating past the total number of records. I'm a data manager, not a software developer, so this is all a bit over my head, and I'm not sure I'm explaining things accurately.

Extraction just keeps going until the connection dies, and continues extracting from where it was last paginated.

Attaching my full extraction log for the first run. 134979.295817.sync.zip

Maybe I'm misunderstanding things, but it seems to me like the extraction of page 745 should be the last one for that timestamp, but it just keeps on going, literally forever.

2020-11-17 21:55:17,473Z tap - INFO Write state for stream: conversations, value: 2020-10-10T18:02:27.976Z
2020-11-17 21:55:17,473Z tap - INFO Synced Stream: conversations, page: 744, 74300 to 74400 of total records: 74493
2020-11-17 21:55:17,473Z tap - INFO URL for Stream conversations: https://api.kustomerapp.com/v1/customers/search
2020-11-17 21:55:17,473Z tap - INFO URL for Stream conversations: customers/search?pageSize=100
2020-11-17 21:55:17,558Z tap - INFO HTTP request to "conversations" endpoint took 0.070s, returned status code 200
2020-11-17 21:55:17,856Z tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [0] at offset None
2020-11-17 21:55:17,856Z tap - INFO replicated 100 records from "conversations" endpoint
2020-11-17 21:55:17,856Z tap - INFO Stream conversations, batch processed 100 records
2020-11-17 21:55:17,857Z tap - INFO Write state for stream: conversations, value: 2020-10-10T18:02:27.976Z
2020-11-17 21:55:17,857Z tap - INFO Synced Stream: conversations, page: 745, 74400 to 74500 of total records: 74493
2020-11-17 21:55:17,857Z tap - INFO URL for Stream conversations: https://api.kustomerapp.com/v1/customers/search
2020-11-17 21:55:17,857Z tap - INFO URL for Stream conversations: customers/search?pageSize=100
2020-11-17 21:55:17,985Z tap - INFO HTTP request to "conversations" endpoint took 0.090s, returned status code 200
2020-11-17 21:55:18,267Z tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [3] at offset None
2020-11-17 21:55:18,267Z tap - INFO replicated 100 records from "conversations" endpoint
2020-11-17 21:55:18,268Z tap - INFO Stream conversations, batch processed 100 records
2020-11-17 21:55:18,268Z tap - INFO Write state for stream: conversations, value: 2020-10-10T18:02:27.976Z
2020-11-17 21:55:18,268Z tap - INFO Synced Stream: conversations, page: 746, 74500 to 74600 of total records: 74493
2020-11-17 21:55:18,268Z tap - INFO URL for Stream conversations: https://api.kustomerapp.com/v1/customers/search
2020-11-17 21:55:18,268Z tap - INFO URL for Stream conversations: customers/search?pageSize=100
2020-11-17 21:55:18,347Z tap - INFO HTTP request to "conversations" endpoint took 0.059s, returned status code 200
2020-11-17 21:55:18,638Z tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [0] at offset None
2020-11-17 21:55:18,639Z tap - INFO replicated 100 records from "conversations" endpoint
2020-11-17 21:55:18,639Z tap - INFO Stream conversations, batch processed 100 records

At the time of this writing, the current extraction is at page 83277, and oddly the total record count is higher by a few thousand (78098, up from 74493). I don't understand what's happening.

2020-11-19 21:01:44,694Z tap - INFO Stream conversations, batch processed 100 records
2020-11-19 21:01:44,694Z tap - INFO Write state for stream: conversations, value: 2020-10-10T18:02:27.976Z
2020-11-19 21:01:44,694Z tap - INFO Synced Stream: conversations, page: 83277, 8327600 to 8327700 of total records: 78098
2020-11-19 21:01:44,694Z tap - INFO URL for Stream conversations: https://api.kustomerapp.com/v1/customers/search
2020-11-19 21:01:44,694Z tap - INFO URL for Stream conversations: customers/search?pageSize=100
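
For anyone wanting to check their own logs for the same symptom, here is a minimal Python sketch. It assumes the "Synced Stream" line format shown in the excerpts above and a page size of 100 (taken from the `pageSize=100` URLs); the function name `find_overruns` is made up for illustration and is not part of the tap. It flags any page number beyond ceil(total / page size), which for 74493 records is page 745.

```python
import math
import re

# Matches the tap's "Synced Stream" log lines as they appear in the excerpts above.
SYNC_LINE = re.compile(
    r"Synced Stream: (?P<stream>\w+), page: (?P<page>\d+), "
    r"(?P<start>\d+) to (?P<end>\d+) of total records: (?P<total>\d+)"
)


def find_overruns(log_lines, page_size=100):
    """Yield (stream, page, expected_last_page) for pages past the reported total."""
    for line in log_lines:
        match = SYNC_LINE.search(line)
        if match is None:
            continue  # not a "Synced Stream" line
        page = int(match["page"])
        total = int(match["total"])
        expected_last_page = math.ceil(total / page_size)
        if page > expected_last_page:
            yield match["stream"], page, expected_last_page


# Three lines copied from the logs in this issue.
sample = [
    "2020-11-17 21:55:17,857Z tap - INFO Synced Stream: conversations, page: 745, 74400 to 74500 of total records: 74493",
    "2020-11-17 21:55:18,268Z tap - INFO Synced Stream: conversations, page: 746, 74500 to 74600 of total records: 74493",
    "2020-11-19 21:01:44,694Z tap - INFO Synced Stream: conversations, page: 83277, 8327600 to 8327700 of total records: 78098",
]
overruns = list(find_overruns(sample))
for stream, page, expected in overruns:
    print(f"{stream}: page {page} is past the expected last page {expected}")
```

Page 745 is not flagged because it is the expected last page; pages 746 and 83277 are.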

dRuEFFECT commented 3 years ago

I tried setting up yet another integration for Kustomer, this time putting a start date AFTER this supposed 'stuck' timestamp, and the Messages dataset is suffering from the same thing. It's hard to believe we're the only ones having this issue.

2020-11-30 23:49:50,191Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:50,191Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:50,191Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:50,191Z    tap - INFO Synced Stream: messages, page: 75589, 7558800 to 7558900 of total records: 42224
2020-11-30 23:49:50,191Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:50,191Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:50,246Z    tap - INFO HTTP request to "messages" endpoint took 0.043s, returned status code 200
2020-11-30 23:49:50,386Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [9] at offset None
2020-11-30 23:49:50,386Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:50,386Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:50,386Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:50,386Z    tap - INFO Synced Stream: messages, page: 75590, 7558900 to 7559000 of total records: 42224
2020-11-30 23:49:50,386Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:50,386Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:50,446Z    tap - INFO HTTP request to "messages" endpoint took 0.039s, returned status code 200
2020-11-30 23:49:50,587Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [5] at offset None
2020-11-30 23:49:50,587Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:50,588Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:50,588Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:50,588Z    tap - INFO Synced Stream: messages, page: 75591, 7559000 to 7559100 of total records: 42224
2020-11-30 23:49:50,588Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:50,588Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:50,642Z    tap - INFO HTTP request to "messages" endpoint took 0.042s, returned status code 200
2020-11-30 23:49:50,773Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [4] at offset None
2020-11-30 23:49:50,774Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:50,774Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:50,774Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:50,774Z    tap - INFO Synced Stream: messages, page: 75592, 7559100 to 7559200 of total records: 42224
2020-11-30 23:49:50,774Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:50,774Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:50,845Z    tap - INFO HTTP request to "messages" endpoint took 0.056s, returned status code 200
2020-11-30 23:49:51,015Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [9] at offset None
2020-11-30 23:49:51,015Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:51,015Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:51,015Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:51,015Z    tap - INFO Synced Stream: messages, page: 75593, 7559200 to 7559300 of total records: 42224
2020-11-30 23:49:51,015Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:51,015Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:51,413Z    tap - INFO HTTP request to "messages" endpoint took 0.039s, returned status code 200
2020-11-30 23:49:51,589Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [6] at offset None
2020-11-30 23:49:51,590Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:51,590Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:51,590Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:51,590Z    tap - INFO Synced Stream: messages, page: 75594, 7559300 to 7559400 of total records: 42224
2020-11-30 23:49:51,590Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:51,590Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:51,673Z    tap - INFO HTTP request to "messages" endpoint took 0.064s, returned status code 200
2020-11-30 23:49:51,798Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [6] at offset None
2020-11-30 23:49:51,798Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:51,798Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:51,798Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:51,798Z    tap - INFO Synced Stream: messages, page: 75595, 7559400 to 7559500 of total records: 42224
2020-11-30 23:49:51,798Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:51,798Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:51,859Z    tap - INFO HTTP request to "messages" endpoint took 0.047s, returned status code 200
2020-11-30 23:49:52,018Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [8] at offset None
2020-11-30 23:49:52,018Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:52,018Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:52,018Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:52,018Z    tap - INFO Synced Stream: messages, page: 75596, 7559500 to 7559600 of total records: 42224
2020-11-30 23:49:52,018Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:52,018Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:52,142Z    tap - INFO HTTP request to "messages" endpoint took 0.109s, returned status code 200
2020-11-30 23:49:52,266Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [8] at offset None
2020-11-30 23:49:52,266Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:52,266Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:52,266Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:52,266Z    tap - INFO Synced Stream: messages, page: 75597, 7559600 to 7559700 of total records: 42224
2020-11-30 23:49:52,266Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:52,266Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:52,321Z    tap - INFO HTTP request to "messages" endpoint took 0.040s, returned status code 200
2020-11-30 23:49:52,439Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [1] at offset None
2020-11-30 23:49:52,439Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:52,439Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:52,439Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:52,440Z    tap - INFO Synced Stream: messages, page: 75598, 7559700 to 7559800 of total records: 42224
2020-11-30 23:49:52,440Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:52,440Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:52,498Z    tap - INFO HTTP request to "messages" endpoint took 0.040s, returned status code 200
2020-11-30 23:49:52,603Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [9] at offset None
2020-11-30 23:49:52,603Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:52,603Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:52,603Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:52,603Z    tap - INFO Synced Stream: messages, page: 75599, 7559800 to 7559900 of total records: 42224
2020-11-30 23:49:52,603Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:52,603Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:52,674Z    tap - INFO HTTP request to "messages" endpoint took 0.047s, returned status code 200
2020-11-30 23:49:52,856Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [9] at offset None
2020-11-30 23:49:52,856Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:52,856Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:52,857Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:52,857Z    tap - INFO Synced Stream: messages, page: 75600, 7559900 to 7560000 of total records: 42224
2020-11-30 23:49:52,857Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:52,857Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:52,925Z    tap - INFO HTTP request to "messages" endpoint took 0.047s, returned status code 200
2020-11-30 23:49:53,027Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [3] at offset None
2020-11-30 23:49:53,028Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:53,028Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:53,028Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:53,028Z    tap - INFO Synced Stream: messages, page: 75601, 7560000 to 7560100 of total records: 42224
2020-11-30 23:49:53,028Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:53,028Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:53,102Z    tap - INFO HTTP request to "messages" endpoint took 0.062s, returned status code 200
2020-11-30 23:49:53,235Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [5] at offset None
2020-11-30 23:49:53,235Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:53,235Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:53,235Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:53,235Z    tap - INFO Synced Stream: messages, page: 75602, 7560100 to 7560200 of total records: 42224
2020-11-30 23:49:53,235Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:53,236Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:53,299Z    tap - INFO HTTP request to "messages" endpoint took 0.041s, returned status code 200
2020-11-30 23:49:53,422Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [5] at offset None
2020-11-30 23:49:53,422Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:53,422Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:53,422Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:53,422Z    tap - INFO Synced Stream: messages, page: 75603, 7560200 to 7560300 of total records: 42224
2020-11-30 23:49:53,422Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:53,422Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:53,480Z    tap - INFO HTTP request to "messages" endpoint took 0.037s, returned status code 200
2020-11-30 23:49:53,591Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [6] at offset None
2020-11-30 23:49:53,591Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:53,592Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:53,592Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:53,592Z    tap - INFO Synced Stream: messages, page: 75604, 7560300 to 7560400 of total records: 42224
2020-11-30 23:49:53,592Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:53,592Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:53,660Z    tap - INFO HTTP request to "messages" endpoint took 0.051s, returned status code 200
2020-11-30 23:49:53,770Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [9] at offset None
2020-11-30 23:49:53,770Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:53,771Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:53,771Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:53,771Z    tap - INFO Synced Stream: messages, page: 75605, 7560400 to 7560500 of total records: 42224
2020-11-30 23:49:53,771Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:53,771Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:53,836Z    tap - INFO HTTP request to "messages" endpoint took 0.056s, returned status code 200
2020-11-30 23:49:53,961Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [9] at offset None
2020-11-30 23:49:53,961Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:53,961Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:53,962Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:53,962Z    tap - INFO Synced Stream: messages, page: 75606, 7560500 to 7560600 of total records: 42224
2020-11-30 23:49:53,962Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:53,962Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:54,026Z    tap - INFO HTTP request to "messages" endpoint took 0.054s, returned status code 200
2020-11-30 23:49:54,136Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [2] at offset None
2020-11-30 23:49:54,136Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:54,136Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:54,136Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:54,136Z    tap - INFO Synced Stream: messages, page: 75607, 7560600 to 7560700 of total records: 42224
2020-11-30 23:49:54,136Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:54,136Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:54,215Z    tap - INFO HTTP request to "messages" endpoint took 0.051s, returned status code 200
2020-11-30 23:49:54,319Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [3] at offset None
2020-11-30 23:49:54,320Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:54,320Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:54,320Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:54,320Z    tap - INFO Synced Stream: messages, page: 75608, 7560700 to 7560800 of total records: 42224
2020-11-30 23:49:54,320Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:54,320Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:54,373Z    tap - INFO HTTP request to "messages" endpoint took 0.041s, returned status code 200
2020-11-30 23:49:54,476Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [8] at offset None
2020-11-30 23:49:54,476Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:54,476Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:54,476Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:54,476Z    tap - INFO Synced Stream: messages, page: 75609, 7560800 to 7560900 of total records: 42224
2020-11-30 23:49:54,476Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:54,476Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:54,538Z    tap - INFO HTTP request to "messages" endpoint took 0.047s, returned status code 200
2020-11-30 23:49:54,720Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [8] at offset None
2020-11-30 23:49:54,720Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:54,720Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:54,720Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:54,720Z    tap - INFO Synced Stream: messages, page: 75610, 7560900 to 7561000 of total records: 42224
2020-11-30 23:49:54,720Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:54,720Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:54,808Z    tap - INFO HTTP request to "messages" endpoint took 0.064s, returned status code 200
2020-11-30 23:49:54,917Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [6] at offset None
2020-11-30 23:49:54,917Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:54,917Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:54,917Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:54,917Z    tap - INFO Synced Stream: messages, page: 75611, 7561000 to 7561100 of total records: 42224
2020-11-30 23:49:54,917Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:54,917Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:54,983Z    tap - INFO HTTP request to "messages" endpoint took 0.052s, returned status code 200
2020-11-30 23:49:55,136Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [9] at offset None
2020-11-30 23:49:55,136Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:55,136Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:55,136Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:55,136Z    tap - INFO Synced Stream: messages, page: 75612, 7561100 to 7561200 of total records: 42224
2020-11-30 23:49:55,136Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:55,136Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:55,196Z    tap - INFO HTTP request to "messages" endpoint took 0.041s, returned status code 200
2020-11-30 23:49:55,311Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [4] at offset None
2020-11-30 23:49:55,311Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:55,312Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:55,312Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:55,312Z    tap - INFO Synced Stream: messages, page: 75613, 7561200 to 7561300 of total records: 42224
2020-11-30 23:49:55,312Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:55,312Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:55,371Z    tap - INFO HTTP request to "messages" endpoint took 0.043s, returned status code 200
2020-11-30 23:49:55,494Z target - INFO Serializing batch with 9032 messages for table messages
2020-11-30 23:49:55,542Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [7] at offset None
2020-11-30 23:49:55,542Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:55,542Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:55,542Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:55,542Z    tap - INFO Synced Stream: messages, page: 75614, 7561300 to 7561400 of total records: 42224
2020-11-30 23:49:55,542Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:55,542Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:55,625Z    tap - INFO HTTP request to "messages" endpoint took 0.051s, returned status code 200
2020-11-30 23:49:55,887Z target - INFO Sending batch of 2057045 bytes to https://api.stitchdata.com/v2/import/batch
2020-11-30 23:49:55,887Z target - INFO Sending batch of 2057110 bytes to https://api.stitchdata.com/v2/import/batch
2020-11-30 23:49:55,897Z target - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [1] at offset None
2020-11-30 23:49:55,897Z target - INFO replicated 9032 records from "messages" endpoint
2020-11-30 23:49:55,909Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [3] at offset None
2020-11-30 23:49:55,910Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:55,910Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:55,910Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:55,910Z    tap - INFO Synced Stream: messages, page: 75615, 7561400 to 7561500 of total records: 42224
2020-11-30 23:49:55,910Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:55,910Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:55,972Z    tap - INFO HTTP request to "messages" endpoint took 0.047s, returned status code 200
2020-11-30 23:49:56,146Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [9] at offset None
2020-11-30 23:49:56,147Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:56,148Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:56,148Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:56,148Z    tap - INFO Synced Stream: messages, page: 75616, 7561500 to 7561600 of total records: 42224
2020-11-30 23:49:56,148Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:56,148Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:56,209Z    tap - INFO HTTP request to "messages" endpoint took 0.046s, returned status code 200
2020-11-30 23:49:56,383Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [9] at offset None
2020-11-30 23:49:56,383Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:56,383Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:56,383Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:56,383Z    tap - INFO Synced Stream: messages, page: 75617, 7561600 to 7561700 of total records: 42224
2020-11-30 23:49:56,383Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:56,383Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:56,435Z    tap - INFO HTTP request to "messages" endpoint took 0.037s, returned status code 200
2020-11-30 23:49:56,541Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [3] at offset None
2020-11-30 23:49:56,541Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:56,541Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:56,541Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:56,541Z    tap - INFO Synced Stream: messages, page: 75618, 7561700 to 7561800 of total records: 42224
2020-11-30 23:49:56,541Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:56,541Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:56,670Z    tap - INFO HTTP request to "messages" endpoint took 0.061s, returned status code 200
2020-11-30 23:49:56,849Z    tap - INFO [smart-services] event successfully sent to kafka: com.stitchdata.streamRecordCount [6] at offset None
2020-11-30 23:49:56,850Z    tap - INFO replicated 100 records from "messages" endpoint
2020-11-30 23:49:56,850Z    tap - INFO Stream messages, batch processed 100 records
2020-11-30 23:49:56,850Z    tap - INFO Write state for stream: messages, value: 2020-11-18T01:35:01.289Z
2020-11-30 23:49:56,850Z    tap - INFO Synced Stream: messages, page: 75619, 7561800 to 7561900 of total records: 42224
2020-11-30 23:49:56,850Z    tap - INFO URL for Stream messages: https://api.kustomerapp.com/v1/customers/search
2020-11-30 23:49:56,850Z    tap - INFO URL for Stream messages: customers/search?pageSize=100
2020-11-30 23:49:56,899Z    tap - INFO HTTP request to "messages" endpoint took 0.035s, returned status code 200

mascah commented 3 years ago

We just onboarded with Kustomer as well and I'm having the same problem. Forever stuck reading notes on 2020-12-10T18:25:09.473000Z. The integration has been running non-stop since we turned it on.

dRuEFFECT commented 3 years ago

> We just onboarded with Kustomer as well and I'm having the same problem. Forever stuck reading notes on 2020-12-10T18:25:09.473000Z. The integration has been running non-stop since we turned it on.

I'd recommend either disabling the notes table or pausing the whole integration until this gets resolved. Keeping it running won't do anything but add row counts against your Stitch plan. I had this occur in several tables, most critically in Conversations, so I just turned the whole thing off.

I traced the issue down to the pageSize variable. Comments in the code suggest the pageSize variable should be set to at least 300 due to Kustomer sometimes reporting over 200 rows with the exact same timestamp, and Stitch logs show usage of pageSize=100.

I was told by Stitch the fix for this is not on the Stitch side, but somewhere in the code here that raises the pageSize variable to a user-configurable input and that "the author" of the integration is planning to debug these issues in a future sprint, however there's no ETA since the author is a community member and not a Stitch employee. Not very clear, but that's as much info as I have.

mascah commented 3 years ago

> I traced the issue down to the pageSize variable. Comments in the code suggest the pageSize variable should be set to at least 300 due to Kustomer sometimes reporting over 200 rows with the exact same timestamp, and Stitch logs show usage of pageSize=100.
>
> I was told by Stitch the fix for this is not on the Stitch side, but somewhere in the code here that raises the pageSize variable to a user-configurable input and that "the author" of the integration is planning to debug these issues in a future sprint, however there's no ETA since the author is a community member and not a Stitch employee. Not very clear, but that's as much info as I have.

I spent a minute trying to confirm if that pageSize is the suspect, but looking at the Kustomer API docs for customers/search endpoint (which the notes stream uses), it says the page and pageSize query strings have a min of 1 and max of 100. It's unclear to me whether setting it to 300 or above would work.

https://dev.kustomer.com/v1/customers/customer-search

mascah commented 3 years ago

I think I'm starting to grok this issue a little better.

In short, we're basically using the approach described on the API docs page:

> Pagination
>
> As there is a hard limit on the page number of a 100 you may sometimes want page through more records than that. Maybe you want to get an initial snapshot of data and then subsequently keep them it upto to date.
>
> What we can do is write a query with criteria based on the updated_at field of the object. We can then make the next api call based on the last updated_at of the response.

If every record in the response has the same updated_at, then this logic falls through:

https://github.com/singer-io/tap-kustomer/blob/13196749348789e818abb6471b42f75875a821a9/tap_kustomer/sync.py#L76-L81

It won't update the bookmark if all the values have the same updated_at. Given that Kustomer did a bulk migration for us, it's highly likely that we have batches of more than 100 records that share the same updated_at value.

I'm uncertain at the moment how we could update this logic given the API page and pageSize having hard limits at 100, and in theory there could be any arbitrary number of records with the same value for updated_at.
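To make the stall concrete, here is a toy model of it. This is not the tap's code: `fetch_page` and `next_bookmark` are hypothetical stand-ins, the records are made up, and it assumes the search returns the oldest records at or after the bookmark, 100 at a time.

```python
from typing import Dict, List

PAGE_SIZE = 100  # the pageSize seen in the Stitch logs


def fetch_page(records: List[Dict], bookmark: str,
               page_size: int = PAGE_SIZE) -> List[Dict]:
    """Hypothetical stand-in for the search call: oldest records with
    updated_at >= bookmark, capped at page_size."""
    matching = sorted(
        (r for r in records if r["updated_at"] >= bookmark),
        key=lambda r: r["updated_at"],
    )
    return matching[:page_size]


def next_bookmark(page: List[Dict], bookmark: str) -> str:
    """Move the bookmark to the newest updated_at seen on the page."""
    return max([bookmark] + [r["updated_at"] for r in page])


# Made-up data: 150 records share one timestamp (think bulk migration),
# followed by a single newer record.
bulk = [{"id": i, "updated_at": "2020-10-10T18:02:27.976Z"} for i in range(150)]
newer = [{"id": 150, "updated_at": "2020-10-11T00:00:00.000Z"}]
records = bulk + newer

bookmark = "2020-10-01T00:00:00.000Z"
history = []
for _ in range(3):
    page = fetch_page(records, bookmark)
    bookmark = next_bookmark(page, bookmark)
    history.append(bookmark)

# The bookmark reaches the shared timestamp once and then never moves,
# so the newer record is never fetched.
print(history)
```

With more records sharing a timestamp than fit in one page, every request returns the same first 100 rows, which would match the "page: 746 ... of total records: 74493" pattern in the logs.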

mascah commented 3 years ago

I spent some more time testing this out and it looks like there is some inconsistency in the Kustomer API docs.

They state the max pageSize is 100, but in practice it seems to actually be 500. If I request anything more than that, the response only contains up to 500 items.

I actually had to set it to 400 to get a result containing a record with a greater updated_at than the rest of the records.

There are two problems here. First, the Kustomer API docs are either incorrect or out of date. Second, we can't assume or guarantee that a response won't consist entirely of records with the same updated_at value. In that case we should probably throw an exception, because it's unclear to me at the moment whether there's any way to move the bookmark forward if more than 500 records share the same updated_at.
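A rough sketch of what "throw an exception" could look like. `BookmarkStalledError` and `advance_bookmark` are hypothetical names, not anything in the tap, and the check assumes the API only returns records at or after the bookmark.

```python
from typing import Dict, List


class BookmarkStalledError(Exception):
    """A full page came back with nothing newer than the bookmark."""


def advance_bookmark(page: List[Dict], bookmark: str, page_size: int,
                     bookmark_field: str = "updated_at") -> str:
    """Return the new bookmark, or raise if a full page cannot move it."""
    if not page:
        return bookmark
    newest = max(record[bookmark_field] for record in page)
    # A full page with nothing newer than the bookmark means the next
    # request would return exactly the same records again.
    if len(page) >= page_size and newest <= bookmark:
        raise BookmarkStalledError(
            f"full page of {len(page)} records has no {bookmark_field} "
            f"newer than {bookmark}; the bookmark cannot advance"
        )
    return max(newest, bookmark)


# Made-up data: 500 records that all share one timestamp.
same = [{"updated_at": "2020-11-18T01:35:01.289Z"}] * 500
first = advance_bookmark(same, "2020-11-01T00:00:00.000Z", page_size=500)
try:
    advance_bookmark(same, first, page_size=500)
    stalled = False
except BookmarkStalledError:
    stalled = True
print(first, stalled)
```

Failing loudly like this wouldn't fix the sync, but it would stop the extraction from running forever and counting rows against the Stitch plan.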

mascah commented 3 years ago

To my current understanding, the logic of this tap today looks like this. The underlying assumption is that each bookmark (timestamp) value should have fewer than 100 (or 300) records.

I believe what's missing from this is a nested loop through the pages of each bookmark value. This would theoretically support a lot more data all with the same updated_at timestamp. So the logic of the tap would look like this now

Using the above approach, we'd probably need to put an LTE filter on the updated_at as well, so that we only request small increments of time instead of trying to get "everything greater than" the current bookmark.
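Here is a minimal sketch of that nested loop, under some assumptions: `search` is a hypothetical callable standing in for the customers/search request, `make_fake_search` is an in-memory fake so the example runs without the API, and the window size is arbitrary. The sketch uses an exclusive upper bound rather than LTE so that adjacent windows don't overlap; that choice is mine and is untested against Kustomer.

```python
from datetime import datetime, timedelta
from typing import Any, Callable, Dict, Iterator, List

Record = Dict[str, Any]
# Hypothetical signature: search(window_start, window_end, page, page_size)
SearchFn = Callable[[datetime, datetime, int, int], List[Record]]


def sync_windowed(search: SearchFn, start: datetime, stop: datetime,
                  window: timedelta, page_size: int = 100,
                  max_page: int = 100) -> Iterator[Record]:
    """Outer loop: small time windows. Inner loop: pages within a window."""
    window_start = start
    while window_start < stop:
        window_end = min(window_start + window, stop)
        page = 1
        while True:
            batch = search(window_start, window_end, page, page_size)
            yield from batch
            if len(batch) < page_size:
                break  # last page of this window
            if page >= max_page:
                # Conservative: also fires when the window holds exactly
                # max_page * page_size records.
                raise RuntimeError("window may hold more records than can be paged")
            page += 1
        window_start = window_end


def make_fake_search(records: List[Record]) -> SearchFn:
    """In-memory stand-in for the API, for illustration only."""
    def search(gte: datetime, lt: datetime, page: int, page_size: int) -> List[Record]:
        matching = sorted(
            (r for r in records if gte <= r["updated_at"] < lt),
            key=lambda r: (r["updated_at"], r["id"]),
        )
        offset = (page - 1) * page_size
        return matching[offset:offset + page_size]
    return search


# Made-up data: 250 records with one timestamp, plus one later record.
bulk_time = datetime(2020, 10, 10, 18, 2, 27)
records = [{"id": i, "updated_at": bulk_time} for i in range(250)]
records.append({"id": 250, "updated_at": datetime(2020, 10, 11, 9, 0, 0)})

synced = list(sync_windowed(make_fake_search(records),
                            start=datetime(2020, 10, 10),
                            stop=datetime(2020, 10, 12),
                            window=timedelta(days=1)))
print(len(synced))
```

With page and pageSize both capped at 100 per the docs, a single window could cover up to 10,000 records, so a single timestamp with more records than that would still need separate handling.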

mascah commented 3 years ago

I'd love to get feedback/validation from the author and/or the Stitch folks regarding the above. I would be more than happy to submit a PR for these changes if people think this is the right approach. I'll also try to start a conversation with our Stitch rep.

@dscoleman @KAllan357 @luandy64

asaf-erlich commented 3 years ago

@mascah The following would require testing to determine if it's really the root cause, but if everything you discussed above is correct I think the real root cause has to do with the logic around what value to save as the bookmark and how to filter records relative to it. I really don't think the page size changes much. If we're skipping records we are skipping records. Page size alterations may reduce the number of records skipped, but skips would still be able to happen.

In some cases, as you discuss above, the API itself supports the search simply specifying greater than or equal to a value. The customers stream is a good example: https://dev.kustomer.com/v1/customers/customer-search . So then there's no need to manually filter the data if the api does it for you. Somewhere in this code there is a bug:

                if bookmark_field and (bookmark_field in transformed_record):
                    last_dttm = transform_datetime(last_datetime)
                    bookmark_dttm = transform_datetime(
                        transformed_record[bookmark_field])
                    # Keep only records whose bookmark is at or after the last_datetime
                    if bookmark_dttm >= last_dttm:
                        write_record(stream_name,
                                     transformed_record,
                                     time_extracted=time_extracted)
                        counter.increment()

I actually think that, for streams where the API does the filtering for you, the bookmark value needs to be sent to the API, but the if statement shouldn't be used to decide whether to write a record. That would stop records from being skipped for those streams with a bookmark_query_field (in the https://github.com/singer-io/tap-kustomer/blob/master/tap_kustomer/streams.py file). I still think there must be a bug somewhere in there, so for the other streams it still needs to be root-caused and fixed. It may be related to the code line you reference in the comments above, where the max value is chosen; if the data is not sorted, maybe the bookmark shouldn't be saved until all the pages are done processing? Again, further testing would be needed to confirm that.
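As a sketch of that last idea only: `sync_stream` below is a hypothetical function, not the tap's, and the pages are made-up data. It writes every record the API returns without a client-side filter, tracks the maximum bookmark value across all pages, and returns it once at the end instead of saving state after each page.

```python
from typing import Dict, Iterable, List, Tuple


def sync_stream(pages: Iterable[List[Dict]], last_bookmark: str,
                bookmark_field: str = "updated_at") -> Tuple[List[Dict], str]:
    """Collect every record and compute the bookmark after all pages."""
    written: List[Dict] = []
    max_seen = last_bookmark
    for page in pages:
        for record in page:
            # No client-side filter: assume the API already applied it.
            written.append(record)
            value = record.get(bookmark_field)
            if value is not None and value > max_seen:
                max_seen = value
    # State would be written once here, after the final page.
    return written, max_seen


# Made-up, deliberately unsorted pages.
pages = [
    [{"id": 1, "updated_at": "2020-12-10T18:25:09.473Z"},
     {"id": 2, "updated_at": "2020-12-09T08:00:00.000Z"}],
    [{"id": 3, "updated_at": "2020-12-10T18:25:09.473Z"},
     {"id": 4, "updated_at": "2020-12-11T09:30:00.000Z"}],
]
written, new_bookmark = sync_stream(pages, "2020-12-09T00:00:00.000Z")
print(len(written), new_bookmark)
```

The trade-off is that an interrupted run would restart from the old bookmark, so this would need testing against how long a full pass actually takes.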

I hope these suggestions help guide you in the right direction.