singer-io / tap-typeform

Singer.io tap for extracting TypeForm data
GNU Affero General Public License v3.0

Typeform hidden fields results are not populating or appending to referrer URL #47

Closed LindsayOllie closed 2 years ago

LindsayOllie commented 2 years ago

Hi!

The Typeform platform is properly populating the Hidden Fields after a respondent completes the survey. However, these fields are not appearing in the questions or answers tables, and they are not being appended to the Referrer URL as I expected.

Typeform informed me this was a Stitch issue, not a Typeform issue.

Any idea how I can get these fields to appear?

bhawson commented 2 years ago

This seems to be an issue with the landings table and the fact that it's not populating properly.

We have 22 forms going back to November 2020 with circa 130k form completions. Therefore my landings table should have circa 130k rows, one per landing_id. This is not the case: if I do a full (or new) sync via either Stitch or Meltano, I only get about 550 rows in the landings table. There seems to be a 25-record limit on retrieving records to add to the landings table.

Some simple SQL shows the correct number of form completions in the answers table, but with hundreds of thousands of rows missing from the landings table (or failing to be updated daily or hourly), the details in the hidden column are not being accurately recorded.

I'm happy to submit logs from both Stitch and Meltano for verification of the issue.

luandy64 commented 2 years ago

> I only get about 550 rows in the landings table. There seems to be a 25 record limit on retrieving records to add to the landing table.

Looking at the tap, I can confirm that we are failing to paginate through the full Landings response: https://github.com/singer-io/tap-typeform/blob/efe6fe6af3db99808b7f42a83533bc4ce1a9b1e2/tap_typeform/http.py#L138-L145
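For readers following along, here is a minimal sketch of what paginating through every page could look like. It is not the tap's actual code or the fix that was shipped. It assumes the endpoint accepts a `page_size` parameter and a `before` cursor taken from the last item's `token`, and that each page is a dict with an `items` list; `iter_all_items` and `make_fake_fetch` are hypothetical names, and the fake fetcher stands in for the real HTTP call so the sketch runs offline.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_all_items(fetch_page: Callable[[Dict[str, object]], Dict[str, object]],
                   page_size: int = 25) -> Iterator[dict]:
    """Yield every item by requesting pages until a short page is returned.

    Assumption: the API takes `page_size` and a `before` cursor, and each
    page looks like {"items": [{"token": ...}, ...]}.
    """
    cursor: Optional[str] = None
    while True:
        params: Dict[str, object] = {"page_size": page_size}
        if cursor is not None:
            params["before"] = cursor
        page = fetch_page(params)
        items = page.get("items") or []
        yield from items
        # A page shorter than page_size means there is nothing left to fetch.
        if len(items) < page_size:
            return
        # Use the last item's token as the cursor for the next request.
        cursor = items[-1]["token"]


def make_fake_fetch(total: int) -> Callable[[Dict[str, object]], Dict[str, object]]:
    """Hypothetical in-memory stand-in for the HTTP request, for illustration."""
    records = [{"token": f"t{i}"} for i in range(total)]

    def fetch(params: Dict[str, object]) -> Dict[str, object]:
        start = 0
        if "before" in params:
            # Resume just past the record named by the cursor.
            start = int(str(params["before"])[1:]) + 1
        size = int(params["page_size"])
        return {"items": records[start:start + size]}

    return fetch


fetch = make_fake_fetch(60)
first_page_only = fetch({"page_size": 25})["items"]   # what a single request returns
everything = list(iter_all_items(fetch, page_size=25))
print(len(first_page_only), len(everything))
```

Stopping at the first page is what produces a fixed number of rows per form, regardless of how many responses the form actually has.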

luandy64 commented 2 years ago

> This seems to be an issue with the landings table and the fact that its not populating properly.

I agree that the Landings table is the one affected. I cannot reproduce the missing hidden field, though:

INFO METRIC: {"type": "timer", "metric": "http_request_duration", "value": 0.03997445106506348, "tags": {"endpoint": "https://api.typeform.com/forms/t6sbkCIz/responses", "status": "succeeded"}}
{
  "type": "RECORD",
  "stream": "landings",
  "record": {
    "landing_id": "vfrmj4own5u9v8lh5mauapvfrmj4owbw",
    "token": "vfrmj4own5u9v8lh5mauapvfrmj4owbw",
    "landed_at": "2021-10-14T18:47:58.000000Z",
    "submitted_at": "2021-10-14T18:48:12.000000Z",
    "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36",
    "platform": "other",
    "referer": "https://jokg5x1yup9.typeform.com/to/t6sbkCIz",
    "network_id": "d9060a8188",
    "browser": "default",
    "hidden": "{\"hidden1\": \"xxxxx\", \"hidden2\": \"xxxxx\"}"
  },
  "time_extracted": "2021-10-14T19:02:21.772289Z"
}
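Note that `hidden` in the record above is a JSON-encoded string rather than a nested object, so downstream code has to decode it before reading individual hidden fields. A minimal sketch, using a trimmed copy of that record; `decode_hidden` is a hypothetical helper name, not something the tap provides:

```python
import json

# Trimmed copy of the landings record shown above.
record = {
    "landing_id": "vfrmj4own5u9v8lh5mauapvfrmj4owbw",
    "hidden": "{\"hidden1\": \"xxxxx\", \"hidden2\": \"xxxxx\"}",
}


def decode_hidden(record: dict) -> dict:
    """Return the hidden fields as a dict, or {} when the column is empty."""
    raw = record.get("hidden")
    if not raw:
        return {}
    return json.loads(raw)


hidden_fields = decode_hidden(record)
print(hidden_fields["hidden1"])
```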

If you are comfortable with a REST client (Postman, Insomnia, curl, etc.), I can walk you through reproducing this for yourself.

bhawson commented 2 years ago

@luandy64 Thanks for looking into this.

I don't think there is an issue with the hidden column itself in the landings table. What @LindsayOllie is saying is that the hidden fields used in their forms are not appearing in the destination after the Stitch integration has executed. I think this is a symptom of failing to paginate through all of the Landings responses, not an actual data issue.

The other three tables have all the correct data (at least my tables do), but there is a mismatch between the total number of landing_id values (with their corresponding hidden fields in the hidden column) in the landings table and the total number of distinct landing_id values in the answers table.

As you can see, any Typeform still taking responses after tap_typeform 1.4.0 was released now has a discrepancy in responses between the landings and answers tables.

Form ID     Opened      Last Updated    Landings    Answers
VPPt8nHU    2021-06-07  2021-10-13      2281        2616
aCe8eyrE*   2021-04-01  2021-10-13      61192       67734
p9X0hJrW*   2021-04-26  2021-10-12      8406        8533
dtL2s6R5*   2021-04-07  2021-08-09      10000       10000
cN4Jkmhw    2021-06-15  2021-08-06      288         288
i5qntZ1B    2021-02-05  2021-04-01      1158        1158
ANDbZenW    2021-02-05  2021-04-01      1625        1625
j6T3NcWj    2021-01-29  2021-04-01      2086        2086
HSCGgLTt    2021-01-29  2021-04-01      1416        1416
rfuR507P    2021-01-29  2021-04-01      562         562
ibzDumuO    2020-12-15  2021-04-01      7939        7939
UtMzXhzD    2021-02-11  2021-04-01      2423        2423
j62mfjCp    2021-02-05  2021-04-01      1151        1151
sua4Ldrb    2020-12-15  2021-04-01      4821        4821
SSPtf3dd    2021-01-27  2021-04-01      2706        2706
tmidrDpN    2020-11-26  2021-04-01      3763        3763
vhG9WSv1    2021-02-25  2021-04-01      1457        1457
aWa8TwFF    2021-02-11  2021-03-31      613         613
b0mQCHlk    2021-01-29  2021-03-31      1572        1572
KrwWVhKL    2021-01-29  2021-03-31      2151        2151
wrtx9Kde    2020-12-15  2021-03-30      2948        2948
DkxTznEB    2020-11-25  2021-02-15      2437        2437

The three forms with * next to the form ID have hidden fields. So while I have all the answers from the responses for form p9X0hJrW, I'm missing 127 responses from the landings table, and therefore the values from the hidden column too, which is likely what is also happening for @LindsayOllie.
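A check along these lines can be sketched with an anti-join that counts the distinct landing_id values present in answers but absent from landings. This is a minimal illustration against an in-memory SQLite database, not the query used to produce the table above; the `landings` and `answers` table names and the `landing_id` and `hidden` columns come from this thread, while the `question_id` and `answer` columns and the sample rows are made up.

```python
import sqlite3


def count_missing_landings(conn: sqlite3.Connection) -> int:
    """Count distinct landing_ids that have answers but no landings row."""
    query = """
        SELECT COUNT(DISTINCT a.landing_id)
        FROM answers AS a
        LEFT JOIN landings AS l ON l.landing_id = a.landing_id
        WHERE l.landing_id IS NULL
    """
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE landings (landing_id TEXT PRIMARY KEY, hidden TEXT);
    -- question_id and answer are illustrative columns, not the real schema.
    CREATE TABLE answers (landing_id TEXT, question_id TEXT, answer TEXT);
""")
conn.executemany(
    "INSERT INTO landings VALUES (?, ?)",
    [("a", "{}"), ("b", "{}")],
)
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [("a", "q1", "x"), ("a", "q2", "y"), ("b", "q1", "x"), ("c", "q1", "z")],
)

missing = count_missing_landings(conn)
print(missing)  # landing "c" has answers but no landings row
```

Against a real warehouse the same query would need the destination's schema and SQL dialect; a non-zero result for a form means some of its hidden-field values never reached the landings table.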

@LindsayOllie, please feel free to correct me if I'm off the mark.

In my opinion if we can resolve the issue of failing to paginate through all of the Landings responses, this will fix my and @LindsayOllie's issue going forward, but a full sync will need to be done to retrieve the missing rows prior to the fix.

LindsayOllie commented 2 years ago

@bhawson Yes, I believe that is what's happening. Let's go forward with your recommendation and see if that fixes it.

luandy64 commented 2 years ago

Thank you for the explanation. Opening this issue created a card in Stitch's backlog, but I'm going to update the description and make it a pagination issue.

LindsayOllie commented 2 years ago

@luandy64 Thank you for your help! Where can I go to check the status of this issue?

luandy64 commented 2 years ago

@LindsayOllie We just deployed a fix for this in v1.4.2. I believe you will want to reset the integration's bookmark for this table.

In Stitch, I think there's a button for that?