Closed yevgenypats closed 1 year ago
@yevgenypats thanks for reaching out! I think this could be interesting, we'd definitely like to learn more. Some questions that I have include:
Since I imagine you might not want to answer these questions (especially the second one) on a public forum, feel free to reach out to me privately via Twitter DM (@plaiddev) or email (ahoffer@plaid.com).
Hi @phoenixy1, that's great to hear!
Maintaining the plugin is relatively straightforward and consists of two things:
Following up also via email! Thank you 🙏
Hi everyone 👋 You can find the initial version of the Plaid plugin here: https://github.com/cloudquery/cq-source-plaid
Please let us know what you think
Thanks!! I'll take a more comprehensive look later, just a couple of thoughts off the top of my head:
Hi @phoenixy1, sorry for the late reply. Thank you for the thoughtful feedback. I'll go over the issues you mentioned and follow up
It would be good to be more explicit in the documentation that the means of getting an access token in the README is only suitable for testing and that clients will need to build out a hosted frontend UI (including update mode, OAuth redirection, etc. etc.) and a backend token exchange flow (that doesn't involve copying and pasting the token from the client into the plugin) if they want to use real user data in a Production environment.
It would be good to describe exactly which Plaid APIs / endpoints are supported in the README. I see you don't have Identity, and I might suggest adding that one as it's pretty popular.
Based on a quick glance at the code, I'm not sure whether this will work in a way that satisfies your users' use cases for some of the more dynamic products like Transactions, where business logic is required to construct an accurate picture of what's going on in a user's account, so that might be something to think about. I don't know enough about how your plugin is used to make a definitive statement on this, however.
When you test, I would definitely try running this in Development and making sure the plugin is robust to things like new fields being added or API calls erroring out; this happens not infrequently in real-world environments.
Regarding API calls erroring out, we have retry logic in place: https://github.com/cloudquery/cq-source-plaid/blob/b5bde008364d7a79853f5a053f206efa339dc53d/client/client.go#L52 Anything we should add to the default policy?
As for new fields, would those be reflected in an update to the Go client? If so those should get added once we update the dependency. Each destination has logic to handle migration of existing tables, though some changes might be breaking (e.g. removal of a field, type change). We release those as a new major version of the plugin.
Thanks again for the feedback and would love to hear additional thoughts
It would be good to be more explicit in the documentation that the means of getting an access token in the README is only suitable for testing and that clients will need to build out a hosted frontend UI (including update mode, OAuth redirection, etc. etc.) and a backend token exchange flow (that doesn't involve copying and pasting the token from the client into the plugin) if they want to use real user data in a Production environment.
- Can you explain this a bit more? I don't have enough context on how Plaid works in Production. I thought generating an access token was a one-time thing, as it's long-lived and can be used to query data later on. Why is a hosted frontend UI with OAuth required? Can't the provided example app be used to generate a long-lived token? If not, can you please point me to some docs on how to do it, so I can link to them from the README?
Yeah, once you have an access token, you're for the most part good. However, at most banks, if the user ever changes their password, the access token breaks and the user needs to go through Link again using something called update mode.
But the big problem is the process of getting the access token -- unless the use case for Plaid is a hobbyist use case where a developer is just building something for their own personal use, the person entering the credentials into Link is the end user of the app, NOT the developer. Unless the developer is hosting Link, the customer doesn't have any way to access Link (because I assume you're not asking your end customers to run a self-hosted app and then email you the access token or something), and it's not a good security practice to expose the access token client-side. So if you were doing this in real life, you as the developer would need to have a server your end user could go to, log into Link, get a public token, and then have that public token be exchanged for an access token on your server.
The OAuth stuff comes into play mostly on mobile devices, but basically for banks that use OAuth-based connections the developer sometimes has to do some extra work to get Link working properly on mobile, since there's a redirect to the bank website during the Link flow.
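To make the server-side exchange described above concrete, here is a minimal sketch. It assumes Plaid's `/item/public_token/exchange` endpoint on the Sandbox host; the `post` and `store` parameters are hypothetical injection points (an HTTP helper and a token store), not part of any Plaid library or of the plugin.

```python
import json
from typing import Callable, Dict, MutableMapping

# Assumption: the Sandbox host; other environments use a different host.
PLAID_BASE_URL = "https://sandbox.plaid.com"


def build_exchange_request(client_id: str, secret: str, public_token: str) -> Dict[str, object]:
    """Describe the backend call that trades a public token for an access token."""
    return {
        "url": PLAID_BASE_URL + "/item/public_token/exchange",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(
            {"client_id": client_id, "secret": secret, "public_token": public_token}
        ),
    }


def exchange_public_token(
    public_token: str,
    client_id: str,
    secret: str,
    post: Callable[[str, Dict[str, str], str], Dict[str, str]],
    store: MutableMapping[str, str],
) -> str:
    """Run the exchange on the server and keep the access token server-side.

    `post` is a hypothetical helper that sends the request and returns the
    parsed JSON response; `store` is any mapping used to persist tokens.
    Only the item ID is returned, so the access token never reaches the client.
    """
    req = build_exchange_request(client_id, secret, public_token)
    response = post(req["url"], req["headers"], req["body"])
    store[response["item_id"]] = response["access_token"]
    return response["item_id"]
```

The point of the design is that the frontend only ever handles the short-lived public token; the long-lived access token stays on the backend.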
It would be good to describe exactly which Plaid APIs / endpoints are supported in the README. I see you don't have Identity, and I might suggest adding that one as it's pretty popular.
- Great point. I made that clearer and added Identity; see "feat: Add Identity" (cloudquery/cq-source-plaid#5) and "fix(docs): Make it more clear what resources are supported" (cloudquery/cq-source-plaid#6). For some reason I couldn't get Identity to work initially, but I tested it again and it does work.
Based on a quick glance at the code, I'm not sure whether this will work in a way that satisfies your users' use cases for some of the more dynamic products like Transactions, where business logic is required to construct an accurate picture of what's going on in a user's account, so that might be something to think about. I don't know enough about how your plugin is used to make a definitive statement on this, however.
- Usually the information will be saved in a database (e.g. Postgres), so as long as all the information is available users can query it based on their needs. Is there data that is missing that we should add?
So the way Plaid Transactions works is that it's a subscription product where you pay for monthly access to transactions. Most customers of the Transactions product will be calling the transactions endpoints daily / multiple times a day, and the logic around the endpoint will need to make sure to fetch only new transactions that haven't been seen before. However, I think it's fine if the business logic to reconcile transactions is on the customer as long as they are expecting that.
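As a rough illustration of the "only new transactions" logic mentioned above, here is a minimal sketch that filters a fetched batch against IDs already seen. It assumes each transaction is a dict carrying a `transaction_id` key; the function name and the in-memory `seen_ids` set are hypothetical, and a real integration would persist the seen IDs between runs.

```python
from typing import Dict, Iterable, List, Set


def new_transactions(fetched: Iterable[Dict], seen_ids: Set[str]) -> List[Dict]:
    """Return only transactions whose ID has not been seen, recording them as seen."""
    fresh = []
    for txn in fetched:
        txn_id = txn["transaction_id"]
        if txn_id not in seen_ids:
            seen_ids.add(txn_id)
            fresh.append(txn)
    return fresh
```

Calling it twice with overlapping batches returns each transaction once, which is the behaviour a daily polling job would need.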
When you test, I would definitely try running this in Development and making sure the plugin is robust to things like new fields being added or API calls erroring out; this happens not infrequently in real-world environments.
- Regarding API calls erroring out, we have retry logic in place: https://github.com/cloudquery/cq-source-plaid/blob/b5bde008364d7a79853f5a053f206efa339dc53d/client/client.go#L52 Anything we should add to the default policy?
- As for new fields, would those be reflected in an update to the Go client? If so those should get added once we update the dependency. Each destination has logic to handle migration of existing tables, though some changes might be breaking (e.g. removal of a field, type change). We release those as a new major version of the plugin.
Thanks again for the feedback and would love to hear additional thoughts
Yes, new fields will be reflected in an update to the Go client.
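One way to stay robust to fields being added before the client library is updated is to read only the fields you know about and ignore the rest. The sketch below assumes an account payload with `account_id`, `name`, and `type` keys; the `KNOWN_FIELDS` tuple and `parse_account` helper are hypothetical names used for illustration, not the plugin's actual code.

```python
from typing import Any, Dict

# Assumption: the subset of account fields this consumer cares about.
KNOWN_FIELDS = ("account_id", "name", "type")


def parse_account(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Keep known fields only; unknown fields are ignored, missing ones become None."""
    return {field: raw.get(field) for field in KNOWN_FIELDS}
```

With this shape, a payload that gains a new key still parses, and a payload missing an optional key does not raise.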
Sorry, I should also follow up on the retry logic: it's not uncommon for data sources (banks) to be down for hours / days at a time and then come back up, so you should expect that sometimes your API calls just won't work.
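Since an outage can outlast any reasonable retry window, a retry policy should back off and eventually give up rather than loop forever. Here is a minimal sketch of capped exponential backoff; `SourceUnavailable`, `call_with_backoff`, and the default values are all hypothetical and are not the plugin's actual retry policy.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class SourceUnavailable(Exception):
    """Hypothetical error raised when the upstream data source cannot be reached."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` with capped exponential backoff, re-raising after the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except SourceUnavailable:
            if attempt == max_attempts:
                raise  # give up; the caller can re-run the sync later
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable only so the behaviour can be checked without actually waiting.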
Thanks for the detailed explanation @phoenixy1!
Re: transactions, we currently get all of them, but by default we'll remove the stale ones from previous syncs (a sync is a single execution of the CloudQuery CLI). We also have support for incremental resources (i.e. getting data based on a cursor), but I haven't implemented it yet. So by default consumers of the data will not see any duplicates, and we can add support for getting only new transactions to improve performance.
Re: retry logic, if the API calls don't work, an error will be reported and users can re-run CloudQuery at a later time.
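For reference, cursor-based incremental fetching generally looks like the sketch below. The page shape (`added`, `next_cursor`, `has_more`) is modeled on cursor-style sync endpoints and should be treated as an assumption here; `fetch_page` is a hypothetical callable standing in for the real API call, and this is not how the plugin is currently implemented.

```python
from typing import Callable, Dict, List, Optional, Tuple


def sync_since(
    fetch_page: Callable[[Optional[str]], Dict],
    cursor: Optional[str],
) -> Tuple[List[Dict], Optional[str]]:
    """Collect everything added since `cursor` and return it with the new cursor."""
    added: List[Dict] = []
    while True:
        page = fetch_page(cursor)
        added.extend(page["added"])
        cursor = page["next_cursor"]
        if not page["has_more"]:
            return added, cursor
```

The returned cursor is what a sync would persist, so the next run starts where this one stopped instead of refetching everything.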
I think the main remaining challenge is the access token and understanding who the person using CloudQuery is: the end user, or the developer setting up the frontend and backend. Would the following make sense (so I can document it):
Does that summarize your intention?
Something along the lines of https://github.com/cloudquery/cq-source-plaid/pull/7/files
Would the following make sense (so I can document it):
- Developer sets up a production environment with a frontend client and a backend
- Users go to the frontend, authenticate with an institution to kick off token exchange flow
- Backend saves the access token(s) after the token exchange flow is done
- In a separate process/job developer runs CloudQuery using access tokens to get data
Does that summarize your intention?
Yes, exactly! There are some extra nuances for a more robust integration, but I think this describes the MVP / happy path flow pretty well.
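The "backend saves the access token(s)" and "separate job reads them" steps of that happy path can be sketched with a small store. Everything here is hypothetical: the `items` table, the function names, and the use of SQLite are illustrative choices, and a real deployment would encrypt tokens at rest rather than store them in plain text.

```python
import sqlite3
from typing import List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (and create if needed) a tiny table mapping item IDs to access tokens."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "item_id TEXT PRIMARY KEY, access_token TEXT NOT NULL)"
    )
    return conn


def save_token(conn: sqlite3.Connection, item_id: str, access_token: str) -> None:
    """Called by the backend after the token exchange; overwrites on repeat exchange."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO items (item_id, access_token) VALUES (?, ?)",
            (item_id, access_token),
        )


def tokens_for_sync(conn: sqlite3.Connection) -> List[str]:
    """Called by the separate sync job to list the tokens it should fetch data for."""
    rows = conn.execute("SELECT access_token FROM items ORDER BY item_id")
    return [row[0] for row in rows]
```

The sync job would then pass each returned token to whatever runs the data fetch, keeping the user-facing flow and the batch job decoupled.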
OK, looking at the latest version -- I haven't tried to run it and I'm not super familiar with CloudQuery Source, but it seems mostly pretty reasonable as far as I can tell? Let me know if there's anything you have specific questions or concerns about.
Hi @phoenixy1 thanks for the additional review and follow up. I think we're good on our end if you're happy with the first version.
What we would hope to see next is a backlink and a short blurb from the Plaid docs to CloudQuery. Not sure what would be the best place to put it (https://plaid.com/docs/api/libraries/ or https://plaid.com/docs/api/ or whatever works for you).
Something along the lines of:
[CloudQuery Plaid plugin](https://github.com/cloudquery/cq-source-plaid) extracts data from Plaid and loads it into any supported CloudQuery destination (PostgreSQL, Snowflake, BigQuery, S3...).
Will that work for you?
doing some cleanup -- I forgot to comment here, but we added the requested linkback a while ago, so I'm closing this ticket!
Thanks for the comment @phoenixy1 and adding the backlink. For reference to people watching this issue, it's here https://plaid.com/docs/resources/#third-party-resources
Hi Team, hopefully this is the right place to ask; if not, I'd appreciate it if you could direct me.
I'm the founder of cloudquery.io, a high-performance open-source ELT framework.
Our users are interested in a Plaid plugin, but as we cannot maintain all the plugins ourselves, I was curious whether this would be an interesting collaboration, where we would help implement an initial source plugin and you would help maintain it.
This will give your users the ability to sync Plaid data to any of their data lakes / data warehouses / databases easily, using any of the growing list of CQ destination plugins.
Best, Yevgeny