oliviertassinari commented 1 year ago

Related to #592

I was hoping I could rebuild most of the mini-apps that I had created on the staging environment https://master--toolpad.mui.com/ in the demo environment: https://demo.toolpad.io/ but it's not the case. This makes me wonder:

Why not have the default fetch query? Right now, when I click on adding a new query, the select is empty.

Why not have a fetch function? If we are worried about the amplification network power that Toolpad can provide to bad actors, then we could limit the number of fetch calls that can be made inside the function. I think that we would also need to limit the number of queries that can be added to the page, because why close one door (fetch) when the other is open (multi queries)?

Why not have PostgreSQL? If there is a good reason for it, I think that we should allow people to pick it and then explain why it's not available. Currently, it feels like the feature doesn't exist.

Why not have Google Sheet? Same to 3.

prakhargupta1 commented 1 year ago

I agree, the Functions and Fetch should be available by default like we show in the staging environment. We had purposely limited the data sources as the intent was to give a basic idea of Toolpad. Should we disable Connection button and through an info icon show that we support 2 more data sources in the self-hosted version? We can show Fetch default, Functions default and Fake movies API directly when user will go to create a query. Alternatively, we can support all data supports if there are no major concerns.

Also, as there is no Home view, users will create 3-4 apps in the first session itself (until they realize they can't see the apps). Either we should show apps for that session or an info saying please save this URL if you want to open this again.

We can also show Self-host CTA in the header bar.

Janpot commented 1 year ago

Why not have the default fetch query? Right now, when I click on adding a new query, the select is empty.

Queries run serverside without any rate-limiting, meaning it's quite straightforward to turn a public facing toolpad instance into a DDOS or scraping bot. render.com will kick us off its servers faster than we can say "it wasn't me" 🙂. There also isn't any outbound firewall, are there any render.com internal urls that shouldn't be exposed to the public?

I will implement https://github.com/mui/mui-toolpad/issues/1196 to provide a similar experience that doesn't have the same problems.

Why not have a fetch function? If we are worried about the amplification network power that Toolpad can provide to bad actors, then we could limit the number of fetch calls that can be made inside the function. I think that we would also need to limit the number of queries that can be added to the page, because why close one door (fetch) when the other is open (multi queries)?

I can run curl in a loop on my laptop to the Toolpad data endpoint and no limit to the amount of fetch calls in a function or queries on a page will protect against that. Finding out which url to call is quite straightforward since the source is freely available.

I will implement a client-side variant of function query analog to https://github.com/mui/mui-toolpad/issues/1196 that can be used in the demo mode.

Why not have PostgreSQL? If there is a good reason for it, I think that we should allow people to pick it and then explain why it's not available. Currently, it feels like the feature doesn't exist.

Currently passwords are plaintext readable in the editor (not the runtime). We will reimplement connections (https://github.com/mui/mui-toolpad/issues/1065) with the consideration in mind that plaintext passwords should never end up on the browser.

Why not have Google Sheet? Same to 3.

Same to 3, the token is stored the same way as the connection details

We had purposely limited the data sources as the intent was to give a basic idea of Toolpad.

The intent was avoiding cloud related security and performance issues so that we can concentrate on the app building experience instead.

Either we should show apps for that session or an info saying please save this URL if you want to open this again.

If we do that we should add very clearly that the app can disappear any moment in time.

edit: Happy to make these datasources available in the demo if the team doesn't share these same concerns.

bytasv commented 1 year ago

Could we add a rate limiting on the server side? Say to 5 requests a second or any other value that feels right and wouldn't allow for ddos?

oliviertassinari commented 1 year ago

Also, as there is no Home view, users will create 3-4 apps in the first session itself (until they realize they can't see the apps)

@prakhargupta1 On my end, I didn't feel pain about this. I can see all its features with a single app. I did create a lot of apps because I couldn't find the previous ones back, but I guess it's OK? To fix this we would need to have a login, at which point, we have a Cloud which makes the Demo legacy 😁.

We can also show Self-host CTA in the header bar.

It would improve the UX 👍

Queries run serverside without any rate-limiting

@Janpot Ohhhh, I didn't realize there are only two APIs that are allowed:

Ok, then I think it simplifies the framing of the problem this GitHub issue is about. Say I'm a developer and I have a problem to solve, the demo helps me envision how the tool could solve my pain without doing the local installation, but if I have a concrete pain, and I would like to try how the tool solves it, I can't see this. Maybe it's OK like this but I think that we could do a lot in terms of improving the UX.

Currently passwords are plaintext readable in the editor

Then for 3 and 4, I problem for me is that as a developer it feels like these data source options don't exist. I think that the data source option should be visible and: either explain why it's disabled, for security reasons, or (better, if it doesn't take too much time to implement) allow the person to acknowledge the risk and use dummy credentials/accounts.

Could we add a rate limiting on the server side? Say to 5 requests a second or any other value that feels right and wouldn't allow for ddos?

@bytasv This might require a Redis instance or equivalent? If it's too much work, I think that solving the UX part of this GitHub issue could be a great alternative. What I mean by UX, is: I land on the tool, I see very few fetching options and I jump to the conclusion: it's not ready for my use case. It's also about, when I click on "Add query", I see an empty select: so I can't fetch data?

Janpot commented 1 year ago

To be honest, I'd look into client side fetch and function datasources. It will give a good idea on what's possible, and the requests happen fully on the browser side, so our IP addresses are not tainted. And it will be valuable outside the demo as well

This might require a Redis instance or equivalent?

Just to note some constraints: I believe we need to rate limit on outbound IP address (perhaps combined with appId). This because we don't control hostname resolution. It will also have to take into account redirects.

This means we'll have to use an agent and pass it to all node-fetch calls that are user originated. It's the only way to get access to this with node-fetch.

Our options are:

Write an agent that rate limits in-memory: good for single instance setup. Won't work predictably once we start horizontally scaling.
Write an agent that rate limits using redis to synchronize across instances. Probably not the most efficient as it has to do at least one roundtrip to redis before every request
Personally, I'd first try out deploying nginx as a forwarding proxy with rate limiting on render.com and configuring node-fetch in the Fetch and Function datasources with a proxy agent. I expect this to be most performant and robust and barely requiring writing any new code.
The previous option can also be written in node.js if that's preferred

bytasv commented 1 year ago

@bytasv This might require a Redis instance or equivalent?

I don't think it has to be so complex. In the past I've used https://www.npmjs.com/package/express-rate-limit it's pretty straight forward.

Found a blog post that explains how that could be done for next/server - https://kittygiraudel.com/2022/05/16/rate-limit-nextjs-api-routes/

All of that should work with in-memory handlers, but IMO it shouldn't be an issue at least for demo. I might be wrong, but it seems that all we would need is to provide keyGenerator which determines how the rate limit is applied, there is even an example of using IP address, IMO we could give it a try, might be the easiest option, if we can't get it working, we could look into other options.

prakhargupta1 commented 1 year ago

Regarding the UX improvements. I would suggest that we do these changes:

In the connections drop-down, just like we show in staging env, we should show:

Function
fetch
Postgres (disabled)
Google Sheets (disabled) For disabled ones we show a message that they are currently available in the self-host version only.

Add query should not be a blank list. Like we show in the staging environment, we should show:
- default fetch
- default function
- Movies API
- Covid API
- dog API
Self-host button on top.
When creating the app, show the message that 'In case you need to revisit the app later please save the app URL'

apedroferreira commented 1 year ago

Personally, I'd first try out deploying nginx as a forwarding proxy

I like this option, I think any of the 3 would work right now but it seems like the most future-proof, as long as it does not take a lot more time/effort. And rate limiting is definitely something we should add to improve security overall. If we're going for speed, in-memory rate limiting as Vytas said should work for now as we probably don't plan to have multiple instances in the short-term?

If we're rate limiting data source calls only/mostly, I guess we could use appId as the key not to have to rely on the IP address, which is a bit more unreliable?

About the UX, I like the list Prakhar made + allowing only client-side fetch and functions. If there is no preselected option either for creating a new connection or a query, we could also use placeholder text that makes it more clear that the user should pick an option.

apedroferreira commented 1 year ago

We could also store (in localStorage, for example) the appId for the last app a user worked on, and when they open the main page, if that app still exists, show a popup with something like: "You were working on this app last time. Do you want to go back to it?"

Not sure if we want to do that but just an idea.

Janpot commented 1 year ago

Just want to point out that I'm proposing rate-limiting outbound requests through a forwarding proxy on the fetch calls that the backend is making and @bytasv is proposing rate-limiting inbound requests by wrapping the handler that serves data calls from the frontend. Which are two whole different things.

I believe there could be some benefits to rate limiting outbound requests over inbound:

We are limiting at the "right location". Nothing more needs to be done. By limiting inbound calls, we still need to be vigilant about what the handler is doing downstream. For example, we will still have to add additional limiting on the amount of fetch calls a function datasource can execute in one script. This will become harder to deal with once we support queries calling other queries, as they will all need to look at the same rate limiting budget.
In my first comment I made note of being able to blacklist specific hosts from being fetched from our servers. This would be straightforward to add in a forwarding proxy. If an abuse report comes in, it will be important to be able to react quickly. (I've seen such abuse reports off and on while working on a distributed web crawler, you want to stay friends with your hosting provider 🙂 )
It works just as well for one instance, as well as for 100 instances. The inbound rate limiter will need a way to synchronize across instances/serveress functions once we try to horizontally scale. I believe using redis for this will be slower and lower in bandwidth than using a proxy.

Anyway, some food for thought.

bytasv commented 1 year ago

I think this might be bit extreme of what we actually need for demo purposes. Initial assumption was that we COULD potentially get bad actors trying to abuse fetches, so the inbound rate limiting should take care of that for this demo purposes. I don't think we are currently looking for a way to have a bullet proof solution for a future cloud version? Even if we do, we could still try to do the fast a cheap way so that demo doesn't feel so limited and in parallel continue with setting up whatever is needed with the option that you proposed (unless that is even faster/cheaper to do than the rate limit middleware)

oliviertassinari commented 1 year ago

Ok, so if I summarize the options maze, we have by the level of quality of result and often cost, for each problem:

Credentials leak, for PostgreSQL and Google Sheet data sources A. Stop leaking the credentials B. Listing the data source in the UI, adding a checkbox to have the developer acknowledge the risk C. Listing the data source in the UI but disabling it with a tooltip to explain why it's not there
DDOS risk: for fetch and function data sources A. A rate limit on outbound requests. I don't think that a rate limit on inbound requests works because, in a function data source, we could make hundreds of requests, so we would need to disable this option, which is equivalent to B. B. Listing the data source in the UI but disabling it with a tooltip to explain why it's not there
I'm sold, what's next? A. How a "self-host button" banner. This banner could maybe be used with the full version/cloud for showing important messages, so it might not be a lost effort. B. Only improve the docs, relying on the link to the docs
How do I find my app back? A. Store (in localStorage, for example) the appId for the last app a user worked on, and when they open the main page, if that app still exists, show a popup with something like: B. Show the message that 'In case you need to revisit the app later please save the app URL'

I'm OK with all these options B/C. If we have the resources to do more (A), then it's also great.

Janpot commented 1 year ago

Adding as per last meeting: 2C: Add client/server switch to fetch and function queries as a new feature. For the demo mode disable the "server" option with a message to direct to the self-hosted version for this functionality. The reasoning being that client-side only would be good enough to showcase queries, but have no DDOS potential as they originate from the user IP address. 3 birds with one stone: new feature in core, more powerful queries in demo mode and no serverside code executed by the demo.

And a note on rate-limiting that a in-memory solution wouldn't work well if we need to scale horizontally

prakhargupta1 commented 1 year ago

Regarding the UX changes we discussed. My textual inputs for the pop-up below. Styling isn't apt.

MUI Toolpad App -> Toolpad app
reCAPTCHA text was bold and big earlier
Option to open last app added

We have to add Self-host button at the top, I guess it can be best placed on the right side, just left of 'all changes saved icon'. Another point, To some 'Demo version' may appear like a button, maybe we can alter its appearance.

1254

apedroferreira commented 1 year ago

Regarding the UX changes we discussed. My textual inputs for the pop-up below. Styling isn't apt.

All of those seem like good suggestions to me - i think we can do them all.

apedroferreira commented 1 year ago

@prakhargupta1 When a user creates a new blank app in the demo, should there already be connections for the Covid API and the Dogs API?

prakhargupta1 commented 1 year ago

I was thinking that we should just show them in the queries drop-down. So if a user doesn't create a Fetch connection, they'll still be able to see Covid API, Dogs API and Movies API in queries.

oliviertassinari commented 1 year ago

Based on the current state of https://demo.toolpad.io/ and my first comment https://github.com/mui/mui-toolpad/issues/1213#issue-1423157215

Is solved
Is solved
As a developer, I can't see anywhere that PostgreSQL is supported in the non-demo version. Should we care?
As a developer, I can't see anywhere that Google Sheets is supported in the non-demo version. Should we care?

Based on https://github.com/mui/mui-toolpad/issues/1213#issuecomment-1293691728

Same question
Is solved
Is solved
Is solved

apedroferreira commented 1 year ago

Based on the current state of https://demo.toolpad.io/ and my first comment #1213 (comment)

Is solved

Is solved

As a developer, I can't see anywhere that PostgreSQL is supported in the non-demo version. Should we care?

As a developer, I can't see anywhere that Google Sheets is supported in the non-demo version. Should we care?

Based on

Same question

Is solved

Is solved

Is solved

Oh, it didn't occur to me that by hiding connections this would not be showing anymore:

What if we showed every option as self-host only? As the client-side fetch/function don't really use connections. But I guess that could be a weird UX, maybe there's a better way...

oliviertassinari commented 1 year ago

What if we showed every option as self-host only? But I guess that could be a weird UX, maybe there's a better way.

@apedroferreira This sounds good. I think that it's only a strange UX because we can't create client-side fetch and function connections (it's only server-side), but once we support this, it would be OK?

apedroferreira commented 1 year ago

What if we showed every option as self-host only? But I guess that could be a weird UX, maybe there's a better way.

@apedroferreira This sounds good. I think that it's only a strange UX because we can't create client-side fetch and function connections (it's only server-side), but once we support this, it would be OK?

Yep, once we have server-side fetch/functions we can show that dropdown with only Postgres and Google Sheets disabled like in the image above.

oliviertassinari commented 1 year ago

@apedroferreira But to be clear, eventually, we will have these options? (Likely no longer in a select as it doesn't scale to many data source)

apedroferreira commented 1 year ago

@apedroferreira But to be clear, eventually, we will have these options? (Likely no longer in a select as it doesn't scale to many data source)

I wasn't thinking about separating client/server side fetch and functions in that dropdown, so that's why the UX would be a bit weird if I disabled every option for now... But I also don't think we should separate client and server side connections.

It is possible to create client-side connections already in the connection selector if I show them, but the problem is that client-side fetch and functions don't use any of the connection parameters such as base url, headers and authentication. Those parameters don't do anything in client-side queries - which means that creating a client-side connection instead of using the default fetch/function queries is useless...

~~I still think that showing the 4 original options, but all of them disabled with "Self-host only" might be the best choice here.~~

apedroferreira commented 1 year ago

@oliviertassinari Wait, actually Jan is working on global connections so we're eventually going to get rid of this connection picker anyway.

Maybe we can keep the demo as it is, and once we have global connections we can somehow show to demo users that we support those options in the self-hosted version, somewhere in the page where global connections can be created?

oliviertassinari commented 1 year ago

OK, not an ideal outcome in the short term, but we eventually reach the outcome we looked for, so why not, especially if it saves us time overall.

prakhargupta1 commented 1 year ago

@apedroferreira Yeah, it didn't occur to me as well, that by hiding Connection we are also hiding that info. For now, We can show a message in the footer.

Let's append the current message with. "Note: PostgreSQL, Google sheets are supported in the self-hostable version."

This will be followed by Self-host CTA. So I think it should go fine.

apedroferreira commented 1 year ago

Let's append the current message with. "Note: PostgreSQL, Google sheets are supported in the self-hostable version."

We can do that. Let me know if it would also still be useful to show the connections picker for now with fetch, function, postgres and google sheets all disabled and with "Self-host only". It could also be added quick as a temporary measure if you think it's a good enough alternative.

prakhargupta1 commented 1 year ago

If we don't show Connections at all and rather show the 2 data-sources as disabled in the Create query drop-down, would that be also be a quick fix? Reference screenshot:

Janpot commented 1 year ago

We will be moving connections globally, which will make them impossible to include in the demo as it currently is. Not until we add authentication and workspaces anyway.

prakhargupta1 commented 1 year ago

Ok. Let's then add the message in the footer. That should be fine.

mui / toolpad

https://demo.toolpad.io/ data fetching feels limited #1213

1254