webiny / webiny-js

Open-source serverless enterprise CMS. Includes a headless CMS, page builder, form builder, and file manager. Easy to customize and expand. Deploys to AWS.
https://www.webiny.com

Support for DynamoDB - create Commodo driver #662

Closed adrians5j closed 3 years ago

adrians5j commented 4 years ago

This is:

Detailed Description

Since the community has shown noticeable interest in DynamoDB support, I decided to create a dedicated issue so that we can stay in sync on the status of this feature request.

The following things need to be finished in order to create the Commodo DynamoDB driver:

The best way to get started is to take a look at the existing fields-storage-mongodb lib. In theory, replacing the MongoDB client with the DynamoDB client should be the way to go, but of course, other issues might come up.


Note that commodo uses the mongodb style syntax for querying, for example:

const users = await User.find({ query: { type: { $ne: ["admin"] } } });

The driver should be able to at least know how to work with basic logical query operators and comparison query operators. Once it receives the MongoDB style operator, it should translate it internally into the DynamoDB operator. We've done this before with MySQL, so if you need an idea how to organize this, just let me know, I'll dig it up for you.
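A rough sketch of what such a translation could look like, purely for illustration. The toFilterExpression helper and the placeholder naming (#n0, :v0) are hypothetical and not part of Commodo; the output shape assumes the plain-value style accepted by the DynamoDB DocumentClient, and only basic comparison operators plus $and / $or are covered.

```javascript
// Hypothetical helper (not part of Commodo): translates a small subset of
// MongoDB-style operators into a DynamoDB filter expression.
const COMPARATORS = {
    $eq: "=",
    $ne: "<>",
    $gt: ">",
    $gte: ">=",
    $lt: "<",
    $lte: "<="
};

function toFilterExpression(query) {
    const names = {};
    const values = {};
    let counter = 0;

    // Registers one placeholder pair and returns e.g. "#n0 <> :v0".
    const comparison = (field, operator, value) => {
        const id = counter++;
        names[`#n${id}`] = field;
        values[`:v${id}`] = value;
        return `#n${id} ${operator} :v${id}`;
    };

    const walk = node =>
        Object.entries(node)
            .map(([key, value]) => {
                if (key === "$and" || key === "$or") {
                    const joiner = key === "$and" ? " AND " : " OR ";
                    return "(" + value.map(item => walk(item)).join(joiner) + ")";
                }
                const isOperatorObject =
                    value !== null && typeof value === "object" && !Array.isArray(value);
                if (!isOperatorObject) {
                    // { field: value } shorthand means equality.
                    return comparison(key, "=", value);
                }
                return Object.entries(value)
                    .map(([op, operand]) => {
                        if (!COMPARATORS[op]) {
                            throw new Error(`Unsupported operator: ${op}`);
                        }
                        return comparison(key, COMPARATORS[op], operand);
                    })
                    .join(" AND ");
            })
            .join(" AND ");

    return {
        FilterExpression: walk(query),
        ExpressionAttributeNames: names,
        ExpressionAttributeValues: values
    };
}
```

For a query such as { type: { $ne: "admin" }, age: { $gte: 18 } }, this produces the expression "#n0 <> :v0 AND #n1 >= :v1". Using name placeholders for every field sidesteps DynamoDB's reserved-word list, at the cost of slightly less readable expressions.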


commodo supports search operators, you use it like this:

await SimpleModel.find({
            search: {
                query: "this is",
                fields: ["name", "slug"]
            }
        });

With MongoDB, internally we just translate this into a simple $regex search. I know, I know, not super scalable. We'll handle this once there's a need to fix it. Additionally, the last solution I mention below might resolve this too.

Back to the DynamoDB...

Unfortunately, DynamoDB does not support full-text searching. To my understanding, the common way to address this issue is to use ElasticSearch or another similar service. So I'm not sure if we're going to be able to implement this feature in this driver.

Maybe we could try with the CONTAINS operator for starters, and see where it goes from there? I am aware that this involves scanning the tables, which can become expensive. The alternative is just to use the BEGINS_WITH operator, where indexes can be used.
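As a sketch of the CONTAINS idea, assuming the search object shape shown earlier ({ query, fields }): the toSearchFilter helper below is hypothetical and builds a filter expression with DynamoDB's contains function, one clause per field, joined with OR.

```javascript
// Hypothetical helper: turns a commodo-style search object into a DynamoDB
// filter expression. As a filter, it is evaluated after items are read,
// so it does not avoid the cost of scanning.
function toSearchFilter({ query, fields }) {
    const names = {};
    const clauses = fields.map((field, index) => {
        names[`#s${index}`] = field;
        return `contains(#s${index}, :search)`;
    });
    return {
        FilterExpression: clauses.join(" OR "),
        ExpressionAttributeNames: names,
        ExpressionAttributeValues: { ":search": query }
    };
}
```

Note that contains is case-sensitive, so this is not equivalent to a case-insensitive $regex search. A begins_with variant would look the same as a filter, but to benefit from an index it would have to be applied to a sort key in a key condition instead.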

~If neither of these will work, the last thing that came to my mind is to create a separate @webiny/serverless-search-catalog, which would deploy ElasticSearch, and we would refactor all models (that can be searched), and make them use this component, so all searches are executed against this service, efficiently. We can expand on this once we get here, but I pretty much have everything in my head already regarding this option.~


Obviously, we want to have basic test coverage. No need to go for the 100% for the initial version, but at least 60%-80% would be nice.


When it comes to using indexes in find and findOne methods, the initial idea was to pass an additional index parameter which should specify which index to utilize. But by doing it that way, we would have to go over every individual app and make sure the index property is passed into all find / findOne methods. Some other refactoring would be needed as well (e.g. upgrading GQL CRUD resolvers).

But this can be avoided with the following approach.

Instead of specifying which index to use, on a per-model basis, the developer will just specify which indexes are available. This way, the DynamoDB driver will be able to analyze the provided query and specified indexes and come up with a decision on whether a Query or Scan should be used.
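A minimal sketch of that decision, under stated assumptions: the index shape ({ name, fields }, with the first field acting as the partition key) and the chooseOperation name are made up for illustration. It relies on the fact that a DynamoDB Query needs an equality condition on the partition key; anything else falls back to Scan.

```javascript
// Hypothetical decision helper. Assumed index shape:
// { name: "byEmail", fields: ["partitionKeyField", "optionalSortKeyField"] }
function chooseOperation(query, indexes) {
    // A plain value or { $eq: value } counts as an equality condition.
    const isEquality = condition =>
        condition === null ||
        typeof condition !== "object" ||
        Object.keys(condition).every(op => op === "$eq");

    const usable = indexes.find(index => {
        const [partitionKey] = index.fields;
        return partitionKey in query && isEquality(query[partitionKey]);
    });

    return usable
        ? { operation: "Query", indexName: usable.name }
        : { operation: "Scan" };
}
```

With indexes [{ name: "bySlug", fields: ["slug"] }], the query { slug: "home" } resolves to a Query on bySlug, while { type: { $ne: "admin" } } resolves to a Scan. A real driver would also need to rank several candidate indexes and handle sort-key conditions, which this sketch leaves out.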

To specify available indexes, we could create a separate withIndexes HOF, which could look like the following: (image not available)

Note: it's up to the developer if this will work and if there are other things to be added here. We can discuss it along the way.

You can create the HOF here (along with two helpers we'll discuss shortly): (image not available)

Internally, when the user uses the HOF, it will just save the object into the __withIndexes internal property, using withStaticProps from the repropose package. A good example to take a look at is the withName HOF (packages/name/src/withName.js). It uses withStaticProps to add the __withName property to the function. As mentioned, in your case, you can name the property __withIndexes and push everything the user defines into it.

Once you have that, you'll also have to create a helper function getIndexes that will return a list of all specified indexes. With that, create hasIndex, which you'll then utilize in the DynamoDB driver, and decide whether a Query or Scan needs to be called.
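Since the original illustration is not available, here is one possible shape for the HOF and the two helpers. Everything in it is an assumption: the index definition format ({ name, fields }), the helper signatures, and attachStaticProps, which is a local stand-in for repropose's withStaticProps so the sketch runs on its own.

```javascript
// Local stand-in for withStaticProps from repropose (assumption, so that
// this sketch has no dependencies).
const attachStaticProps = props => fn => Object.assign(fn, props);

// Saves the user-defined indexes into the __withIndexes static property.
const withIndexes = indexes => baseFn =>
    attachStaticProps({ __withIndexes: indexes })(baseFn);

// Returns all indexes defined on a model function (empty list if none).
const getIndexes = fn => (fn && fn.__withIndexes) || [];

// True if an index exists over exactly these fields, in this order.
const hasIndex = (fn, fields) =>
    getIndexes(fn).some(
        index =>
            index.fields.length === fields.length &&
            index.fields.every((field, i) => field === fields[i])
    );

// Usage with a placeholder model function.
const User = withIndexes([
    { name: "byEmail", fields: ["email"] },
    { name: "byTypeAndCreatedOn", fields: ["type", "createdOn"] }
])(function User() {});
```

Here hasIndex(User, ["email"]) is true and hasIndex(User, ["slug"]) is false, which is the check the driver could use when deciding between Query and Scan.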

VividWombat commented 4 years ago

We may need to evaluate other options first. DocumentDB might be easier/better, at least as a first choice.

https://docs.aws.amazon.com/documentdb/latest/developerguide/what-is.html

it mentions:

https://www.mongodb.com/compare/mongodb-dynamodb

adrians5j commented 4 years ago

Actually, we did try DocumentDB in the past.

There are a couple of reasons we didn't end up using it.

First, in order to connect to it, you need to put a function into a VPC, which dramatically increases cold starts. I'm talking 10-second cold starts here.

The next thing is price: there is no free tier, and even the cheapest instance is not cheap for a regular developer who just wants to quickly spin up Webiny in their account and try it.

And finally, although not super critical, there are a few aggregation operators that are still missing in DocumentDB. The main one I'm thinking about here is the $facet operator, which we use in the commodo driver.

In the end, I think that we're going to stick with MongoDB Atlas for now, unless of course the community shows greater interest in DocumentDB. I would also give more priority to the DynamoDB implementation, since more users have asked for it.

@VividWombat

adrians5j commented 4 years ago

So far nobody from the team had the chance to work on this, but if anyone from the community would like to take a stab at this, take it away :)

P.S. You can count on our support for any related issues / questions.

asktree commented 4 years ago

I think that Cloud Firestore could be a good/better fit. Disclaimer: I'm not a DynamoDB expert. Features include:

I can't speak to the aggregations available; it wouldn't surprise me if they weren't great. It's also probably painful to use it with Lambda. For me, using GCP is a priority, but I expect this isn't common.

adrians5j commented 4 years ago

Cloud Firestore sounds interesting, but at the moment, to implement that, we'd also have to create good GCP support. The Cloud Firestore implementation itself is probably going to be a solid amount of work, not to mention full GCP support.

I think that if we were to choose between the two, DynamoDB would be higher on the priority list, simply because we're already so embedded into the AWS cloud.

And yeah, GCP support is something we are exploring, but it's still in a very early phase (I wouldn't even call it a "phase"), meaning it won't come very soon.

jfgrissom commented 4 years ago

Hi,

I'm a Senior Software Engineer on the Cloud Engineering team at Intuit, Inc.

I've received approval for some time to work on this project. (About 5 hours a week).

I can think of many ways to access DynamoDB for use with Webiny. I'm happy to build this out, write the tests, and put together some documentation for this.

I'll look around for a chat channel, review the docs, fork the repo, and install webiny to get a sense of what it's doing.

Do you have any other recommendations for me to help out with this?

Thanks, Jay

SvenAlHamad commented 4 years ago

Hi @jfgrissom, welcome to Webiny!

It's great to hear that your company supports open-source contributions, the whole community needs more of that and we're super happy you decided to pick our project.

This task, in particular, is one that has been requested by many members so it's gonna have a great impact on the wider project. That being said, we want to provide you with all the support you might need to be successful.

At the moment we also have an internal team member planning to do some work on it, so it might be good to sync our efforts. Are you maybe available for a video call, or you can potentially reach out to us on Gitter (https://gitter.im/webiny/webiny-js)?

Cheers, Sven

p.s. my email is sven@webiny.com in case you need to drop me a direct message.

jfgrissom commented 4 years ago

Very nice. I shot you an email to say hi.

I'll connect with your team on Gitter once I get familiar with the project.

jfgrissom commented 4 years ago

I'm wrapping up some items and it looks like I will do some exploration this afternoon.

I should be able to connect with you on gitter tomorrow (Mar 25th).

adrians5j commented 4 years ago

Just updated the issue, and made clear which tasks @ostappartyka is already working on.

UPDATE: it seems the majority of these tasks are related to the commodo driver, and it's best to leave @ostappartyka to handle it completely, since he already started working on it.

The last two tasks listed here are the only ones that are not in his domain, and could be resolved by somebody else if possible.

I would give priority to the first one (related to DDB tables creation). If anyone is interested in tackling this section, give us a ping, we'll be glad to help.

jfgrissom commented 4 years ago

I can help out with the 2nd to last item on the list (create table definitions for every).

ianvonholt commented 4 years ago

First, in order to connect to it, you need to put a function into a VPC which dramatically increases cold starts. I'm talking 10seconds cold starts here.

Is this still a concern even after this announcement?

I'd really like to get VPCs up and running specifically so we can tie in caching for Apollo.

Pavel910 commented 4 years ago

@ianvonholt We haven't tried it since; you could try and see how it performs. Let us know if you need help with anything.

samj commented 4 years ago

Actually, we did try DocumentDB in the past. There are a couple of reasons we didn't end up using it.

This isn't my first time searching for a serverless CMS, and it may not be the last, but I wanted to give some feedback on this issue in particular as I was ready to deploy webiny right up until I ran into the "Update the MONGODB_SERVER variable" direction after running create-webiny-project.

Now there are no "serverless" police, so I'm not here to debate the purity of having a hard server dependency in a serverless environment, but there are pragmatic reasons why people want truly serverless environments. While "DBaaS" services like AWS RDS and MongoDB Atlas do delegate the server maintenance to someone else, the usual scalability, security, availability, etc. challenges are still there in ways that don't apply to scale-out architectures.

There's also the cost factor, which you mentioned above – the managed server model behind DocumentDB starts at hundreds of dollars a month for servers that need to be up even if you're only serving the occasional user from a large data set, rather than many users from a small data set that this approach is optimised for.

Given the interest in these threads, and the deleted (but cached) article below, I'm clearly not alone. I hope that the DynamoDB adapter you're working on isn't a red-headed stepchild, but rather a first-class citizen, and possibly the future default for a truly serverless project that can sit for years without running up costs, while still scaling as and when required. I need something now, so I'm going to continue on, but I'll check back in here as this really is some great work you've done.

Why MongoDB and not DynamoDB?

Webiny suggests you use MongoDB Atlas. It's a managed service so you don't worry about servers, maintenance, and similar overhead. Alternatively, you can also use AWS DocumentDB, which is a MongoDB-compatible managed database service.

But why not DynamoDB?

DynamoDB is a great database, with many amazing features. However, Webiny has one goal that clashes with us being able to use DynamoDB and that is vendor lock-in.

Webiny aims to support multi-cloud deployments and as such we cannot use a technology that can only be deployed on a single cloud provider. DynamoDB is an example of such technology since it's only available on AWS.

However, we expected those questions and requests to support different types of databases, and because of that reason the code inside Webiny doesn't interact directly with the database driver, instead, there is an abstraction layer. This abstraction layer allows developers to write adapters for other databases without the need to modify any code inside the Webiny core.

So if you want to use DynamoDB, all you need is an adapter for it. This applies to any other database.

If you're interested in writing an adapter, have a look at this GitHub issue, which already provides some guidelines. And for any other question, just give us a shout by opening a new topic on our repo.

Last updated on 11/22/2019 by SvenAlHamad

Pavel910 commented 4 years ago

@samj thanks for the input, we do appreciate all the interest as it helps us prioritize our development efforts. We're a very small team, 3 months ago we only had 3 people working on Webiny, you can imagine the amount of work :)

Ideally, we'd love to support DynamoDB by default. Mongo was (still is) an ok solution for its simplicity. But it does introduce setup friction and, we agree, it is not entirely serverless. For us the biggest pain point with Mongo is that it doesn't have an HTTP API, so we have to deal with connection management problems. The Serverless Databases space is very young and there are still so many things lacking.

Our goal is to eventually be deployable to multiple clouds, and DynamoDB is introducing a serious lock-in. So as you can imagine, being a small team, aiming for multi-cloud, and not having much experience with Dynamo - Mongo was a good choice for the time being.

I must say that we're not working on Dynamo integration at this point, we do plan to, but it's not our first priority at the moment.

Thanks for your comment, it's a +1 for DynamoDB, and we need that information! 👍

samj commented 4 years ago

@Pavel910 and thank you for the prompt and detailed response — at the end of the day it's your project and prerogative to assign your resources how you want, and I totally understand that you already have a solution that works for you.

To be fair, if not for the context of this particular project (building a truly serverless platform for NoOps) I might well have been tempted to go down the MongoDB path, and I may yet take a look at DocumentDB (though it sounds like I'll lose functionality like facets and anything that depends on that, as well as perhaps search?).

Actually this is the second time this week I've had the same problem, the other being with form.io. Best of luck with it all, and thanks for your open source contributions. I'll keep an eye on this thread and your project.

SvenAlHamad commented 4 years ago

Hey @samj - appreciate the input on this. We do want to get DynamoDB support into Webiny, and precisely because we want it to be a “first-class citizen”, we haven’t moved forward with it yet: there are big challenges in how DynamoDB works vs any other NoSQL database. We just haven’t found a solution that will be of great quality and won’t jeopardize the current features of the system.

This issue is flagged as “help-needed” particularly for that reason - we need input from people that have worked with DynamoDB and know it well to point us in the right direction. We can do all the code stuff, it’s the architecture in particular that we need help with.

If you have experience with DynamoDB and are willing to spend some time on this, we would love to have a chat.

samj commented 4 years ago

I've had a look at Commodo with a view to assigning developer/s to it, and it looks to be more of a shim around MongoDB than an abstraction layer like ODBC, which is obviously going to cause problems when it comes to migrating to different databases. It's already impressive that you managed to get it over to MySQL from a document database, but that has a lot of the capabilities (e.g. CONTAINS/LIKE) that you would need, and as @doitadrian has already discovered, the fallback is expensive table scans or e.g. ElasticSearch. That said, things like versions would be handled very well using sort keys.
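To make the sort-key remark concrete, here is a small sketch under assumed conventions: the PK / SK attribute names, the CONTENT# and REV# prefixes, and the six-digit padding are illustrative choices, not anything Webiny uses. Revisions of one content entry share a partition key and sort by a zero-padded revision number, so the newest one can be fetched with a single Query.

```javascript
// Zero-padding keeps string ordering consistent with numeric ordering.
const pad = n => String(n).padStart(6, "0");

// Hypothetical key layout: all revisions of an entry share one partition.
const revisionKey = (contentId, version) => ({
    PK: `CONTENT#${contentId}`,
    SK: `REV#${pad(version)}`
});

// Query parameters for the newest revision: read the partition in
// descending sort-key order and stop after the first item.
const latestRevisionParams = (tableName, contentId) => ({
    TableName: tableName,
    KeyConditionExpression: "PK = :pk AND begins_with(SK, :prefix)",
    ExpressionAttributeValues: {
        ":pk": `CONTENT#${contentId}`,
        ":prefix": "REV#"
    },
    ScanIndexForward: false,
    Limit: 1
});
```

The padding width caps the number of revisions that sort correctly (999,999 here), which is the kind of design decision that has to be made up front with this style of database.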

I would think that to do this properly would require work outside the database driver, and ideally a design decision from the outset which translates to vendor lock-in if you're not careful. To your point about "how DynamoDb works vs any other NoSQL database", it's more analogous to Google BigTable and Azure Cosmos DB, and a decision for example to just support that style of database and simple queries like "get content for id" (which can still be very powerful with clever design) would clearly have made this easier to achieve today.

There may be a design decision to make here in terms of either moving to this more "serverless" approach to databases, or sticking with the document model and resulting cluster costs, but hopefully there's middle ground to be had somewhere. I for one would be willing to sacrifice search (i.e. make it optional and dependent on e.g. ElasticSearch) to have something that costs nothing to idle, but can cost-effectively scale as required.

Pavel910 commented 4 years ago

@samj we basically decided to use MongoDB API syntax, and adapt other databases to it, as it is pretty clear and not SQLish, and SQL queries can be easily generated from it. So yes, Mongo and SQL work nicely.

We've seen that clever design video many times when getting to know DynamoDB quirks, I've also read the whole https://www.dynamodbbook.com/ to get myself familiar with the beast.

Thanks for your time and input, we'll discuss this further as we're very interested in using a truly serverless database and reduce the setup complexity caused by an extra DB cluster (Mongo Atlas, etc).

If you have more information/experience to share, we'll be really grateful! 🍻

samj commented 4 years ago

What about Cassandra-style wide column databases like Amazon Keyspaces, which also have no monthly running costs outside usage?

Pavel910 commented 4 years ago

No experience there :( We only have experience with more traditional databases like MySQL, PostgreSQL and MongoDB. These we used for years and know what we can do with them. For other databases we need help.

adrians5j commented 4 years ago

BTW @samj, just wanted to mention one thing, related to Commodo itself.

The goal for it wasn't to universally support all databases but to provide a way to quickly execute simple / medium-complex queries. This was definitely successful for us as we have a couple of base apps, and all are basically running on simple queries, except maybe one or two cases, where we applied some data restructuring to make it possible.

In their own apps, if needed, users can freely use the database client directly for all other complex querying. That's actually encouraged, as it's more efficient.

samj commented 4 years ago

No experience there :(

Neither do I, I'm afraid. Cassandra is a black box to me, but it sounds closer to MongoDB than DynamoDB.

appsyslab commented 3 years ago

@samj we basically decided to use MongoDB API syntax, and adapt other databases to it, as it is pretty clear and not SQLish, and SQL queries can be easily generated from it. So yes, Mongo and SQL works nicely.

We've seen that clever design video many times when getting to know DynamoDB quirks, I've also read the whole https://www.dynamodbbook.com/ to get myself familiar with the beast.

Thanks for your time and input, we'll discuss this further as we're very interested in using a truly serverless database and reduce the setup complexity caused by an extra DB cluster (Mongo Atlas, etc).

If you have more information/experience to share, we'll be really grateful! 🍻

I am a SW architect at a large asset manager. My team just built a fully serverless SPA using DynamoDB fronted by API Gateway. I would be willing to help answer questions or brainstorm possible approaches. Basically, what we need is a good design for a primary key and a sort key based on the items that are stored and how best to shard them. I am planning to install Webiny with MongoDB over the weekend and will look at the type of items stored in the DB, if it helps.

webiny-bot commented 3 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

unixfox commented 3 years ago

bump

Pavel910 commented 3 years ago

Update for everyone in this thread: we're soon releasing v5 of Webiny, which is based on DynamoDB, so I'm closing this issue. There won't be a DynamoDB driver for Commodo; instead, we're introducing a simple DB client layer (with drivers), with DynamoDB being the default setup.