Closed mdrichardson closed 4 years ago
Disregard that issue reference ("[DCR] Allow for Slack Block Kit API"). Bad copy/paste
@johnataylor I updated this above:
Update: I believe I've found a better temporary workaround that is easier to use and uses partitioning:
Delete your Container Create a new Container with partitionKeyPath set to /id Remove the PartitionKey property from CosmosDbStorageOptions
And I believe this might be the easiest quick-fix to implement. In CosmosDbStorage
, if we create the database and/or collection/container, we'd just need to use /id
instead of this.partitionKey
. For READ/DELETE, we just use the realId
instead of this.partitionKey
. If we only do this on db/collection creation, that shouldn't break backwards-compatibility.
Michael, I'm running with botbuilder-js libs, I've tried the above solution and DELETE still does not work. Is there an ETA for when the actual fix to cosomosDbStorage will release?
@garypretty Please take a read through this. I have some opinions, as does @mdrichardson. Let's get a agree on a solution that works with the CosmosDB Partition Keys and see where we end up.
@cleemullins @mdrichardson I have taken a look at this and here is my take based on the great insight so far by @mdrichardson (CosmosDB isn't a familiar area for me so your clear explanation above was great!).
Solving the issue for new / currently broken bots
Largely, moving forward, I echo comments above, in that I think we should take the PartitionKey out of the hands of the developer for storage of bot state. I also agree that if the developer wants to store anything else in Cosmos DB then let them do it and handle it in whichever way they want to.
The approach of using the RealId as the partition key would seem to be a suitable approach. I was a little concerned that this might lead to some sort of scale issue (not knowing cosmos) due to the number of partitions that would lead to. However, having done some further reading I don't think this will be an issue - there is a specific allowance for large numbers of partitions which we can take advantage of if we use v2 of the PartitionKeyDefinition when creating our container. (This is now also the default option when creating containers via the Azure portal).
var documentCollection = new DocumentCollection
{
Id = _collectionId,
PartitionKey = new PartitionKeyDefinition
{
Paths = new Collection<string> { "/realId" },
Version = PartitionKeyDefinitionVersion.V2,
},
};
To take advantage of the above change we would need to update our dependency on Microsoft.Azure.DocumentDB.Core to >= 2.4.1. We are a way behind anyway right now on 2.1.1, with the latest release being 2.7.
Maintaining Compat
Based on what I am seeing, there has to be very few people who are using Cosmos storage and getting it to work correctly, whilst also using a PartitionKey. Therefore, we could decide to simply ignore the PartitionKey set by the user, but this would obviously be a breaking change for anybody who has managed to get things working.
The other option, which maintains compat. is to use the customer defined PartitionKey if they have set one and mark the PartitionKey property as obselete with a comment explaining that it should not be set (potentially with a link to a readme providing additional context for people who do have working bots?), as opposed to removing it.
Here is the diff based on the above
@cleemullins great to hear your thoughts and @mdrichardson please feel free to clarify anything I may have misunderstood :)
@mdrichardson @cleemullins Thinking more about it, one key problem with the above, if I understand correctly, is that whilst it will address net new users of CosmosDb storage and also help people who want to use partition key, it will break the experience for customers who already use CosmosDb storage without a partition key being set - where the storage provider has been working well for them until now.
Perhaps we could add an optional param to the CosmosDbStorage class (or new prop on the options class) to enable / disable the use of the PartitionKey - default this to false so that existing implementations work. However, in all our docs / samples we can advise for net new bots that the flag should be true, to provide the best experience.
We would need to ensure docs / release notes inform people not to enable this flag if their bots were in place ahead of this change, but it does solve the problem of backwards compatibility, whilst still allowing new implementations to proceed on the right basis.
Thoughts?
@garypretty CC: @cleemullins
Gary, I agree with pretty much everything you said. I've got some comments on a couple things.
To take advantage of the above change we would need to update our dependency on Microsoft.Azure.DocumentDB.Core to >= 2.4.1. We are a way behind anyway right now on 2.1.1, with the latest release being 2.7.
If we end up making a separate class for using PartitionKeys (which I'm not necessarily recommending), we may as well update to v3. The hesitation on doing this before was that C# V3 didn't support non-partitioned DBs, breaking backwards-compat. I think it does now, but haven't looked into it enough to be sure.
Perhaps we could add an optional param to the CosmosDbStorage class (or new prop on the options class) to enable / disable the use of the PartitionKey - default this to false so that existing implementations work.
I think this is the cleanest approach, from the customer side. My only (slight) concern with doing something like this is that it might get sort of hacky and hard to maintain on the SDK development side (I haven't thought through what the code would look like, so this may be avoidable). A separate class would avoid this entirely. I don't like the idea of having two classes so similar, but it's definitely a trade-off that should be considered.
@galenguilbert No ETA. As you can see, we're still trying to figure out the best way to tackle this. DELETE isn't really used within the SDK, so I'm not terribly surprised it doesn't work. After looking at the code, I'm not sure why it doesn't work, since it uses PartitionKeys similarly to READ. I have 3 workarounds listed in the top comment, all of which should feasibly work, although they operate pretty differently. Have you tried all 3?
@mdrichardson agree that a second class is an option - in fact I had written that up as a second option and then deleted it, because I would love to avoid it if possible. I think having the flag should be pretty clean and a small iteration on the above, at least right now if we wanted to get a fix in using v2. We should just need to check the flag when applying the PartitionKey to the queries.
Totally understand what you are saying about moving to v3 if we are going to update the dependency, but upgrading to later in v2 will require much less work and will at least allow us to be confident in getting a fix for Cosmos into 4.6.
@mdrichardson I ended up following the pattern for the third solution and just adopted the c# modifications to the typescript file. We're using the delete for when a user has a privacy request to remove their information from our system, we scrub the transcript and state data for any of their conversations.
i hope this can be fixed soon...
@debasreedash The C# SDK has merged a PR with this. You'll need to use the new CosmosDbPartitionedStorage
. We're hoping to have the Node and Python versions out by the 4.6 release.
@mdrichardson sorry but our bot is in nodejs
@debasreedash I'm hoping to have that out this week, but realistically, it might be a couple of weeks before the PR gets done, approved, and merged. Then, you'll likely need to wait for the 4.6 release, unless you want to build the Node SDK yourself (vs. downloading via npm).
@mdrichardson okay cool, we can do the update of the bot framework in a few weeks then and also i'm following this bug
We are going to close this master issue, as we now have issues / PRs raised in the relevant language repos. @debasreedash @galenguilbert you should be able to track your desired language on the links below.
JS - https://github.com/microsoft/botbuilder-js/issues/1234 C# - https://github.com/microsoft/botbuilder-dotnet/pull/2613 Python - https://github.com/microsoft/botbuilder-python/issues/327 Java - https://github.com/microsoft/botbuilder-java/issues/124
Relevant customer issues:
Issue
This issue applies to both/all SDKs and is as much of a design problem as it is a bug. Basically, if a user defines a
PartitionKey
inCosmosDbStorageOptions
, the majority of bots will not be able to read/delete/update state.The reason for this requires substantial explanation (again, this applies to all/both repos). I'll try my best to keep it brief:
The Way Cosmos Works
Containers (previously called Collections) are created with a
PartitionKeyPath
, which might be something like/data/city
.Let's say we have an item:
When an item is added to the collection, Cosmos decides which physical partition to put it on by the value of
item.data.city
. So, all items withitem.data.city === "Bellevue"
, get put on the same partition.Writes to the DB don't need
partitionKey
specified because it's already defined 1) on the container, and 2) on the item.Reads and Deletes, however, benefit GREATLY from defining the
partitionKey
. If we sayREAD item 123 with partitionKey Bellevue
, it will only spin up the physical drive with theBellevue
items when it searches for item 123. This is especially true when querying for multiple items with the samepartitionKey
since they'll all be on the same partition, by design.How our SDK works
We pass in our
partitionKey
throughcosmosDbStorageOptions
.When writing, we do nothing with the
partitionKey
. ✔When reading, we attempt to read with the previously-set
partitionKey
🚫When deleting, we attempt to delete with the previously-set
partitionKey
🚫Reading and deleting isn't implemented properly in the Bot Framework because, to use the previous example, we're either:
Setting
partitionKey
to"/data/city"
: when we READ or DELETE, we're trying to query with"/data/city"
instead of"Bellevue"
, orSetting
partitionKey
to"Bellevue"
: when we READ or DELETE, we can never find any items withcity.data === "Seattle"
Why This Wasn't Caught In Tests
Easy mistake and I'd imagine it comes from the confusion around
partitionKeys
andpartitionKeyPaths
. They're confusing and it's taken me a decent amount of digging to wrap my head around this.The tests that deal with partitions set the
partitionKeyPath
, then define a single item with the "right" key value and then look for that specific item, with its known key value of "Contoso" instead of thepartitionKeyPath
we used earlier of/city/location
, which is how our SDK would actually work (or we'd have multiple items with different cities and only be able to pull up the "Contoso" ones).Temporary Workaround
If you're using Cosmos and running into this issue, you can "fix" this by:
/_partitionKey
PartitionKey
property fromCosmosDbStorageOptions
Update:
I believe I've found a better temporary workaround that is easier to use and uses partitioning:
/id
PartitionKey
property fromCosmosDbStorageOptions
Temporary Workaround With Partitioning
If your data is likely to go over 10GB, you may run into an issue using the above workaround because I believe Cosmos will put every document onto the same partition and each partition is limited to 10GB of data. To actually use partitioning, you can take the following steps (this is written for C#, but the concepts apply to Node):
PartitionKey = "realId"
.Here's the diff
Note: This should still be considered a workaround and not an officially-supported change
Proposed change
The problem with coming up with a good solution is that currently, our READ and DELETE operations both only accept
key
(which isn't anything more than anid
), so there isn't a good way to intuit the appropriatepartitionKey
.The best solution that I can think of has two parts:
CosmosDbStorage
for non-BotState things, allow them to usepartitionKeys
however they want. They know best.CosmosDbStorage
for BotState things, disallow custompartitionKeys
. Instead, sincekey
is already passed in, we use/realId
as thepartitionKeyPath
(key
==realId
) and then we can usekey
/realId
as the `partitionKey for READ/DELETE. This goes against the conventional thought behind partitioning, but actually works well for CosmosConversationId
. There's no use of this currently in our SDK, but if we were ever going to query multiple documents at once, it would likely be within aConversationId
.That being said, I don't love this solution and hope we can come up with something better.
Component Impact
CosmosDbStorage
CosmosDbStorageOptions
IStorage
Customer Impact
Would allow for actual partitioning and prevent users from running into errors when trying to use partitioning with BotState.
Tracking Status
Dotnet SDK
Javascript SDK
Java SDK
Python SDK
Emulator
Samples
Docs
Tools
[dcr]