OK. This is getting pretty close. I think `metadata` and adding `bulk` back are all that's left before doing some benching with cardboard-hammer.
Might want to add tests to make sure that both tables are getting cleaned out when a feature is deleted.
meh... a good `batch` is going to be hard to write. The more I think about this, the more I think we should either drop `batch` (bad idea) or always do batch operations. Having two main files that conditionally write to two tables does not sound very maintainable. For now, I'm going to stub `batch` to call the single-feature methods many times. Once we've confirmed that this is fast enough and scales well, I'll start converting to a 100% batch api.
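For reference, a minimal sketch of what that stub could look like, assuming a hypothetical single-feature `putFeature(feature, dataset, callback)` helper and using d3-queue to bound concurrency (neither is cardboard's actual internals):

```js
// Hypothetical sketch: stub the batch put by fanning out to the single-feature method.
// putFeature(feature, dataset, callback) is an assumed helper, not cardboard's real code.
var queue = require('d3-queue').queue;

function batchPut(collection, dataset, putFeature, callback) {
  var q = queue(10); // cap concurrency so we don't hammer DynamoDB

  collection.features.forEach(function(feature) {
    q.defer(putFeature, feature, dataset);
  });

  // fires once every deferred write has finished, or on the first error
  q.awaitAll(callback);
}
```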
/cc @rclark
Yesterday, @rclark suggested that we break the writes to the `main-index` away from the writes to the `list-index` (called `features` and `search` in this PR).
@rclark is suggesting this because we cannot guarantee that two writes to two different dynamo tables both succeed. If we first write to the `main-index` and then write to the `list-index`, and one of them fails while the other succeeds, we can't guarantee that the delete needed to roll back the successful write will work. If that delete fails, we are stuck in a situation where our indexes don't agree with each other. This state would always break the list action and would sometimes break the get action.
@rclark's suggestion is to move the `list-index` write to a DynamoDB stream handler. DynamoDB streams guarantee order and do not expire an event until it is handled successfully (or 24 hours pass), so the write to the `list-index` will be retried over and over again if something is wrong with that table.
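To make the idea concrete, here is a rough sketch of what a stream-driven `list-index` writer could look like. The table name and item shapes are placeholder assumptions, not this PR's schema, and a real handler would derive the `list-index` item from the `main-index` image rather than copying it verbatim:

```js
// Rough sketch of a handler fed by the main-index table's DynamoDB stream.
// 'list-index' and the item shapes are placeholders, not the schema in this PR.
// Assumes the stream is configured to include new images (NEW_IMAGE view type).
var AWS = require('aws-sdk');
var dynamo = new AWS.DynamoDB();

exports.handler = function(event, context, callback) {
  var records = event.Records.slice();

  (function next(err) {
    if (err) return callback(err); // failing the invocation makes the stream redeliver this batch
    var record = records.shift();
    if (!record) return callback();

    if (record.eventName === 'REMOVE') {
      dynamo.deleteItem({ TableName: 'list-index', Key: record.dynamodb.Keys }, next);
    } else {
      // INSERT and MODIFY both turn into writes against the list-index
      dynamo.putItem({ TableName: 'list-index', Item: record.dynamodb.NewImage }, next);
    }
  })();
};
```

Because any failed write errors out the whole invocation and the stream redelivers the batch, this gives the retry-until-success behaviour described above without any bookkeeping on our side.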
The main problem I see with this approach is that it will make deleting all of a dataset's data complicated. At best you'd need a stream handler on the `list-index` that makes possibly no-op requests to the `main-index`.
Another option is to suggest that users of cardboard run an out-of-sync delete process. This would be some external task that looks at `metadata` records in cardboard and compares them to a master database that is not managed by cardboard (cardboard doesn't currently manage a master list of datasets). By finding datasets that have `metadata` records but are not in the user's master database, they would know which datasets need to be cleaned up.

I think cardboard's modules would all need to offer a clean-up function. This doesn't get around the problem of needing `cardboard.list` to delete the features in the `main-index`, but it does start to solve that problem.
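As a very rough illustration of that external task (every name here is hypothetical; neither `listMetadataDatasets` nor `getMasterDatasets` is a real cardboard API):

```js
// Hypothetical out-of-band cleanup pass. listMetadataDatasets stands in for "ask
// cardboard for the datasets it has metadata records for" and getMasterDatasets for
// "ask the user's own master database which datasets it knows about".
function findOrphanedDatasets(listMetadataDatasets, getMasterDatasets, callback) {
  listMetadataDatasets(function(err, cardboardDatasets) {
    if (err) return callback(err);
    getMasterDatasets(function(err, masterDatasets) {
      if (err) return callback(err);

      // datasets cardboard has metadata for, but the master database has never
      // heard of, are the ones that need to be cleaned up
      var orphaned = cardboardDatasets.filter(function(id) {
        return masterDatasets.indexOf(id) === -1;
      });

      callback(null, orphaned);
    });
  });
}
```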
Anyway, this needs more thought, but I do agree that streams solve the guarantee problem, so we should keep working through these extra problems. My biggest fear is that moving to streams forces a bunch of overhead on us.
In talking with @rclark, he pointed out that we can avoid two streams by always deleting from the `main-index` first. Let's call not seeing that option 👶-brain.
Replaced by #186
The goal of this PR is to resolve hot partition problems as outlined in https://github.com/mapbox/cardboard/issues/184.
TODO
Things changed
- `geobuf@3.0.0`, which requires node@4.5.0
- `createTable` to `createTables`, and removed name overrides
- `list` with no callback to `listStream`. If we want to keep the old functionality, we can have `list` call `listStream`; I just wanted to clean up the code paths in that function.
- `batch.remove` to `batch.del` to match the non-batch api
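A hypothetical before/after of the renamed pieces (signatures and config keys are assumed from the names above, not taken from the code):

```js
var Cardboard = require('cardboard');
// config keys here are illustrative only
var cardboard = Cardboard({ region: 'us-east-1' });

// old: list with no callback returned a stream; new: that lives on listStream
cardboard.listStream('my-dataset')
  .on('data', function(feature) { /* handle one feature at a time */ })
  .on('end', function() { /* all features seen */ });

// list with a callback still buffers the whole collection
cardboard.list('my-dataset', function(err, collection) { /* ... */ });

// old: cardboard.batch.remove(ids, dataset, cb); new: batch.del to match the non-batch api
cardboard.batch.del(['feature-1', 'feature-2'], 'my-dataset', function(err) { /* ... */ });
```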