Closed kaizenlabs closed 1 year ago
Even when it reports that the jobs completed successfully, Dataflow doesn't actually delete the entities. I ran several jobs; some completed successfully and some did not (even though the data I filled in was exactly the same), but in no case were the entities deleted from Datastore. Totally worthless!
.......
2020-05-02 01:49:41.149 EEST Estimated size bytes for the query is: 392976
2020-05-02 01:49:41.150 EEST Splitting the query into 12 splits
2020-05-02 01:49:41.232 EEST Success processing work item cw-nabu-tv;2020-05-01_15_48_22-17973458269990428245;3323607320616377713
2020-05-02 01:49:41.233 EEST Finished processing stage s02 with 0 errors in 2.421 seconds
2020-05-02 01:49:43.383 EEST Starting MapTask stage s04
I think you are correct that the Dataflow solution wasn't perfect. It has since been marked as "Deprecated" here: https://cloud.google.com/dataflow/docs/guides/templates/provided-utilities#datastore-bulk-delete-[deprecated] and the docs suggest using Firestore Bulk Delete instead: https://cloud.google.com/dataflow/docs/guides/templates/provided-utilities#firestore-bulk-delete If you are using one of the Datastore client libraries, there are code samples for batch operations, including a batch delete, here: https://cloud.google.com/datastore/docs/concepts/entities#batch_operations
Batch deleting with Datastore is abnormally difficult.
I'm trying to clear Datastore of all entities. Since the recommended method of using Dataflow errors out all the time (another developer and I each tried it separately, with the same result), I wrote my own script in Go to loop over all the records and delete them in batches of 250. That was the largest batch I could use without getting an RPC error of 'transaction size' too big, which is not even close to the 500-entity limit specified in the documentation.
The issue is that the script ran all day yesterday (see screenshot below) and I received no errors from the DeleteMulti function call. The script iterated through all 227,000 entities and then stopped.
Today (24 hours later), I checked Datastore and all the entities are still there. When I click on some of them, it says the entity was recently deleted, but it still shows in the admin console.
When I reran the script, I noticed that the GetAll query returns entities with exactly the same Keys as the previous DeleteMulti iteration: they are still being returned even though they were deleted.
How long does it take Datastore to sync? From the UI to programmatic queries, nothing seems to be up to date even after 24 hours. My code is very simple and should work; there is nothing really to debug, as I'm getting no errors from Go.
NOTE: Shortly after this screenshot, I changed i := 1 to i := 0 in the for loop. Even so, it shouldn't matter.
Any help would be appreciated.