googleapis / google-cloud-datastore

Low-level, Protobuf-based Java and Python client libraries for Cloud Datastore. Check out google-cloud-java and google-cloud-python first!
https://cloud.google.com/datastore
Apache License 2.0

Data loss issue (Reverting back to previous state) #253

Closed: krishnagangula closed this issue 2 years ago

krishnagangula commented 4 years ago

Data loss issue

We have been facing a data loss issue for the past few weeks. Records are saved correctly in the datastore, but after a few days some of them revert to their previous state. We have a background job that gets the data directly from the datastore and resaves it. It is at this resave that some data reverts to the previous state. We are not sure where the previous state comes from, as we read the data directly from the datastore itself. We are using Objectify to perform operations on the datastore.

```java
Queue queue = QueueFactory.getDefaultQueue();
DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
// Route the indexing task to the "memdb" backend via the Host header.
String backendAddress = BackendServiceFactory.getBackendService().getBackendAddress("memdb");
TaskOptions task =
    TaskOptions.Builder.withUrl("/indexcase").param("caseStatusId", caseStatusId.toString())
        .param("indexAll", "1").method(Method.GET).header("Host", backendAddress);
// Enqueue the task while a datastore transaction is open, then commit.
Transaction txn = ds.beginTransaction();
queue.add(task);
txn.commit();
```

We maintain logs that track every operation performed on the entity. [Screenshots: CaseDataLossScreenshots]

In the above screenshots, the case status was updated to "Spec Cases PD current" on 09/25/2020, but it was reverted to "Spec Cases 1-140 App Waiting".

pcostell commented 4 years ago

Hi @krishnagangula --

There are a couple of things that could be causing this:

  1. Eventual Consistency. If you are using Cloud Datastore (vs. Cloud Firestore in Datastore Mode), queries are eventually consistent. This means that a query may return the stale previous state of an entity. Performing a lookup directly against the key (e.g., ofy().load().key(myKey)) will return the most up-to-date value. If your background job loads data via a query, it could be reverting the state.

  2. Global Cache. Objectify offers Global Caching as a feature. The cache should remain consistent with the underlying data; however, it will not see out-of-band changes. For example, if your entity update does not use the global cache but the background job does, the background job may load the old version of the entity from the cache and then resave it into Datastore.
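
Both cases above come down to the same pattern: the job reads a copy that lags behind the store and then writes it back. A small in-memory sketch may make that concrete. This is not Datastore or Objectify code: `StaleResaveDemo`, its two maps, and its method names are hypothetical stand-ins, assuming only that one read path can lag behind the authoritative store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model; none of these names are Datastore or Objectify APIs.
final class StaleResaveDemo {
    // Authoritative storage: a lookup by key sees the latest write.
    private final Map<String, String> store = new HashMap<>();
    // Lagging copy, standing in for a cache or an eventually consistent query result.
    private final Map<String, String> laggingCopy = new HashMap<>();

    // Normal write path: updates the store and the lagging copy together.
    void save(String key, String status) {
        store.put(key, status);
        laggingCopy.put(key, status);
    }

    // Out-of-band write: updates the store only, so the lagging copy goes stale.
    void saveOutOfBand(String key, String status) {
        store.put(key, status);
    }

    // Background job: load the status from the chosen source, then resave it.
    void resave(String key, boolean readFromLaggingCopy) {
        String loaded = readFromLaggingCopy ? laggingCopy.get(key) : store.get(key);
        store.put(key, loaded);
    }

    // Runs the whole scenario and returns the status left in the store.
    static String runJob(boolean readFromLaggingCopy) {
        StaleResaveDemo db = new StaleResaveDemo();
        db.save("case-1", "Spec Cases 1-140 App Waiting");
        db.saveOutOfBand("case-1", "Spec Cases PD current");
        db.resave("case-1", readFromLaggingCopy);
        return db.store.get("case-1");
    }

    public static void main(String[] args) {
        System.out.println("job reads lagging copy: " + runJob(true));
        System.out.println("job reads store by key: " + runJob(false));
    }
}
```

Under these assumptions, `runJob(true)` leaves the old status in the store and `runJob(false)` keeps the new one. Whether this explains the revert in a real application depends on whether the job's read path is in fact stale.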

krishnagangula commented 4 years ago

Hi @pcostell, in the background job we use neither a query nor the cache to get the data.

```java
CaseDO caseDO = ofy().load().type(CaseDO.class).id(caseId).now();

// Add category to case according to case status.
if (caseDO.getCaseStatus() != null) {
  Ref<CaseStatusDO> refCaseStatus = caseDO.getCaseStatus();
  CaseStatusDO caseStatusDO = refCaseStatus.get();
  if (caseStatusDO != null) {
    Ref<CaseCategoryDO> refCaseCategory = caseStatusDO.getCaseCategory();
    // Add the reference to the category only for indexing/search purposes.
    caseDO = CaseDO.getBuilder(caseDO).withCaseCategory(refCaseCategory).build();
  }
}

// Initialise the internal status change date which is used for sorting on internalStatusChangeDate.
// caseDO.initCaseInternalStatus();
if (internalStatus != null) {
  if (!Strings.isNullOrEmpty(internalStatus.getStatusMsg())) {
    internalStatusChangeDate = internalStatus.getDate();
  } else {
    internalStatusChangeDate = null;
  }
}

// Generates search indexes for 12 fields in one indexed field.
caseDO.processSearchIndexes();

ofy().save().entity(caseDO).now();
```