pharo-nosql / voyage

Voyage is an object persistence abstraction layer for Pharo.
MIT License
33 stars 21 forks source link

objects and reversedObjects are out of sync after GC (randomly) #187

Open SabineMa opened 1 day ago

SabineMa commented 1 day ago

Brief description: There is a short moment which occurs randomly when the objects and reversedObjects dictionaries are not in sync (= do not contain the same objects). If, exactly in that moment, one of that object is saved in mongo, its data is duplicated.

Perhaps also one important point is, that if the cache is out of sync and one object is retrieved again from the database, it is not added newly to the reversedObjects. Then you have to rebuild the cache (remove all sessions, logoff all users in our case) which is only a bad workaround.

Long description:

This screen shows that normal case:

Bildschirmfoto 2024-10-17 um 18 48 44

This has a very bad side effect: If in this moment, when the cache is out of sync, one of this objects should be saved, voyage does not find the OID in the reversedObjects attribute and this leads to duplicating the data of this object in the database. It duplicates the object in Mongo which is very bad

Now more information and reproduction: It took me a long time to reproduce this and we had several chats in discord about this, @estebanlm . In the following I show how to reproduce this. But this is with my model and our configuration in my image and please tell me If you need more information or if you can not reproduce it with this in your image, @estebanlm

As we have one mongo database per customer, we do also have one VOCache per customer. The caches are administrated by RKARepositoryManager which holds one Repository per customer. RKAMongoRepository is a subclass of VOMongoRepository. it only defines another poolSize (2) and retrieving possibilities.

Bildschirmfoto 2024-10-17 um 19 11 00

In the following code snippet for reproducing the bug, I am doing the following:

  1. Resetting the repository manager to be clean.
  2. Then reading some objects from mongo.
  3. Hold them in a collection (they are referenced and so they are not GCd)
  4. read the cache
  5. wait a little bit
  6. do GC
  7. and now the cache is out of sync
  8. Sometimes it gets out of sync also without actively calling the gc. I think then the gc was just running by its own. Sometimes also the delay is not needed...randomly...
| theCollection |
RKARepositoryManager customerRepositories: WeakValueDictionary new.
theCollection := (RKASingleReceipt selectAllSingleReceiptsOfCompany: 'K1002432')  .
(RKARepositoryManager customerRepositories at: 'K1002432') cache.
(Delay forSeconds: 2) wait.
theCollection := nil.
Smalltalk garbageCollect .
(RKARepositoryManager customerRepositories at: 'K1002432') cache inspect.
Bildschirmfoto 2024-10-17 um 19 06 52

https://github.com/user-attachments/assets/c28bb64e-488e-4419-8f3e-706a6a902333

The objects which are missing in the reversed objects after gc are exactly that objects which are no longer referenced by any other object and then are gcd.

As far as our discussions in the past, we were saying that this should never occur (having not the same objects in objects/reversedObjects). In my case this leads to a duplication of the data because mongo then does not find the object in the reversed objects/does not find the oid/and is thinkin that the object is new. But it is not new.

btw. I think this was also in the old driver but interestingly did not occur that often. I am not sure if this is true because we do have this objects RKASingleReceipt new. Explanation: I have to go back a bit: until spring, we had a model that had a close link between the company, people and travel and receipts. The company was also always in the session. Therefore, the objects were always referenced from the moment they were read from the database. However, after taking your advice to heart @Norbert Hartl - very good advice, Norbert - that the design is not good for larger amounts of data, which is the case with us now, we have chosen a different design for the new expense report/individual receipts feature. The person no longer knows their single receipts. Only the single receipts know their persons but not the persons their receipts. This means that there are no longer many references to the individual receipts, which is why this problem occurs for me in the new area. But we already had it in the old versions. I have built extra workouts for this. Something like when the repo is out of sync logoff all users and build a new cache! So, i say my impression is that the bug occurs more often in the new driver but possible the bug more often occurs in the new model perhaps

Here to be complete the methods used above:

selectAllSingleReceiptsOfCompany: aCustomerNumber

    ^ (self repositoryNamed: aCustomerNumber)
          selectMany: RKASingleReceipt
          where: self queryDictNotNilPerson
          order: self sortOrderAscendingDict
          limit: 10000
          offset: 0
repositoryNamed: aString
    ^ self customerRepositories
          at: aString
          ifAbsentPut: [
          RKAMongoRepository repositoryForCustomerNumber: aString ]

This is my current setting and VM but I would say there was never a version where I did not have this problem (Pharo 6-Pharo 11)

Pharo 11

Virtual Machine

/Users/sabinem/Documents/Pharo/vms/110-x64/Pharo.app/Contents/MacOS/Pharo CoInterpreter VMMaker-tonel.1 uuid: 872d53e1-b3c1-0d00-86f3-f7200df24b96 Sep 14 2023 StackToRegisterMappingCogit VMMaker-tonel.1 uuid: 872d53e1-b3c1-0d00-86f3-f7200df24b96 Sep 14 2023 v10.0.7 - Commit: b66dd582 - Date: 2023-09-13 15:14:31 +0200

Pharo 10.0.7 built on Sep 14 2023 16:05:00 Compiler: Apple LLVM 12.0.5 (clang-1205.0.22.9) VMMaker versionString v10.0.7 - Commit: b66dd582 - Date: 2023-09-13 15:14:31 +0200 CoInterpreter VMMaker-tonel.1 uuid: 872d53e1-b3c1-0d00-86f3-f7200df24b96 Sep 14 2023 StackToRegisterMappingCogit VMMaker-tonel.1 uuid: 872d53e1-b3c1-0d00-86f3-f7200df24b96 Sep 14 2023

/Users/sabinem/Documents/Pharo/images/Pharo 11.0 - 64bit (oktober4)/Pharo 11.0 - 64bit (oktober4).image Pharo11.0.0 Build information: Pharo-11.0.0+build.726.sha.aece1b5473acf3830a0e082c1bc3a15d4ff3522b (64 Bit) Unnamed

SabineMa commented 10 hours ago

this is copied from discord because I think that the point that after having the cache mismatch one time, the fact that re fetching the object from the database is not adding it again to the cache put be a workaround fix if we can not find the real reason for the issue

https://discord.com/channels/223421264751099906/300255715337961474/1296758258454757406

Norbert Hartl — heute um 10:54 Uhr @Sabine Manaa I wonder what is the effect if the collections are not in sync. The difference between both collections should all be objects that are not referenced anywhere, right? So to me the question is: what is the reason an object you want to store is not in the collection? As you want to save it there is obviously a strong reference to that object and hence should not be one that is missing. How many objects are in the VOCache? We need to exclude the compaction of the cache. So I just started to think about this. But instead making the difference between the two collections responsible for the cause I would think about the GC maintaing the weak collection messes up with the hashing So when you check the collections you need to check if the object is not in the collection or if it cannot be found. Sabine Manaa — heute um 11:14 Uhr @Norbert Hartl interesting points. Thank you for thinking about it.

The "history" of the issue is, that I had and am having doubled data. But it rarely happens. I was searching for the reason. Then I put a halt in development in for exactly that point, when the "double saving" would start. Then I recognized that in this case, there ist always an unbalanced state in that two collections. This seemed to be related. Then I was asking esteban and he confirmed. The both collections have to include the same objects. You are right by saying that for saving there must be a strong relation to this object!

But we have to consider that, after having the "cache-mismatch", when that object which is only in the objects but not in the reversed objects, is re-fetched, then it is not added to the reversed objects. @Esteban Lorenzano perhaps this would be a workaround fix...

SabineMa commented 10 hours ago

After having an interesting discussion with @noha at discord, I am able do provide the code to demonstrate the doubling of objets in mongo:

Before, I see in MongoDB Compass this - having ONE receipt:

Bildschirmfoto 2024-10-18 um 12 13 48

| theCollection theDoubleCandidate theRepositoryForSaving theCustomerNumber |
theCustomerNumber := 'K1002467'.
RKARepositoryManager customerRepositories: WeakValueDictionary new.
theCollection := (RKASingleReceipt selectAllSingleReceiptsOfCompany: theCustomerNumber)  .
(RKARepositoryManager customerRepositories at: theCustomerNumber) cache.
(Delay forSeconds: 2) wait.
theCollection := nil.
Smalltalk garbageCollect .
(RKARepositoryManager customerRepositories at: theCustomerNumber) cache inspect.
theDoubleCandidate := (RKASingleReceipt selectAllSingleReceiptsOfCompany: theCustomerNumber)  first .
 theRepositoryForSaving := RKARepositoryManager repositoryNamed: theCustomerNumber.
theRepositoryForSaving save: theDoubleCandidate

after that code above which should re read the existing object in mongo, it is doubled:

Bildschirmfoto 2024-10-18 um 12 17 54

SabineMa commented 2 hours ago

Here comes the code to reproduce the issue from a clean Pharo 11 image.

  1. Start Image Pharo 11 -64bit (old stable) from launcher.

  2. Load Voyage

Metacello new 
    repository: 'github://pharo-nosql/voyage/mc';
    baseline: 'Voyage';
    load: 'mongo tests'.
  1. change the printOn: of VOCache
VOCache compile: '
printOn: aStream
    aStream 
        nextPutAll: ''Cache'';
        nextPut: $(;
        nextPutAll: objects size asString;
        space;
        nextPutAll: ''objects;'';
        nextPutAll: reversedObjects size asString;
        space;
        nextPutAll: ''reversedObjects'';
        nextPut: $).'

4.) create the test data - then we have 1000 TestPilot objects in the database

| theRepository |
theRepository := ( VOMongoRepository 
        host:  VOMongoRepository defaultHost 
        database: 'aaa').       
1 to: 1000 do: [:index | 
        theRepository save:  (VOTestPilot new name: 'Esteban', index asString ) ] .

Bildschirmfoto 2024-10-18 um 19 22 19

5.) Do some select and between Garbage Collection. Write the cache size in the Transcript while doing this.

| theRepository  thePilots thePilot theLaterDuplicated | 
Transcript clear.
theRepository :=  ( VOMongoRepository host:  VOMongoRepository defaultHost database: 'aaa') .
Transcript crShow: 'step 1', theRepository cache printString.       
thePilots :=  theRepository   selectAll: VOTestPilot .
Transcript crShow: 'step 2',theRepository cache printString.        
thePilot := thePilots first.
thePilots := nil.
Smalltalk garbageCollect .
Transcript crShow: 'step 3',theRepository cache printString.        
theLaterDuplicated := (theRepository selectAll: VOTestPilot ) last.
theRepository save: theLaterDuplicated .
Transcript crShow: 'step 4',theRepository cache printString.        

6.) the result is that we have 1001 objects in the database:
Bildschirmfoto 2024-10-18 um 19 24 51

Transcript shows this:

step 1Cache(0 objects;0 reversedObjects) step 2Cache(1000 objects;1000 reversedObjects) step 3Cache(1000 objects;1 reversedObjects) step 4Cache(1001 objects;2 reversedObjects)

This is the doubled object: Bildschirmfoto 2024-10-18 um 19 26 49

7.) Now drop the database, do everything again but without this line: Smalltalk garbageCollect . And everything is fine and we remain with 1000 objects in the database


Thinking loud now: when saving the object, the existing instance is serialized and in serialize: anObject description: aDescription the repository is asked if the object is new. For answering this, the VOCache looks in its reversedObjects and because the object is missing there, it does the wrong way. isNew: returns true

we could of course make a repair mechanism at this point and if the object is in the objects and not in the reversedObjects, add it back to the reversedObjects. But that would only be a workaround. It would be interesting to find out, why the cache becomes unbalanced

Better way is to think about this: reversedObjects is of type WeakKeyDictionary

Class: WeakKeyDictionary I am a dictionary holding only weakly on my keys. When one of my keys is garbage collected, the key->value association is removed from the dictionary. Internally I use WeakKeyAssociation objects that know myself as their container. WeakKeyAssociation are implemented as ephemerons and get mourned when their key is only known from itself. On mourn, the association asks the container to remove itself.

https://en.wikipedia.org/wiki/Ephemeron#:~:text=An%20ephemeron%20is%20a%20data,is%20about%20to%20be%20collected.

but here I am at the end of my knowledge @estebanlm