We are seeing segmentation faults that are closely linked to the evaluation metrics map.
A customer shared logs confirming that the evaluation metrics map has stored corrupt data.
What
Theory
The theory is that while we clone the evaluation metrics map, the evaluation thread can modify the original map concurrently, which corrupts the data. The segmentation fault is likely caused by the same unsynchronized access.
Also, the cf_client uses a class variable, which is not thread safe and could be causing issues.
Fix
Lock the cloning operation with a mutex
Replace the client class variable with a Singleton
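A minimal sketch of the two fixes, assuming the SDK is Ruby (inferred from the wording above, not stated). `EvaluationMetrics`, `Client`, `record` and `snapshot` are hypothetical names for illustration and are not the SDK's actual classes or methods.

```ruby
require 'singleton'

# Hypothetical stand-in for the evaluation metrics map.
class EvaluationMetrics
  def initialize
    @lock = Mutex.new
    @counts = Hash.new(0)
  end

  # Called from the evaluation thread; takes the same lock as snapshot.
  def record(key)
    @lock.synchronize { @counts[key] += 1 }
  end

  # Clone the map while holding the lock, so no writer can modify it
  # part-way through the copy.
  def snapshot
    @lock.synchronize { @counts.dup }
  end
end

# Hypothetical client. The standard library Singleton module replaces a
# hand-rolled class variable and creates the single instance safely.
class Client
  include Singleton

  attr_reader :metrics

  def initialize
    @metrics = EvaluationMetrics.new
  end
end

# Several threads record evaluations through the one shared client.
writers = 4.times.map do
  Thread.new { 1_000.times { Client.instance.metrics.record('flag-a') } }
end
writers.each(&:join)

puts Client.instance.metrics.snapshot['flag-a']
```

The lock only helps if the writers and the cloning code share it. Locking the clone alone would still let the evaluation thread mutate the map mid-copy.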
Testing