Open PotatoSpud opened 5 years ago
Hi, what's your opinion about the strategy to generate unique IDs from class names?
This is not perfect, as the hashes may not be completely unique. However, the chances of a clash are very low. So I added new versions of TypedKryoStrategy and GlobalKryoStrategy as follows:
    public class IndigoTypedKryoStrategy<T> extends KryoStrategy<T> {
        private final Class<T> clazz;
        private final UserSerializer userSerializer;

        public IndigoTypedKryoStrategy(final Class<T> clazz, final UserSerializer registrations) {
            this.clazz = clazz;
            this.userSerializer = registrations;
        }

        @Override
        public void registerCustomSerializers(final Kryo kryo) {
            this.userSerializer.registerSingleSerializer(kryo, this.clazz);
        }

        @Override
        void writeObject(final Kryo kryo, final Output output, final T object) {
            kryo.writeObject(output, object);
        }

        @Override
        T readObject(final Kryo kryo, final Input input) {
            return kryo.readObject(input, this.clazz);
        }

        @Override
        public int newId() {
            return HashUtil.serializionIdHash(this.clazz.getName());
        }
    }
    public class IndigoGlobalKryoStrategy<T> extends KryoStrategy<T> {
        private static final String GLOBAL = "global";
        private final UserSerializer userSerializer;
        private final int id;

        public IndigoGlobalKryoStrategy(final UserSerializer registrations) {
            this.userSerializer = registrations;
            String identifier = GLOBAL;
            try {
                // Resolves to a concrete type name only when this class is subclassed
                // with a concrete type argument; otherwise the name of the type
                // variable is returned.
                final Type sooper = this.getClass().getGenericSuperclass();
                final Type t = ((ParameterizedType) sooper).getActualTypeArguments()[0];
                identifier = t.getTypeName();
            } catch (final Exception e) {
                // fall through and keep the "global" identifier
            }
            this.id = HashUtil.serializionIdHash(identifier);
        }

        @Override
        public void registerCustomSerializers(final Kryo kryo) {
            this.userSerializer.registerAllSerializers(kryo);
        }

        @Override
        void writeObject(final Kryo kryo, final Output output, final T object) {
            kryo.writeClassAndObject(output, object);
        }

        @SuppressWarnings("unchecked")
        @Override
        T readObject(final Kryo kryo, final Input input) {
            return (T) kryo.readClassAndObject(input);
        }

        @Override
        public int newId() {
            return this.id;
        }
    }
The MurmurHash3_x86_32 algo was lifted from Hazelcast itself, but any decent hash would do the job:
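    import java.nio.charset.StandardCharsets;

    public class HashUtil {
        // MurmurHash3_x86_32(byte[], int, int) is copied from Hazelcast and omitted here.

        public static int serializionIdHash(final String text) {
            // Use a fixed charset so every JVM derives the same bytes, and so the same ID.
            final byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
            int hash = HashUtil.MurmurHash3_x86_32(bytes, 0, bytes.length);
            // Avoid Hazelcast's internal registrations and our own space
            if ((hash > -400) && (hash < 100)) {
                hash += 500;
            }
            return hash;
        }
    }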
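Since two class names can hash to the same ID, one option is to fail fast when that happens rather than let one serializer silently shadow another. The following is only a sketch: `SerializerIdRegistry` and `idFor` are hypothetical names, not part of SubZero or Hazelcast, and `String.hashCode()` stands in for `HashUtil.serializionIdHash` so that the example is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: remembers which class name owns each generated ID. */
public final class SerializerIdRegistry {
    private final Map<Integer, String> idToName = new HashMap<>();

    /**
     * Stand-in for HashUtil.serializionIdHash (an assumption for this sketch).
     * String.hashCode() is specified by the JDK, so it is stable across JVMs.
     */
    static int idFor(final String className) {
        return className.hashCode();
    }

    /** Returns the ID for the class name, or throws if another name already owns it. */
    public int register(final String className) {
        final int id = idFor(className);
        final String previous = idToName.putIfAbsent(id, className);
        if (previous != null && !previous.equals(className)) {
            throw new IllegalStateException(
                "ID " + id + " clashes: " + previous + " vs " + className);
        }
        return id;
    }
}
```

Registering the same name twice is harmless; only a second, different name that maps to an already used ID is rejected.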
Hope this helps,
Aongus
@PotatoSpud: I am not crazy about the probabilistic nature of this. It smells like the birthday paradox to me - the chance of a conflict increases quite fast as the number of classes grows.
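To put rough numbers on the birthday concern, here is a small sketch using the standard approximation p ≈ 1 - exp(-n(n-1) / 2N), with N = 2^32 possible `int` IDs. It assumes the hash spreads class names uniformly over the whole `int` range, which is an idealization; `BirthdayEstimate` is a hypothetical name used only for this illustration.

```java
public final class BirthdayEstimate {
    // Size of the ID space: a Java int has 2^32 distinct values.
    private static final double ID_SPACE = 4294967296.0;

    /** Approximate probability that at least two of n names share a 32-bit hash. */
    public static double collisionProbability(final int n) {
        // Number of distinct pairs among n names.
        final double pairs = (double) n * (n - 1) / 2.0;
        // 1 - exp(-pairs / N), computed with expm1 for accuracy when the result is tiny.
        return -Math.expm1(-pairs / ID_SPACE);
    }

    public static void main(final String[] args) {
        for (final int n : new int[] {100, 1_000, 10_000, 100_000}) {
            System.out.printf("%d classes -> %.6f%n", n, collisionProbability(n));
        }
    }
}
```

Under these assumptions the chance of at least one clash is negligible for a few hundred classes, about 1% for 10,000 classes, and roughly 69% for 100,000.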
Is there any better way? Maybe a strategy with hard-coded IDs for well-known classes (think of JDK classes) and then a combination of:
Any other idea?
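The hard-coded part of that idea could look something like the sketch below. The class name `WellKnownIds` and the ID values are made up for illustration and are not taken from SubZero or Hazelcast; what should happen for classes outside the table is deliberately left to the caller. `Map.of` requires Java 9 or later.

```java
import java.util.Map;
import java.util.OptionalInt;

/** Hypothetical table of fixed serializer IDs for well-known JDK classes. */
public final class WellKnownIds {
    // Illustrative values only; real assignments would have to be agreed on once
    // and never changed, since they end up in the serialized data.
    private static final Map<String, Integer> FIXED = Map.of(
        "java.util.ArrayList", 1000,
        "java.util.HashMap", 1001,
        "java.lang.String", 1002);

    private WellKnownIds() {
    }

    /** Returns the fixed ID for a well-known class, or empty if the class is not listed. */
    public static OptionalInt lookup(final Class<?> clazz) {
        final Integer id = FIXED.get(clazz.getName());
        return id == null ? OptionalInt.empty() : OptionalInt.of(id);
    }
}
```

Because the table is keyed by class name rather than by registration order, every cluster that ships the same table resolves the same ID for the same class.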
@jerrinot: Agreed, my approach above is bound to create problems.
For your suggestions:
If you can get away from explicit ID (int) assignment, you are halfway home. I understand more clearly how the serialization works; it is the deserialization that is confounding. I assume the class names must be presented again so that subsequent IDs can be understood.
Using SubZero across different Hazelcast clusters has problems. The serialization IDs used on one cluster will not be consistent with those of another. I have got around this issue by creating my own KryoStrategy implementations that use an ID generated from the fully qualified class name.
It gets tricky for two reasons though:
I can attempt a fork if there is interest in this solution.
Best regards,
Aongus