Closed pkolaczk closed 9 years ago
The bug is on the writing (dumping) side, because I can see the Set appear incorrectly as a list on the Python side as well.
Try setting Config.SERPENT_SET_LITERALS = true. It defaults to false to stay compatible with Python < 3.2, where set literals are not understood by the ast module; with the setting off, sets are serialized as tuples instead. See https://github.com/irmen/Serpent/blob/master/java/src/net/razorvine/serpent/Serializer.java#L231
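A quick Python-side illustration of why the setting exists (this uses only the stdlib ast module, not Serpent itself): ast.literal_eval only accepts set literals from Python 3.2 on, so on older interpreters the safe fallback is a tuple.

```python
import ast

# On Python 3.2+, a set literal parses back into a real set.
# On Python < 3.2 this raises an error, which is why Serpent
# falls back to tuples when SERPENT_SET_LITERALS is off.
parsed = ast.literal_eval("{'value1', 'value2'}")
print(type(parsed).__name__)   # set

# With the setting off, the same data arrives as a tuple instead:
parsed_tuple = ast.literal_eval("('value1', 'value2')")
print(type(parsed_tuple).__name__)  # tuple
```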
Is it going to work with Python 2.7?
No. To use set literals you will have to use Python 3.2 or newer. This is a limitation of Python's own ast module, hence the configuration setting to enable/disable this in Serpent.
Wait, I don't understand something. In Python 2.7, sets seem to be supported, because the first set gets converted properly and converts back to either a Python set or a Java HashSet. So it is possible. Yet the second and all subsequent sets in the complex object are not. How does it know the first set should be unpickled to a set and not to an array or a list? The type information is definitely there.
Can you tell us why the first one gets converted ok and the rest do not, and why can't it be fixed in Python 2.7?
Right, if one set is converted and the other not, there's definitely something strange going on.
In any case, what happens when you set Config.SERPENT_SET_LITERALS = true ?
Also, pyrolite/serpent have no built-in support for Guava types; what happens when you replace the Guava set type with a regular JDK set?
Finally, as I don't have a Scala compiler installed, could you please convert the code to a Java equivalent so that I can easily reproduce the issue?
This is a regular JDK HashSet; Guava is used here only for its convenience factory method. See the docs: http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/Sets.html#newHashSet(E...)
import com.google.common.collect.Maps;
import com.google.common.collect.Sets;
import net.razorvine.pickle.Pickler;
import net.razorvine.pickle.Unpickler;

import java.io.IOException;
import java.util.*;

class PickleTest {
    public static void main(String[] args) throws IOException {
        Set<String> set1 = Sets.newHashSet("value1", "value2");
        Map<String, Object> map1 = Maps.newHashMap();
        map1.put("pkey", 1);
        map1.put("s", set1);

        Set<String> set2 = Sets.newHashSet("value1", "value2");
        Map<String, Object> map2 = Maps.newHashMap();
        map2.put("pkey", 2);
        map2.put("s", set2);

        Object obj = new Object[]{ map1, map2 };

        Pickler pickler = new Pickler();
        byte[] array = pickler.dumps(obj);

        Unpickler unpickler = new Unpickler();
        Object[] result = (Object[]) unpickler.loads(array);

        System.out.println(((HashMap<String, Object>) result[0]).get("s").getClass());
        System.out.println(((HashMap<String, Object>) result[1]).get("s").getClass());
    }
}
Right, thanks for that code. I can now reproduce the problem. Earlier, I didn't look too carefully at it and assumed it had something to do with the serpent serializer, which is the default, but you are using pickle explicitly.
The problem is in the memoization code. If you construct a pickler without memoization, the result is correct; with memoization (on by default) it somehow produces a faulty pickle. I wrote the two resulting pickles to a file and loaded them in Python itself:
>>> import pickle
>>> pickle.load(open("withoutmemo.pickle","rb"))
({'s': {'value1', 'value2'}, 'pkey': 1}, {'s': {'value1', 'value2'}, 'pkey': 2})
>>> pickle.load(open("withmemo.pickle","rb"))
({'s': {'value1', 'value2'}, 'pkey': 1}, {'s': ['value2', 'value1'], 'pkey': 2})
I will investigate what goes wrong in the memoization code.
In the meantime, use: new Pickler(false) to switch off the memoization.
The example code can be simplified even more, pickling the following already replicates the problem:
Set<String> set1 = new HashSet<>();
Set<String> set2 = new HashSet<>();
Object obj = new Object[] { set1, set2 };
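For reference, the equivalent repro against CPython's stdlib pickle behaves correctly: because its memo is keyed on object identity rather than equality, two distinct but equal sets are pickled as two separate sets and both deserialize with the right type.

```python
import pickle

set1 = {'value1', 'value2'}
set2 = {'value1', 'value2'}   # equal to set1, but a distinct object

r1, r2 = pickle.loads(pickle.dumps((set1, set2)))
print(type(r1).__name__, type(r2).__name__)   # set set
print(r1 is r2)   # False: two separate set objects survive the round trip
```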
Thanks, that's a useful workaround. Does turning off memoization have severe performance impact?
That totally depends on the structure of the data. If you have a lot of objects that reoccur multiple times in the data you're serializing, then yes: it will cause a lot of object duplication on the deserialization side.
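The trade-off can be sketched with the stdlib pickler, which has a comparable switch: its (deprecated) "fast mode" disables the memo, much like new Pickler(false) does in Pyrolite. With the memo off, a shared object is serialized once per reference and comes back as duplicates:

```python
import io
import pickle

shared = ['value1', 'value2']

# Default: memoization on, so sharing is preserved.
a, b = pickle.loads(pickle.dumps((shared, shared)))
print(a is b)   # True

# "Fast mode" disables the memo (deprecated, but handy for illustration).
buf = io.BytesIO()
p = pickle.Pickler(buf)
p.fast = True
p.dump((shared, shared))
c, d = pickle.loads(buf.getvalue())
print(c is d)   # False: the shared list was written twice and duplicated
```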
I'm fixing the issue as we speak though, so you won't have to use the workaround for long. Need to translate the fix to the C# implementation still.