ldbc-dev / ldbc_snb_datagen_deprecated2015

LDBC-SNB Data Generator
GNU General Public License v3.0

Python parameter gen script complains about encoding #23

Closed: tomersagi closed this issue 9 years ago

tomersagi commented 9 years ago

Hi, I installed the data generator on a new system and now I'm getting this error:

loading input for parameter generation
Traceback (most recent call last):
  File "paramgenerator/generateparams.py", line 281, in <module>
    sys.exit(main())
  File "paramgenerator/generateparams.py", line 145, in main
    (personFactors, countryFactors, tagFactors, tagClassFactors, nameFactors, givenNames,  ts) = readfactors.load(factorFiles, friendsFiles)
  File "/local/datagen/ldbc_snb_datagen/paramgenerator/readfactors.py", line 72, in load
    line = f.readline().split(",")
  File "/usr/lib64/python2.7/codecs.py", line 675, in readline
    return self.reader.readline(size)
  File "/usr/lib64/python2.7/codecs.py", line 530, in readline
    data = self.read(readsize, firstline=True)
  File "/usr/lib64/python2.7/codecs.py", line 477, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 0: invalid continuation byte

This may be related to #21.

Thanks,
Tomer
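
For readers hitting the same message: byte 0xe9 is "é" in Latin-1 (ISO-8859-1), and it is not valid as a standalone character in UTF-8, which is consistent with the input having been written in a single-byte encoding. The sketch below reproduces the error on a made-up line; the sample bytes and the helper name `try_decode` are illustrative assumptions, not taken from the generator's code or data.

```python
# Hypothetical sample line: the Latin-1 bytes for "été,42".
# It is NOT taken from the real factor files.
raw = b"\xe9t\xe9,42"


def try_decode(data, encoding):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# Decoding as UTF-8 fails at position 0, as in the traceback above.
text_utf8, err_utf8 = try_decode(raw, "utf-8")

# Decoding as Latin-1 succeeds, because every byte maps to one character.
text_latin1, err_latin1 = try_decode(raw, "latin-1")

print(err_utf8)
print(text_latin1)
```

A successful Latin-1 decode does not prove the file is Latin-1, since that codec accepts any byte sequence; it only shows that the bytes are not UTF-8.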

tomersagi commented 9 years ago

Update: I reassembled the jar with utf8 encoding using instructions from @alexaverbuch. (BTW, I couldn't push my branch and open a pull request due to insufficient privileges.) Then I deleted all results from the previous run and re-ran run.sh. Same error.

ArnauPrat commented 9 years ago

The cause of the problem may be that the data used by parameter generation is not exported correctly in utf8. Maybe @agubichev can give us a hand with this.

Arnau
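
One way to check this hypothesis is to scan the files that parameter generation reads and list the lines that are not valid UTF-8. The following is a minimal sketch under that assumption; `find_non_utf8_lines` is a hypothetical helper, not part of the datagen repository, and the path passed to it would be whichever factor or friends file the run produced.

```python
import io


def find_non_utf8_lines(path, limit=10):
    """Return up to `limit` tuples (line_number, offset_in_line, raw_line)
    for lines in `path` that cannot be decoded as UTF-8."""
    bad = []
    # Read bytes, not text, so that decoding happens under our control.
    with io.open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as exc:
                # exc.start is the offset of the first offending byte.
                bad.append((lineno, exc.start, raw))
                if len(bad) >= limit:
                    break
    return bad
```

An empty result means the file decodes cleanly as UTF-8, so the problem would lie elsewhere; a non-empty result points at the exact lines the exporter wrote in another encoding.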

tomersagi commented 9 years ago

The issue was resolved in the latest version.