ldbc-dev / ldbc_snb_datagen_deprecated2015

LDBC-SNB Data Generator
GNU General Public License v3.0

Data generator generates empty files #22

Closed: tomersagi closed this issue 9 years ago

tomersagi commented 9 years ago

Hi, I ran the latest version of the data generator on SF1 and got a set of person_x.csv files. person_0.csv contains about 1000 KB of data, person_1.csv about 10 KB, and person_2.csv through person_31.csv are 75 bytes each, which is just the file header. The same happens in the other file sets. The overall data size looks right (~1 GB), but why the extra files? Thanks, Tomer

ArnauPrat commented 9 years ago

Hi @tomersagi, Datagen currently generates empty files because it produces data in blocks of 10K users. If numPersons < 10K*(num_threads-1), some reducers won't receive any block of data to process, so their output files are empty (header only). The dataset is perfectly valid anyway. Is this a problem for you? One possible fix is to check whether the above inequality holds for a given number of threads and users, and automatically lower the number of threads so that no unnecessary threads are spawned.
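A minimal sketch of the adjustment described above, for illustration only. It is not datagen's actual code: the function names are hypothetical, the way blocks are spread over reducers (at most one idle reducer per missing block) is an assumption, and the 11,000-person figure in the usage note is an illustrative value rather than a confirmed SF1 count.

```python
import math

BLOCK_SIZE = 10_000  # users per block, as stated in the comment above


def has_idle_reducers(num_persons: int, num_threads: int,
                      block_size: int = BLOCK_SIZE) -> bool:
    """The inequality quoted in the thread: True means some reducers get no block."""
    return num_persons < block_size * (num_threads - 1)


def effective_threads(num_persons: int, requested_threads: int,
                      block_size: int = BLOCK_SIZE) -> int:
    """Hypothetical helper: cap the thread count at the number of blocks.

    Assumes each reducer needs at least one block to produce non-empty output.
    """
    if requested_threads < 1:
        raise ValueError("requested_threads must be at least 1")
    num_blocks = math.ceil(num_persons / block_size)
    # Never drop below one thread, never exceed what the user asked for.
    return max(1, min(requested_threads, num_blocks))
```

Under these assumptions, `effective_threads(11_000, 32)` returns 2, which would match the report of two non-empty person files out of 32.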

Best wishes and happy new year :) Arnau

tomersagi commented 9 years ago

Not a problem, just a nuisance and some clutter in the FS. When you get around to it. Happy new year. Tomer

tomersagi commented 9 years ago

Resolved in latest version.