Closed by tomersagi 9 years ago
Hi @tomersagi, datagen currently generates empty files because it produces data in blocks of 10K users. If numPersons < 10K*(num_threads-1), some reducers receive no block to process, so their output files are empty. The dataset is perfectly valid anyway. Is this a problem for you? One possible fix is to check whether the above inequality holds for the given number of threads and users, and automatically reduce the number of threads so that no unnecessary ones are spawned.
Best wishes and happy new year :) Arnau
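For illustration, the thread-count adjustment suggested above could look roughly like the sketch below. It is not datagen's actual code: the function name `effective_threads` and its parameters are hypothetical, and it assumes each reducer gets at least one whole block before any reducer gets a second one. Only the 10K block size comes from the explanation above.

```python
BLOCK_SIZE = 10_000  # persons per block, as described in the reply above


def effective_threads(num_persons: int, num_threads: int,
                      block_size: int = BLOCK_SIZE) -> int:
    """Return a thread count capped at the number of person blocks.

    Hypothetical helper: with more threads than blocks, the surplus
    reducers would have nothing to process and would write empty files.
    """
    if num_persons < 0 or num_threads < 1 or block_size < 1:
        raise ValueError("num_persons must be >= 0; num_threads and block_size >= 1")
    # Ceiling division without floats; keep at least one block so that
    # at least one thread always runs.
    num_blocks = max(1, -(-num_persons // block_size))
    return min(num_threads, num_blocks)


if __name__ == "__main__":
    # Hypothetical input: 11,000 persons requested with 32 threads.
    print(effective_threads(11_000, 32))
```

With a hypothetical 11,000 persons and 32 threads, only two blocks exist, so the sketch would run two threads and leave no header-only output files.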
Not a problem, just a nuisance and some clutter in the FS. When you get around to it. Happy new year. Tomer
Resolved in latest version.
Hi, I ran the latest version of the data generator on SF1 and got a list of person_x.csv files. person_0.csv has 1000 KB of data, person_1.csv has 10 KB, and person_2.csv through person_31.csv have 75 bytes each, which is just the file header. The same thing happens in the other filesets. The overall data size is fine (~1 GB), but why the extra files? Thanks, Tomer