ldbc-dev / ldbc_snb_datagen_deprecated2015

LDBC-SNB Data Generator
GNU General Public License v3.0

What if the data generated is really big? #20

Closed: tomersagi closed this issue 9 years ago

tomersagi commented 10 years ago

Hi, I want to generate SF 8000 in the near future. It seems strange to me to run the Hadoop job, get 8 TB of data, convert it into another 8 TB of CSV files, and then load those into more than 8 TB of DB storage. That means that to run an 8 TB workload I need at least 16 TB of space, and probably more. Are there any plans for an interactive version where the data is loaded into the DB as it is generated? Thanks, Tomer

ArnauPrat commented 10 years ago

Hi Tomer, this is definitely something we are going to address: either we will implement it ourselves, or we will allow users to implement their own serializers, for instance one that inserts data directly into the database.

Regards,

Arnau

tomersagi commented 10 years ago

Any timeline on that? I will need it pretty soon...

ArnauPrat commented 10 years ago

We don't know yet. We are currently working to ensure that datagen is able to generate graphs of 500+ billion edges. Once this is done, I'll work on your request, but I expect that to be in about one month. If this is very important for you, you can always download the code and modify it. Basically, you would need to implement the Serializer interface.

Regards,

Arnau

tomersagi commented 10 years ago

OK, please keep me updated on this. Thanks.

tomersagi commented 9 years ago

This has been resolved with the introduction of custom serializers.
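
For readers wondering what a serializer that loads data as it is generated might look like, here is a minimal Java sketch. It is not the actual datagen API: `EntitySerializer`, `Person`, `BatchSink`, `DirectLoadSerializer` and `InMemorySink` are hypothetical names made up for illustration, and the real Serializer interface in the datagen code may have different methods and signatures. The idea it shows is only the general one discussed in this thread: buffer generated rows and hand them to the database in batches, so no intermediate CSV copy is written to disk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface; NOT the real datagen Serializer API.
interface EntitySerializer<T> {
    void open();              // called once before generation starts
    void serialize(T entity); // called for every generated entity
    void close();             // called once after generation ends
}

// Hypothetical stand-in for a generated entity.
final class Person {
    final long id;
    final String firstName;

    Person(long id, String firstName) {
        this.id = id;
        this.firstName = firstName;
    }
}

// Hypothetical abstraction over a database bulk-load endpoint
// (in a real setup this could wrap, for example, a JDBC batch insert).
interface BatchSink {
    void writeBatch(List<String> rows);
}

// Buffers rows in memory and pushes them to the sink in fixed-size batches.
final class DirectLoadSerializer implements EntitySerializer<Person> {
    private final BatchSink sink;
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();

    DirectLoadSerializer(BatchSink sink, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.sink = sink;
        this.batchSize = batchSize;
    }

    @Override
    public void open() {
        buffer.clear();
    }

    @Override
    public void serialize(Person p) {
        // Pipe-separated row format chosen for this sketch only.
        buffer.add(p.id + "|" + p.firstName);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    @Override
    public void close() {
        // Push any rows left over from the last partial batch.
        flush();
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Copy the buffer so the sink owns its batch after we clear ours.
        sink.writeBatch(new ArrayList<>(buffer));
        buffer.clear();
    }
}

// In-memory sink used only to exercise the sketch without a database.
final class InMemorySink implements BatchSink {
    final List<List<String>> batches = new ArrayList<>();

    @Override
    public void writeBatch(List<String> rows) {
        batches.add(rows);
    }
}
```

To try it, construct a `DirectLoadSerializer` around an `InMemorySink`, call `open()`, feed it a few `Person` objects through `serialize(...)`, and call `close()`. Batching keeps the number of round trips to the database low, and the partial batch flushed in `close()` makes sure no rows are lost at the end of the run.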