trinodb / tpch

Port of TPC-H dbgen to Java

How to use airlift-tpch #3

Open prashant23 opened 10 years ago

prashant23 commented 10 years ago

Hello developers, I was searching for a Java utility to generate TPC-H data and found your code. Can you tell me how to use it as an API? I looked for a README file but didn't find one.

Thanks and Regards

Prashant

dain commented 10 years ago

I haven't gotten around to adding a command line interface yet, but you can create files with code like the following:

// try-with-resources flushes and closes the writer once the loop finishes
try (Writer writer = new FileWriter("yourFile")) {
    for (Customer entity : new CustomerGenerator(scaleFactor, part, numberOfParts)) {
        writer.write(entity.toLine());
        writer.write('\n');
    }
}

Each table in TPCH has an associated generator, and each generator is an Iterable<TpchEntity>. Each entity has getters for the individual column values, or you can call toLine() to generate a standard TPCH output line.
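Since the generators are described above as plain Iterables, the write loop can be factored into a small generic helper. The following is only a sketch: the class name LineDump and the method writeLines are made up for illustration and are not part of this library. It writes one row at a time, so it never holds more than the current line in memory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.Function;

// Hypothetical helper, not part of the tpch library.
final class LineDump {
    // Writes each row as one line and returns the number of lines written.
    // The caller owns the writer and is responsible for closing it.
    static <T> long writeLines(Iterable<T> rows, Function<? super T, String> toLine, Writer out) {
        long count = 0;
        try {
            for (T row : rows) {
                out.write(toLine.apply(row));
                out.write('\n');
                count++;
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers can use the helper inside lambdas and streams.
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

With a generator, the call would look something like LineDump.writeLines(new CustomerGenerator(scaleFactor, part, numberOfParts), Customer::toLine, writer), where writer is a FileWriter opened in a try-with-resources block.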

prashant23 commented 10 years ago

When I try to run the above code, I get the following error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Unknown Source)
    at java.lang.String.<init>(Unknown Source)
    at java.lang.StringBuilder.toString(Unknown Source)
    at io.airlift.tpch.TextPool.<init>(TextPool.java:62)
    at io.airlift.tpch.TextPool.<init>(TextPool.java:40)
    at io.airlift.tpch.TextPool.getDefaultTestPool(TextPool.java:31)
    at io.airlift.tpch.CustomerGenerator.<init>(CustomerGenerator.java:44)
    at tpch.data.GenerateData.main(GenerateData.java:14)

I think it's creating a large amount of garbage for the garbage collector. I am using Windows 7 with 4 GB of RAM.

Is the amount of RAM the problem?

Kindly let me know if there is a workaround.

dain commented 10 years ago

This code has not been optimized for running in memory-constrained environments. I'm sure there is a lot of room for improvement here if you want to take a look at it. Also, the latest commits in trunk improve performance and reduce the rate of garbage generation, but I'm not sure what the minimum amount of memory required to run the generator is. I would guess you need at least a few GB.

electrum commented 10 years ago

I don't know how the JVM chooses the default heap size on Windows, but it might be too small. Try increasing the heap size when running Java:

java -Xms2G -Xmx2G ...

This sets the starting size and maximum size to 2GB, so it will allocate that much memory up front and use a fixed-size heap. You can try using 1G or 3G depending on whether or not that works.
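To see which heap limit the JVM actually ended up with, you can print it from a tiny program using the standard Runtime API. This is a minimal sketch; the class name HeapCheck is just for illustration. Run it once without flags to see the default, then again with -Xmx2G to confirm the setting took effect.

```java
// Illustrative helper class; only uses the standard java.lang.Runtime API.
final class HeapCheck {
    // Maximum amount of memory the JVM will attempt to use for the heap, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

The reported value may be somewhat below the -Xmx figure, depending on the JVM and garbage collector in use.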