gregrahn / tpcds-kit

TPC-DS benchmark kit with some modifications/fixes

Re-enable printing to stdout #29

Open ianbuss opened 7 years ago

gregrahn commented 7 years ago

Getting this to work correctly is more complicated than that. The challenge is that the sales/returns tables are generated in pairs, and there is no way to generate only one of them. I've recently been using https://github.com/teradata/tpcds because it is much faster than the TPC version written in C (surprising, I know).

ianbuss commented 7 years ago

Yes, for now I resorted to the rather ugly approach of relying on the fact that the child tables all have a different number of fields from their parents, which is just nasty. I'm going straight from dsdgen to Parquet using Spark, which takes away the requirement for passwordless SSH for distribution, etc., but I will definitely check out the link, thanks.
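
For anyone reading along, the field-count workaround could be sketched roughly like this. This is only an illustration, not the code actually used: the function name `split_by_field_count`, the `table_by_count` mapping, and the assumption that each row is pipe-delimited with one trailing delimiter are all hypothetical, and the real field counts have to come from the table schemas.

```python
from collections import defaultdict


def split_by_field_count(lines, table_by_count, delimiter="|"):
    """Route interleaved parent/child rows to tables by their field count.

    `table_by_count` is a hypothetical mapping such as {3: "parent", 2: "child"};
    the real counts must be taken from the schema of each table.
    """
    routed = defaultdict(list)
    for line in lines:
        row = line.rstrip("\r\n")
        if not row:
            continue  # skip blank lines
        # Assumption: each row ends with one trailing delimiter; drop it
        # so it does not count as an extra empty field.
        if row.endswith(delimiter):
            row = row[: -len(delimiter)]
        fields = row.split(delimiter)
        table = table_by_count.get(len(fields))
        if table is None:
            raise ValueError(f"unexpected field count {len(fields)}: {line!r}")
        routed[table].append(fields)
    return dict(routed)


if __name__ == "__main__":
    # Toy rows, not real dsdgen output.
    sample = ["1|2|3|\n", "4|5|\n", "6||8|\n"]
    print(split_by_field_count(sample, {3: "parent", 2: "child"}))
```

It only works while the paired tables have different field counts, which is exactly why it is fragile.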