prabhatbhattarai / project-voldemort

Automatically exported from code.google.com/p/project-voldemort
Apache License 2.0

.jdb file sizes are not stable after processing using Hadoop and AdminClient code #344

GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. I did the initial data load through Hadoop, with about 60 million records. 
After the Hadoop processing, the .jdb files are 88 GB in size on each 
Voldemort node.
2. After loading the data with Hadoop, I updated the stores in Voldemort using 
the AdminClient put method.
3. Compared to the .jdb files produced by Hadoop, the .jdb files written 
through AdminClient take up more space.

I don't know why AdminClient takes more space than the Hadoop-processed 
.jdb files. 
For example: initializing 60 million records using Hadoop gives a .jdb file 
size of 88 GB.
But after updating 2 million records with the AdminClient put method, in the 
existing Hadoop-processed .jdb files in Voldemort, the .jdb file size is 118 GB.
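
To put the two figures side by side, here is a small sketch of the arithmetic. It uses only the numbers quoted above and assumes the 2 million updates are spread evenly across nodes, so the per-node ratio is representative. The variable names are illustrative.

```python
# Figures quoted in the report above.
initial_records = 60_000_000   # records loaded through Hadoop
updated_records = 2_000_000    # records rewritten through AdminClient put
size_before_gb = 88            # .jdb size per node after the Hadoop load
size_after_gb = 118            # .jdb size per node after the updates

growth_gb = size_after_gb - size_before_gb

# Share of records touched versus share by which the files grew.
fraction_updated = updated_records / initial_records
fraction_growth = growth_gb / size_before_gb

# How much larger the growth is than the share of records touched.
amplification = fraction_growth / fraction_updated

print(f"records touched: {fraction_updated:.1%}")
print(f"file growth:     {fraction_growth:.1%}")
print(f"growth is about {amplification:.0f}x the share of records touched")
```

Under that assumption, rewriting about 3% of the records grew the files by about 34%, which is the gap this report is asking about.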

What is the expected output? What do you see instead?
Please let me know why there is such a big difference between these processed 
.jdb files. Does Hadoop use some other technique to process this many records?

What version of the product are you using? On what operating system?
Hadoop
Voldemort - 0.80.1

Please provide any additional information below.

Thanks
Anoop

Original issue reported on code.google.com by r.anoopr...@gmail.com on 18 Apr 2011 at 7:03

GoogleCodeExporter commented 8 years ago
Hi Anoop,

What exactly are you trying to do? If you're looking to build data via Hadoop, 
use the read-only storage engine, which is far more optimized for this. 
BerkeleyDB is a log-structured, read-write store, which isn't as 
efficient as the read-only format.
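
As a rough illustration of why a log-structured store grows on updates, here is a toy append-only store. It is a minimal sketch, not BerkeleyDB's actual implementation: the class `AppendOnlyLog` and its methods are hypothetical names invented for this example. The point is that a put on an existing key appends a new entry rather than overwriting the old one, so the old bytes stay on disk until some cleanup pass rewrites the log.

```python
class AppendOnlyLog:
    """Toy log-structured store (hypothetical, for illustration only)."""

    def __init__(self):
        self.log = []    # (key, value) entries in write order; never edited in place
        self.index = {}  # key -> position of the latest entry for that key

    def put(self, key, value):
        # An update does not overwrite: it appends and repoints the index.
        self.index[key] = len(self.log)
        self.log.append((key, value))

    def get(self, key):
        return self.log[self.index[key]][1]

    def log_bytes(self):
        # Space used by every entry, including superseded ones.
        return sum(len(k) + len(v) for k, v in self.log)

    def live_bytes(self):
        # Space used by the latest entry of each key only.
        return sum(len(k) + len(self.log[pos][1]) for k, pos in self.index.items())

    def compact(self):
        # Rewrite the log keeping only the latest entry per key.
        live = [self.log[pos] for pos in sorted(self.index.values())]
        self.log = live
        self.index = {k: i for i, (k, _) in enumerate(live)}


store = AppendOnlyLog()
for i in range(1000):
    store.put(f"key{i}".encode(), b"x" * 100)
size_after_load = store.log_bytes()

# Update 10% of the keys with values of the same length.
for i in range(100):
    store.put(f"key{i}".encode(), b"y" * 100)
size_after_updates = store.log_bytes()

print(size_after_load, size_after_updates, store.live_bytes())
```

In this toy model the log grows after the updates even though the live data is the same size, and `compact()` brings it back down. BerkeleyDB Java Edition reclaims obsolete entries with a background log cleaner rather than an explicit call like this, so the sizes seen right after a burst of updates can be larger than the live data.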

Thanks,
- Alex

Original comment by feinb...@gmail.com on 27 Apr 2011 at 9:00

GoogleCodeExporter commented 8 years ago
This doesn't seem like a bug, more of a question. Please continue the 
conversation on the mailing list.

Original comment by rsumb...@gmail.com on 27 Apr 2011 at 10:15