nathanmarz / dfs-datastores

Dead-simple vertical partitioning, compression, appends, and consolidation of data on a distributed filesystem.
BSD 3-Clause "New" or "Revised" License

Problem with pail consolidation on HDFS #49

Open juliankeppel opened 9 years ago

juliankeppel commented 9 years ago

I'm reading "Big Data" with great interest and am trying to implement the batch layer with a graph data model as the master dataset.

I always get a NullPointerException when I call the consolidate method on a pail.

For example, I want to absorb a pail containing new incoming data into the pail that holds my master dataset. Then I want to consolidate the master pail to avoid lots of small files.

Example code snippet:

TypedRecordOutputStream dataOutputStream = incomingPail.openWrite();
for (Data thriftData : objects) {
    dataOutputStream.writeObject(thriftData);
}
dataOutputStream.close();

masterPail.absorb(incomingPail);
masterPail.consolidate();
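To make the intent of "absorb, then consolidate" concrete, here is a rough local-filesystem analogy using only `java.nio`. It does not use the Pail API and does not reflect how Pail implements these operations (the trace below shows that `Pail.consolidate` submits a Hadoop job). The class `LocalPailAnalogy` and its methods are hypothetical names made up for this sketch.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: plain directories stand in for pails.
class LocalPailAnalogy {

    // "Absorb": move every regular file from the incoming directory into the master directory.
    // A name collision makes Files.move throw FileAlreadyExistsException.
    static void absorb(Path master, Path incoming) throws IOException {
        for (Path file : listFiles(incoming)) {
            Files.move(file, master.resolve(file.getFileName()));
        }
    }

    // "Consolidate": concatenate all files in the directory into one new file,
    // then delete the small originals. Returns the path of the merged file.
    static Path consolidate(Path dir) throws IOException {
        List<Path> parts = listFiles(dir); // listed before the merged file exists
        Path merged = Files.createTempFile(dir, "consolidated-", ".dat");
        try (OutputStream out = Files.newOutputStream(merged)) {
            for (Path part : parts) {
                Files.copy(part, out);
            }
        }
        for (Path part : parts) {
            Files.delete(part);
        }
        return merged;
    }

    // Lists the regular files directly inside dir, sorted by path.
    static List<Path> listFiles(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        Collections.sort(files);
        return files;
    }
}
```

After `absorb(master, incoming)` the incoming directory is empty, and after `consolidate(master)` the master directory holds a single file with the concatenated contents.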

I know that this is the "stupid" approach to ingesting new data into the master dataset (without a snapshot or the like). But for the moment this is a sufficient solution for me.

Thanks for your help.

Error:

Exception in thread "main" java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:656)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:444)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:293)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:133)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:437)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
    at com.backtype.hadoop.Consolidator.consolidate(Consolidator.java:106)
    at com.backtype.hadoop.pail.Pail.consolidate(Pail.java:538)
    at com.backtype.hadoop.pail.Pail.consolidate(Pail.java:509)
    at hdfsconnector.PailHDFSConnector.appendMasterDataset(PailHDFSConnector.java:107)
    at LambdaStarter.main(LambdaStarter.java:40)
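
A note on reading the trace: the exception is thrown inside `ProcessBuilder.start`, which Hadoop's `Shell.runCommand` calls while `RawLocalFileSystem.setPermission` prepares the job staging directory. The JDK documents that `ProcessBuilder.start` throws a `NullPointerException` when an element of the command list is null. The sketch below reproduces only that JDK behavior, with no Hadoop or Pail involved; whether a null command element is what actually happened in this setup is an assumption, since the thread never confirms it. The class name `NullCommandDemo` and the command arguments are made up for illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo: shows that a null element in the command list
// makes ProcessBuilder.start() throw NullPointerException.
class NullCommandDemo {

    // Returns true if start() rejects the command with a NullPointerException.
    static boolean startThrowsNpe(List<String> command) {
        try {
            new ProcessBuilder(command).start();
            return false;
        } catch (NullPointerException e) {
            return true;
        } catch (IOException e) {
            // The program could not be started for another reason (e.g. not found).
            return false;
        }
    }

    public static void main(String[] args) {
        // First element is null, standing in for an executable path that was never resolved.
        List<String> command = Arrays.asList(null, "chmod", "0700", "staging-dir");
        System.out.println("NPE thrown: " + startThrowsNpe(command));
    }
}
```

If this is the cause, the place to look would be how the Hadoop client resolves the external command it runs for local permission changes, rather than the Pail calls themselves.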

Bruce-Du commented 8 years ago

I ran into the same error. Did you manage to fix it?

juliankeppel commented 8 years ago

To be honest, I didn't pursue Pail any further for this project. In the end I implemented the data model with Hive tables and a Data Vault-like approach.