nathanmarz / dfs-datastores

Dead-simple vertical partitioning, compression, appends, and consolidation of data on a distributed filesystem.
BSD 3-Clause "New" or "Revised" License

Problems consolidating HDFS pail #12

Open derrickburns opened 11 years ago

derrickburns commented 11 years ago

I have some code that creates a pail (with multiple attributes) using a PailTap, and I would then like to consolidate that pail. If the pail is created on a local or S3 file system, calls to consolidate work. If the pail is created on an HDFS file system, consolidate fails. Here is the error message:

2013-02-02 02:48:58,472 INFO org.apache.hadoop.mapred.TaskLogsTruncater (main): Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-02-02 02:48:58,509 INFO org.apache.hadoop.io.nativeio.NativeIO (main): Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2013-02-02 02:48:58,509 INFO org.apache.hadoop.io.nativeio.NativeIO (main): Got UserName hadoop for UID 106 from the native implementation
2013-02-02 02:48:58,512 WARN org.apache.hadoop.mapred.Child (main): Error running child
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 9 more
Caused by: cascading.flow.FlowException: internal error during mapper configuration
    at cascading.flow.hadoop.FlowMapper.configure(FlowMapper.java:99)
    ... 14 more
Caused by: java.lang.NullPointerException
    at cascading.flow.hadoop.util.HadoopUtil.readStateFromDistCache(HadoopUtil.java:422)
    at cascading.flow.hadoop.FlowMapper.configure(FlowMapper.java:78)
    ... 14 more
2013-02-02 02:48:58,517 INFO org.apache.hadoop.mapred.Task (main): Runnning cleanup for the task

sritchie commented 11 years ago

Hey @derrickburns, what was the resolution here?

derrickburns commented 11 years ago

My bug! I didn't pass the proper path to getFileSystem.

derrickburns commented 11 years ago

Actually, I am seeing this again. I think it is related to the size of the JobConf. What are the limits? Could too many long pathnames be the issue?
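One rough way to test the size theory is to measure how a serialized key/value configuration grows as long pathnames are added. The sketch below is only a proxy: it uses `java.util.Properties` rather than Hadoop's `JobConf`, and the class name `ConfSizeEstimate`, the key `example.input.paths`, and the path pattern are all hypothetical. It does not reproduce Hadoop's actual serialization or any limit Hadoop enforces.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

class ConfSizeEstimate {
    // Serialize the key/value pairs as XML and return the number of bytes written.
    static int serializedSize(Properties props) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            props.storeToXML(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // Build a hypothetical configuration holding n long paths under a single key.
    static Properties withPaths(int n) {
        StringBuilder joined = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                joined.append(',');
            }
            joined.append("hdfs://namenode/data/pail/attribute=")
                  .append(i)
                  .append("/part-00000.pailfile");
        }
        Properties props = new Properties();
        props.setProperty("example.input.paths", joined.toString());
        return props;
    }

    public static void main(String[] args) {
        System.out.println("10 paths:    " + serializedSize(withPaths(10)) + " bytes");
        System.out.println("10000 paths: " + serializedSize(withPaths(10000)) + " bytes");
    }
}
```

Comparing the two printed sizes gives a feel for how quickly a long list of input paths can dominate the configuration, which is the scenario the question above describes.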

derrickburns commented 11 years ago

In another case, getLocalCacheFiles is returning null in cascading/cascading-hadoop/src/main/java/cascading/flow/hadoop/util/HadoopUtil.java:

public static String readStateFromDistCache( JobConf jobConf, String id ) throws IOException
    {
    Path[] files = DistributedCache.getLocalCacheFiles( jobConf );

derrickburns commented 11 years ago

Could the length of the JobName be an issue? Are there limits?

fkautz commented 11 years ago

This appears for me when I try to consolidate after running a cascading job. If I consolidate without invoking cascading before, it works.

// -- cascading job, using PailTap (not sure if that makes a difference)
connect.complete();
Pail masterPail = Pail.create("pail", new TestStructure(), false);
tmpPail.consolidate(); // errors

vs

Pail masterPail = Pail.create("pail", new TestStructure(), false);
tmpPail.consolidate(); // works

cascading-hadoop 2.1.2, cascading.flow.hadoop.util.HadoopUtil, line 418:

Path[] files = DistributedCache.getLocalCacheFiles( jobConf ); // returns null

I'm not entirely sure why Cascading is being invoked here; the MR job appears to have nothing to do with Cascading. Maybe some state wasn't cleaned up properly.
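The failure mode reported in this thread, a lookup that returns null where the caller expects an array, can be illustrated without Hadoop. The sketch below is not Cascading's code: `DistCacheGuard` and `findStateFile` are hypothetical names, `java.nio.file.Path` stands in for Hadoop's `Path`, and it assumes the caller goes on to iterate the returned array, which the NullPointerException in readStateFromDistCache suggests but the thread does not confirm.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

class DistCacheGuard {
    // Hypothetical stand-in for the lookup: return the cached file whose path contains id.
    static Optional<Path> findStateFile(Path[] localCacheFiles, String id) {
        if (localCacheFiles == null) {
            // A null array means no cache files were registered; report "absent"
            // instead of letting the for-each loop below throw a NullPointerException.
            return Optional.empty();
        }
        for (Path file : localCacheFiles) {
            if (file.toString().contains(id)) {
                return Optional.of(file);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Path[] cached = { Paths.get("/tmp/cache/state-abc"), Paths.get("/tmp/cache/other") };
        System.out.println(findStateFile(cached, "abc").isPresent());
        System.out.println(findStateFile(null, "abc").isPresent());
    }
}
```

A guard like this only changes how the problem surfaces. It would not explain why the cache is empty in the first place, which is the open question about leftover state.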