Open jendap opened 12 years ago
Hey jendap, thanks for taking a look at HDFS-DU!
The commented-out register line is actually intentional: it lets the unit test use that script directly, instead of keeping a copy somewhere that could get out of date. Also, when users run this I don't know where they will have copied the UDF jar, so that path likely needs to be set per environment.
Can you just uncomment the line and set it to whatever path is appropriate for your environment?
Hm. We could have a parametrized register with a default value. The unit test would be able to reset that value, and we could tell users to set it if they move the jar. That way the script doesn't need modification.
(Replying to Travis Crawford's comment above, written Aug 31, 2012, at 8:41 AM.)
Since the unit test already has the UDF class on the classpath, the register is not needed there.
Any clue how Pig behaves if registering either a fake path, or no path at all?
You can register '/dev/null', seems to work ok :).
D
(Replying to Travis Crawford's comment above, written Fri, Aug 31, 2012, at 8:49 AM: https://github.com/twitter/hdfs-du/issues/4#issuecomment-8196264)
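For reference, the parametrized register idea could look roughly like the sketch below. It assumes Pig's parameter substitution (`%default` in the script, `-param` on the command line). The parameter name UDF_JAR is hypothetical and not something the script defines today, and the '/dev/null' default relies only on the observation above that registering it seems to work.

```pig
-- UDF_JAR is a hypothetical parameter name chosen for this sketch.
-- The default is '/dev/null', reported above to register without error,
-- so the unit test (UDF class already on the classpath) can run the
-- script unmodified.
%default UDF_JAR '/dev/null'
register $UDF_JAR;
```

Users who put the jar somewhere else would then override the default at launch, for example with `pig -param UDF_JAR=/path/to/udf.jar` followed by the script name (path shown is a placeholder), rather than editing the script.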
This fails with ERROR 4002: Can't read file: /doesnotexist.pig
register /doesnotexist.pig;
a = load '/etc/hosts' using PigStorage();
dump a;
For now I'd like to keep this as-is and ask users to uncomment that line, setting it to wherever they put the UDF jar.
What would be an awesome pull request is removing the need for this Pig script entirely, instead adding an OfflineImageViewer-based tool that generates the dataset directly. This Pig script was super useful in development when we didn't know what data we needed, but now that we know what dataset to produce we could simply dump it directly when parsing the fsimage.
Thoughts?
Otherwise the ExtractSizes udf function is not found / resolved.