Closed aripollak closed 11 years ago
I'm afraid I'm not following completely here. When typedbytes is not installed as an egg, the old code falls back to opts.add('file', modpath),
which makes sure the .py file is sent along and is thus available on HDFS, right? I'm not sure what's left to fix then...
Unfortunately I forgot exactly what was happening, but I definitely tested dumbo after installing typedbytes through pip, and it didn't work with the original code but it worked with this change. I think the problem might have been that opts['file'] would be re-interpreted by the code starting at line 176. The module path didn't start with file://, so it wasn't actually getting passed to streaming as a -file. But if you add it as a libegg, it does get sent along with the job.
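To make the suspected failure mode concrete, here is a minimal sketch. It is not dumbo's actual code: the function name `to_streaming_args`, the exact prefix rule, and the example path are all assumptions made for illustration. It only shows how a 'file' entry without a file:// prefix could be dropped while a 'libegg' entry is still shipped.

```python
# Hypothetical illustration only; names and logic are assumptions, not dumbo's code.
def to_streaming_args(opts):
    """Turn (key, value) option pairs into streaming-style arguments.

    Assumed rule: a 'file' entry is shipped as -file only when its value
    carries an explicit file:// prefix; a 'libegg' entry is always shipped.
    """
    args = []
    for key, value in opts:
        if key == 'file' and value.startswith('file://'):
            # Strip the prefix and pass the local path along with the job.
            args += ['-file', value[len('file://'):]]
        elif key == 'libegg':
            # Shipped regardless of how the path is written.
            args += ['-file', value]
        # Any other 'file' value is silently skipped under this assumed rule.
    return args


# Hypothetical install location of a pip-installed (non-egg) typedbytes module.
modpath = '/usr/lib/python2.7/site-packages/typedbytes.py'

as_file = to_streaming_args([('file', modpath)])
as_libegg = to_streaming_args([('libegg', modpath)])
```

Under these assumptions `as_file` ends up empty, so the module never reaches the job, while `as_libegg` contains the `-file` argument. That matches the behaviour described above, but the real cause would need to be confirmed against the code at line 176.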
typedbytes is installed on every node in my cluster, so I don't need or want it distributed to HDFS. Does this patch remove that "feature"?
I think this is a better solution:
https://github.com/klbostee/dumbo/commit/b67a7b1dfeaa7df8fa98a6cbb550a7fec201fa4d
Thanks!
The existing condition didn't make sense if typedbytes was not installed as an egg, since it would make Hadoop think the typedbytes module was on HDFS. The new method is the same as what's in backends/common.