Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
`JobTask.dump` raises `TypeError` when replacing bytes to string #3284
In `luigi/contrib/hadoop.py`, the `JobTask.dump` method shows unexpected behavior: https://github.com/spotify/luigi/blob/64d6c487c49548a5b97cc3ac6e0890f89d7dccd2/luigi/contrib/hadoop.py#L965-L978

I believe the variable `d` is of type `bytes`, because it is returned by `pickle.dumps` (https://docs.python.org/3/library/pickle.html#pickle.dumps), and the variable `module_name` should be a `str`, because `sys.argv[0]` is a `str` (https://docs.python.org/3/library/sys.html#sys.argv).

As a result, `d.replace(b'(c__main__', "(c" + module_name)` always raises `TypeError: a bytes-like object is required, not 'str'`, because the replacement argument must also be a bytes-like object (https://docs.python.org/3/library/stdtypes.html#bytes.replace).

Here is a simplified reproduction:

    from luigi.contrib.hadoop import JobTask

    class My(JobTask):
        pass

    a = My()
    # Pickling the task to the given directory triggers the failing bytes.replace call.
    a.dump('my')

Output:

    Traceback (most recent call last):
      File "/home/wonseok/current/luigi/my_test.py", line 8, in <module>
        a.dump('my')
      File "/home/wonseok/current/luigi/luigi/contrib/hadoop.py", line 974, in dump
        d = d.replace(b'(c__main__', "(c" + module_name)
    TypeError: a bytes-like object is required, not 'str'

This is related to issue #2402.

If I'm mistaken, I'd appreciate it if you could let me know.

Thank you!
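For reference, the type mismatch can be reproduced without Luigi or Hadoop. The following is a minimal sketch, not the actual `JobTask.dump` code: the helper `rewrite_main_module`, the sample dictionary, and the value `"my_test"` are hypothetical names chosen for illustration. Encoding the module name is only one possible fix, and I have not checked which approach the maintainers would prefer.

```python
import pickle


def rewrite_main_module(d: bytes, module_name: str) -> bytes:
    """Hypothetical helper: replace the __main__ marker using bytes on both sides."""
    # Encoding module_name makes the replacement a bytes-like object,
    # which is what bytes.replace requires.
    return d.replace(b"(c__main__", b"(c" + module_name.encode())


# pickle.dumps returns bytes on Python 3 (a plain dict is used as a stand-in payload).
d = pickle.dumps({"task": "My"})

# Hypothetical value; in the issue it comes from sys.argv[0], which is a str.
module_name = "my_test"

# Mixing a bytes pattern with a str replacement raises TypeError,
# even when the pattern does not occur in d.
try:
    d.replace(b"(c__main__", "(c" + module_name)
    raised = False
except TypeError:
    raised = True

# With both arguments as bytes, the call succeeds.
patched = rewrite_main_module(b"(c__main__\nMy\n", module_name)
```

Here `raised` ends up `True` for the mixed `bytes`/`str` call, while `patched` holds the rewritten bytes. The sample input `b"(c__main__\nMy\n"` is hand-written for illustration and is not a complete pickle.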