Output has additional 0x0D

pydoit / doit

CLI task management & automation tool

http://pydoit.org

MIT License

Output has additional 0x0D #179

Open redihokuto opened 7 years ago

redihokuto commented 7 years ago

I'm facing the following issue. I use DoIt as a makefile replacement for software projects, with DoIt calling the compiler. I noticed that the output differs slightly between a run through DoIt and a direct call to the compiler from the command line: visually, the DoIt output has extra lines around the stderr result. I dug a little deeper and found that the DoIt output (which actually shows the compiler's stderr) contains extra 0x0D characters. I have attached the two results (captured by redirecting the command-line output): compiler-direct-call.txt and compiler-through-doit.txt. On the console the extra 0x0D has no visible effect, but a binary comparison of the two files shows it.

Thanks for this great tool! Fabrizio.

schettino72 commented 7 years ago

Please add a simple example to reproduce the problem. I guess it happens because Python re-interprets the output and adds an extra 0x0D...

The problem is explained at https://wiki.python.org/moin/PythonDevWisdom, which says the workaround is to open the stream as a binary stream, but I'm not sure whether that would have other consequences...

Can you configure your compiler to output unix-style line termination? Or any other idea?

redihokuto commented 7 years ago

Thank you for the prompt answer. It's difficult for me to create a script that reproduces the problem, since you would need the compiler. My investigation led me to "_print_process_output", which I guess is responsible for capturing the process output and printing it. According to the link you provided, the read gets "\r\n" (0x0d 0x0a) as output by the compiler. The write then converts the \n into \r\n (os.linesep), so the final result is \r\r\n (0x0d 0x0d 0x0a). I'm almost sure this is the problem. I would open the file as binary when "self.buffering" is set (output is "generic"), and as text when "not self.buffering" (meaning the output is expected to be text with line endings). Then, after readline(), remove the "\r\n" line ending and append a plain "\n". That's just what came to mind... what do you think?
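The mechanism described above can be sketched as follows. This is an illustration only, not doit's actual code: it assumes stdout behaves like a Windows text-mode stream, which is simulated here with `io.TextIOWrapper(newline="\r\n")` so the sketch runs on any platform. `write_as_windows_text` and `normalize_eol` are hypothetical helper names.

```python
import io


def write_as_windows_text(line: str) -> bytes:
    """Write `line` through a text stream that translates "\n" to "\r\n"
    (as a Windows text-mode stream does) and return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii", newline="\r\n")
    stream.write(line)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # keep `raw` from being closed along with the wrapper
    return data


def normalize_eol(line: str) -> str:
    """Replace a trailing "\r\n" with a plain "\n" (the proposed fix)."""
    if line.endswith("\r\n"):
        return line[:-2] + "\n"
    return line


# A line as the compiler emits it, still carrying "\r\n" after decoding.
compiler_line = b"Compile alpha.txt\r\n".decode("ascii")

doubled = write_as_windows_text(compiler_line)               # ends with \r\r\n
fixed = write_as_windows_text(normalize_eol(compiler_line))  # ends with \r\n
```

Normalizing before the write leaves exactly one "\r\n" per line in the final output, at the cost of treating the process output as text.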

redihokuto commented 7 years ago

What I found today is the following. In the function _print_process_output, self.buffering is 0, so we have read = lambda: input_.readline(). The docs say that readline should leave only a \n (not \r\n). However, the lines read by line = read().decode(self.encoding, self.decode_error) end with 0x0d 0x0a. I found this by adding the following far-from-elegant code:

with open("xxx", "ta") as xxx:
   print( ",".join("{:02x}".format(ord(c)) for c in line), file=xxx )

Later, write() replaces the \n with 0x0d 0x0a, giving 0x0d 0x0d 0x0a in total.
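One possible explanation for the readline() observation, sketched under the assumption that the process output is read from a binary pipe (the decode call above suggests bytes): a binary readline() only splits on "\n" and keeps the preceding "\r", while newline translation happens only on text streams. The `io.BytesIO` objects below are stand-ins for the real pipe.

```python
import io

# Binary stream: readline() returns the bytes unchanged, including "\r".
pipe = io.BytesIO(b"Compile alpha.txt\r\n")
raw_line = pipe.readline()
line = raw_line.decode("utf-8", "replace")

# Same hex dump as in the debugging snippet above.
hex_dump = ",".join("{:02x}".format(ord(c)) for c in line)

# Text stream with default newline handling: "\r\n" is translated to "\n".
text = io.TextIOWrapper(io.BytesIO(b"Compile alpha.txt\r\n"), encoding="utf-8")
text_line = text.readline()
```

Here `hex_dump` ends with `0d,0a`, whereas `text_line` ends with a single "\n".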

Hope this can shed light on it...

redihokuto commented 7 years ago

I found the time to prepare an example. Here are the two files needed, "dodo.py" and "c.bat".

dodo.py

def task_hello():
    return {
        'actions': ['c.bat alpha.txt', 'c.bat beta.txt'],
        'targets': ['alpha.txt', 'beta.txt'],
    }

c.bat

@echo Compile %1
@echo>%1 Content

The file "c.bat" prints a line to stdout and writes a file. Now run it with:

doit --verbosity 2 >a

Open the resulting file "a" in a hex editor and check the line endings: they are 0x0d 0x0d 0x0a. On the command line the extra 0x0d has no visible effect, but if the output is captured by an editor (as I do with Sublime Text), you'll see an extra empty line.
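Instead of a hex editor, the check can be done with a few lines of Python. This is a minimal sketch; `count_doubled_cr` and `check_file` are hypothetical helper names, and the sample bytes only imitate what the redirected file is expected to contain.

```python
from pathlib import Path


def count_doubled_cr(data: bytes) -> int:
    """Count line endings of the form 0x0d 0x0d 0x0a in `data`."""
    return data.count(b"\r\r\n")


def check_file(path: str) -> int:
    """Read `path` in binary mode (no newline translation) and count."""
    return count_doubled_cr(Path(path).read_bytes())


# Imitation of the redirected output with the doubled carriage return.
sample = b"Compile alpha.txt\r\r\nCompile beta.txt\r\r\n"
doubled_count = count_doubled_cr(sample)
```

After running the doit command above, `check_file("a")` would report how many lines are affected. Reading the file in binary mode matters, since text mode would hide the extra byte.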

Thanks!