GoogleCodeExporter opened this issue 9 years ago
the files contain over 23000 lines
Original comment by regis.le...@gmail.com
on 13 Nov 2012 at 8:38
Problem reproduced; it seems to be a bug inside Python's difflib. I will try to figure
out what it is. Btw, 1+ MB is not big.
Original comment by matt...@gmail.com
on 14 Nov 2012 at 3:23
I dived into Python's difflib.py; the problem resides in one function of the
class Differ:
911     def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
        ...
939         for j in xrange(blo, bhi):
940             bj = b[j]
941             cruncher.set_seq2(bj)
942             for i in xrange(alo, ahi):
        ...
The two-level loop listed above is extremely inefficient for your case because
the two input files differ on every line. I don't really understand the logic, but
the loop does seem to run almost forever. You can easily reproduce this with the
example script in the Python docs:
http://docs.python.org/2/library/difflib.html#a-command-line-interface-to-difflib
(use option '-n').
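You can also reproduce it without the tool at all. The snippet below is a minimal
sketch (not the reporter's actual data): every left line differs from every right
line, so Differ hands the whole file to _fancy_replace(), which compares each
remaining left line against each remaining right line with a SequenceMatcher.
Even a few hundred such lines take noticeable time, and the cost grows much
faster than linearly, so files with ~23000 lines become impractical.

import difflib
import time

N = 150  # keep this small; raising it toward the reported ~23000 lines
         # makes the comparison take impractically long

# every line differs between the two inputs, so there are no equal lines
# for Differ to synch on
a = ["some shared text, left variant %d\n" % i for i in range(N)]
b = ["some shared text, right variant %d\n" % i for i in range(N)]

start = time.time()
# ndiff() drives the Differ class and therefore _fancy_replace()
result = list(difflib.ndiff(a, b))
print("ndiff of %d vs %d lines took %.2f s" % (N, N, time.time() - start))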
Options:
1. Report the bug to Python's difflib upstream.
2. Do not use this tool for files like yours (that differ on every line).
3. Ignore blanks when comparing (not sure difflib has the ability; see the sketch below).
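On option 3: difflib does expose a line-junk hook; ndiff() accepts a linejunk
callable, and difflib.IS_LINE_JUNK treats blank lines (and lines containing only
'#') as junk. A rough sketch of what the tool could do, assuming it reads both
files into line lists (the file names here are made up):

import difflib

with open("old.txt") as f:      # hypothetical file names
    a = f.readlines()
with open("new.txt") as f:
    b = f.readlines()

# Variant A: tell difflib to treat blank lines as junk when matching.
diff = difflib.ndiff(a, b, linejunk=difflib.IS_LINE_JUNK)

# Variant B: drop blank lines up front, which also shrinks the input.
a_nonblank = [line for line in a if line.strip()]
b_nonblank = [line for line in b if line.strip()]
diff_nonblank = difflib.ndiff(a_nonblank, b_nonblank)

print("".join(diff))

Neither variant fixes the quadratic _fancy_replace() scan itself; it only cleans
or shrinks the input before the diff runs.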
Original comment by matt...@gmail.com
on 15 Nov 2012 at 2:24
We had the same problems with large files, but the patch "11740.patch" mentioned
here http://bugs.python.org/issue6931 solved our problem. Now even several
25+ MB files were diffed in one run with a runtime under 2 minutes :)
Original comment by itserviceokamzol
on 9 Apr 2013 at 1:04
Thanks for the information, the issue is still open :(
Original comment by matt...@gmail.com
on 9 Apr 2013 at 1:45
Original issue reported on code.google.com by regis.le...@gmail.com
on 13 Nov 2012 at 8:05
Attachments: