rkarivuraj / coderev

Automatically exported from code.google.com/p/coderev
GNU General Public License v2.0

Infinite loop with big file (over 1 MB) #9

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run codediff with:
$> codediff.py ALBONLV2.csv ALBONLV2.txt -o diff.html
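The attached files are not part of this export. As a rough stand-in, here is a Python 3 sketch that builds two synthetic inputs in which every line differs (the situation diagnosed later in this thread) and times difflib.HtmlDiff.make_file on growing sizes. The helper names synthetic_lines and time_html_diff and the line format are hypothetical. It is also an assumption that codediff.py 0.3.3 reaches difflib through HtmlDiff, and exact timings will vary with the Python version.

```python
import difflib
import time


def synthetic_lines(n, prefix):
    # Hypothetical stand-in for the attachments: the prefix differs between
    # the two inputs, so no line of one file appears in the other.
    return ["%s,%d,value-%d\n" % (prefix, i, i * 7) for i in range(n)]


def time_html_diff(n):
    """Return (seconds, html) for an HTML diff of two n-line synthetic inputs."""
    a = synthetic_lines(n, "old")
    b = synthetic_lines(n, "new")
    start = time.perf_counter()
    html = difflib.HtmlDiff().make_file(a, b, "old", "new")
    return time.perf_counter() - start, html


if __name__ == "__main__":
    # Keep sizes small: on this kind of input the cost grows much faster
    # than the line count, so don't try the 23000-line size here.
    for n in (25, 50, 100):
        seconds, html = time_html_diff(n)
        print("%4d lines: %.3f s, %d bytes of HTML" % (n, seconds, len(html)))
```

Doubling n and comparing the printed times is a quick way to see the growth without waiting on the real files.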

What is the expected output? What do you see instead?
It should generate an HTML diff file. Instead it enters an infinite loop and never ends.

What version of the product are you using? On what operating system?
0.3.3

Please provide any additional information below.

Just run it.

Original issue reported on code.google.com by regis.le...@gmail.com on 13 Nov 2012 at 8:05

Attachments:

GoogleCodeExporter commented 9 years ago
The files contain over 23,000 lines.

Original comment by regis.le...@gmail.com on 13 Nov 2012 at 8:38

GoogleCodeExporter commented 9 years ago
Problem reproduced. It seems to be a bug inside Python's difflib; I will try to
figure out what it is. Btw, 1+ MB is not big.

Original comment by matt...@gmail.com on 14 Nov 2012 at 3:23

GoogleCodeExporter commented 9 years ago

Original comment by matt...@gmail.com on 14 Nov 2012 at 3:24

GoogleCodeExporter commented 9 years ago
I dug into Python's difflib.py; the problem lies in one method of the Differ
class:

    def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
        ...
        for j in xrange(blo, bhi):      # outer loop: each line of b in the block
            bj = b[j]
            cruncher.set_seq2(bj)       # cruncher is a SequenceMatcher
            for i in xrange(alo, ahi):  # inner loop: each line of a in the block
                ...

The two-level nested loop listed above is extremely inefficient for your case
because the two input files differ on every line. I don't fully understand the
logic, but the loop does seem to run forever. You can easily reproduce this with
the example script in the Python docs:
http://docs.python.org/2/library/difflib.html#a-command-line-interface-to-difflib
(use option '-n').
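To put numbers on that nested loop, here is a back-of-the-envelope model in Python 3. The helper names are hypothetical, and it only counts loop iterations; it does not call difflib. One call to _fancy_replace visits every (a, b) line pair in its block, so a single pass over two 23,000-line files already means 529 million similarity checks. When a close-enough match is found, Differ recurses on the lines before and after it. Assuming, as a worst case, that each recursive block is only one line smaller, the total becomes the sum of squares n(n+1)(2n+1)/6. This is a model, not a measurement, but it suggests the loop is less "infinite" than far too slow to ever finish.

```python
def single_pass_pairs(n_a, n_b):
    """Line pairs visited by one call: the outer loop walks b's block and the
    inner loop walks a's block, so the count is the product of their sizes."""
    return n_a * n_b


def worst_case_pairs(n):
    """Hypothetical worst case for two n-line blocks: each recursive call works
    on a block one line smaller, giving n^2 + (n-1)^2 + ... + 1^2 pairs."""
    return n * (n + 1) * (2 * n + 1) // 6


def worst_case_pairs_loop(n):
    """Same quantity, computed by simulating the shrinking blocks directly."""
    total = 0
    while n > 0:
        total += single_pass_pairs(n, n)
        n -= 1
    return total


if __name__ == "__main__":
    n = 23000  # line count reported in this issue
    print(single_pass_pairs(n, n))  # 529000000
    print(worst_case_pairs(n))      # 4055931170500
```

worst_case_pairs_loop is only there to cross-check the closed form on small n.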

Options:

1. Report the bug to the Python difflib maintainers
2. Do not use this tool for files like yours (which differ on every line)
3. Ignore blanks when comparing (not sure difflib supports this)
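For options 2 and 3, one possible workaround is sketched below in Python 3. The helper names normalize and quick_unified_diff are hypothetical, and this is not what codediff.py does. It collapses runs of whitespace (one interpretation of "ignore blanks") and then emits a plain unified diff with difflib.unified_diff. unified_diff works from SequenceMatcher opcodes on whole lines and never goes through Differ._fancy_replace, so files that differ on every line come out as one big replace hunk instead of triggering the pairwise search. The trade-off is losing the side-by-side HTML and the intraline highlighting.

```python
import difflib


def normalize(lines):
    """Collapse runs of blanks and drop leading/trailing whitespace (including
    the newline), so lines that differ only in spacing compare equal."""
    return [" ".join(line.split()) for line in lines]


def quick_unified_diff(a_lines, b_lines, fromfile="a", tofile="b"):
    """Line-level unified diff of the normalized inputs. This goes through
    SequenceMatcher opcodes only, not Differ._fancy_replace."""
    return list(difflib.unified_diff(normalize(a_lines), normalize(b_lines),
                                     fromfile=fromfile, tofile=tofile,
                                     lineterm=""))


if __name__ == "__main__":
    old = ["x  1\n", "y 2\n"]
    new = ["x 1\n", "y 3\n"]
    # The first lines differ only in spacing, so only "y 2" -> "y 3" is reported.
    for line in quick_unified_diff(old, new):
        print(line)
```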

Original comment by matt...@gmail.com on 15 Nov 2012 at 2:24

GoogleCodeExporter commented 9 years ago
We had the same problem with large files, but the patch "11740.patch"
mentioned at http://bugs.python.org/issue6931 solved it. Now even several
25+ MB files were diffed in one run, with a runtime under 2 minutes :)

Original comment by itserviceokamzol on 9 Apr 2013 at 1:04

GoogleCodeExporter commented 9 years ago
Thanks for the information; the issue is still open :(

Original comment by matt...@gmail.com on 9 Apr 2013 at 1:45