Open GoogleCodeExporter opened 9 years ago
I'm maybe a scandalmonguer, but the only big structural change is the template
feature. If you process only one file, the penality is a lot bigger : 10 times
slower in my tests.
Original comment by fgalla...@gmail.com
on 3 Dec 2010 at 3:21
From Emmanuel at txt2tags-dev:
Benchmarks are also difficult :) I have only a one third penalty (from
2.6) on my tests. Which is not good but not so bad.
And a quick (git) bisect leads to the following:
About half of the increasing is added by r525 (remote files). The
remaining part culprit is not so obvious.
There's probably many culprits :)
Original comment by aureliojargas@gmail.com
on 7 Dec 2010 at 9:14
Seeing r525 now, the only significant change to the txt2tags code was:
import urllib
...
f = urllib.urlopen(file_path)
Is urllib that slow? Maybe we'll need a profile tool to point exactly where the
delay is.
Original comment by aureliojargas@gmail.com
on 7 Dec 2010 at 9:19
Python does come with a basic profiler:
http://docs.python.org/library/profile.html
I did just a quick run with it with both 2.6 and svn versions converting
sample.t2t -t html. Not at all a proper profiling test, but it gets the start
of an idea. The run shows that the biggest difference in time is in loading
additional modules that were not loaded before. (urllib, urlparse, socket,
utils, mimetools, ssl, tempfile) Just the loading of these modules took up
nearly a third of the total running time.
The next biggest difference was in the use of regular expressions. This isn't
surprising, as SVN does a lot more than 2.6 did, but it might be a place to
look at for refactoring. Perhaps making sure all regexes are moved out of
funtions that get called more than once (like the main regexes are now) so they
do not get compiled each time. Otherwise, that only differences I can see from
a cursory scan is that more functions are called, particular string and list
manipulation methods, but that is difficult to refactor away.
Original comment by jamisee...@gmail.com
on 8 Dec 2010 at 9:17
Thanks Jason,
this is really interesting. In particular the module loading penalty would
explain the gap I got at r525. But this is probably the price to pay for these
interesting new features...
Original comment by go...@laposte.net
on 9 Dec 2010 at 10:23
Thank you VERY much for all this information Jason!
I didn't know that the urllib module would carry all those extra modules. I'll
try to do the urllib import just when needed.
About regexes, from my head I don't think many of them were added since v2.6,
but I'll investigate that.
Original comment by aureliojargas@gmail.com
on 9 Dec 2010 at 10:37
Yeah, I said that about the regexes without actually checking, although I meant
it more for general refactoring and not as something specifically introduced
since 2.6. I have since checked, and only one function does this (parse_line in
ConfigLines). I factored those out to the class definition, and it sped up the
above test by ~7%. Not huge, but still an improvement.
Much bigger was loading urllib only when needed. I did that with the same
commit, and it reduced the time almost by half! Of course, if it ever needs to
get used it will slow things down, but I think that loading a remote file will
be a much bigger difference than loading the module that makes it possible. The
rest of the time, it is much better.
Original comment by jamisee...@gmail.com
on 10 Dec 2010 at 3:20
Oh, one more thing...
time for t in $targets; do txt2tags26 -q -t $t -i sample.t2t -o /tmp/x; done
real 0m10.309s
user 0m9.516s
sys 0m0.743s
# revision 606
time for t in $targets; do txt2tags -q -t $t -i sample.t2t -o /tmp/x; done
real 0m15.726s
user 0m14.616s
sys 0m1.053s
# revision 607, with speed refactoring:
time for t in $targets; do txt2tags -q -t $t -i sample.t2t -o /tmp/x; done
real 0m11.198s
user 0m10.259s
sys 0m0.880s
So I'm gonna guess that urllib was the biggest culprit for the slowdown.
Original comment by jamisee...@gmail.com
on 10 Dec 2010 at 3:41
Well done Aurélio and Jason ! And apology to Emmanuel for my wrong suspicion
about the template patch.
Original comment by fgalla...@gmail.com
on 10 Dec 2010 at 6:25
Don't worry Florent, bisect made me sleep ok anyway.
By the way, I have some problem with the test suite. Is it my checkout or do
you guys have also problems testing r607 with module headers?
Original comment by go...@laposte.net
on 10 Dec 2010 at 7:27
Emmanuel, even if your template patch was the culprint, there's no problem!
We're here to improve and fix txt2tags ad infinitum :) Maybe my initial comment
was too harsh, but that wasn't the intention, sorry.
Jason, thank you VERY much for your patch, bringing txt2tags back to the fast
lane! Just like Emmanuel, now I'll sleep ok too :)
Now talking about r607:
Newbie question: there's performance gain in doing "from foo import bar; bar()"
instead "import bar; foo.bar()"?
I agree with you than when using remote files, the module loading time will be
a minor issue since loading the file via http is already slow. Your solution is
perfect.
I liked the moving of regexes to the class definition and posterior checking
for target/keyword. It made the code more readable, thanks! I just made two
fixes in r608, and now the test-suite is 100% again.
Friendly reminder to all: always run dist/check-code.sh and test/run.py before
a commit :)
Original comment by aureliojargas@gmail.com
on 10 Dec 2010 at 12:27
In r609 the module 'email' was also postponed, giving us minor performance
gains.
It's just used in a specific situation: using %%mtime inside a remote http file.
Original comment by aureliojargas@gmail.com
on 10 Dec 2010 at 12:43
When I comment out all calls to Debug() performance improves quite a bit for
the trunk version (r1096):
# With Debug()
time ./txt2tags -i userguide.t2t -o userguide.html -t xhtml
real 0m0.713s
user 0m0.680s
sys 0m0.028s
# Without Debug()
time ./txt2tags -i userguide.t2t -o userguide.html -t xhtml
real 0m0.643s
user 0m0.616s
sys 0m0.024s
This change almost brings the trunk version up to par with version 2.6 (r382):
time ./txt2tags -i userguide.t2t -o userguide.html -t xhtml
real 0m0.620s
user 0m0.596s
sys 0m0.020s
Do you think this warrants the removal of the Debug() calls?
Original comment by jendriks...@gmail.com
on 30 Jan 2013 at 12:28
BTW: A good way to search for culprits that are responsible for the code
slowdown is the following:
python -m cProfile -o t2t.prof txt2tags -i userguide.t2t -o userguide.html -t
xhtml
runsnake t2t.prof
Original comment by jendriks...@gmail.com
on 30 Jan 2013 at 12:30
Original issue reported on code.google.com by
aureliojargas@gmail.com
on 3 Dec 2010 at 2:48