temoto / robotstxt

The robots.txt exclusion protocol implementation for Go language
MIT License

Tiny optimization in scanner.go #8

Closed. mynameisfiber closed 11 years ago

mynameisfiber commented 11 years ago

The tokenizer called isSpace quite often, and isSpace used strings.Index to check whether the current rune was a whitespace character. This involved a type conversion and many string comparisons. The attached code simply keeps things in rune space (thus avoiding a memory allocation at each comparison) and orders the comparisons by frequency of use (so the ordinary space is the first comparison, then the horizontal tab, then the vertical tab).

Anyways, this small change sped up the (*ByteScanner).Scan operation by about 30%!

temoto commented 11 years ago

That's cool, thank you very much.

By the way, if you are using (or going to use) this library, perhaps you could share your opinion on the API change discussed in https://github.com/temoto/robotstxt.go/pull/7

mynameisfiber commented 11 years ago

@temoto No worries! Also, that pull request seems to be the perfect evolution of this library. I will take a deeper look at the actual code later in the day.

mynameisfiber commented 11 years ago

@temoto Nice benchmarks! I changed the code. Also, take your time on the merge.

temoto commented 11 years ago

Thanks again, and for running gofmt too.