junkblocker / codesearch

Fork of Google codesearch with more options
BSD 3-Clause "New" or "Revised" License

Skipping long lines #1

Closed: alexrkopp-xx closed this issue 7 years ago

alexrkopp-xx commented 7 years ago

First, I'd like to thank you for the work you have put into expanding codesearch.

One question, though: What's the reason for skipping the entire file if a long line is encountered, instead of just ignoring the line?

junkblocker commented 7 years ago

> First, I'd like to thank you for the work you have put into expanding codesearch.

Thanks!

> One question, though: What's the reason for skipping the entire file if a long line is encountered, instead of just ignoring the line?

This is the behavior of the original codesearch; I just made it configurable. An overly long line is used as one indicator that the file is most likely not a text file, so the whole file is skipped rather than just that line. From the original comment in the code:

// A file is assumed not to be text files (and thus not indexed)
// if it contains an invalid UTF-8 sequences, if it is longer than maxFileLength
// bytes, if it contains a line longer than maxLineLen bytes,
// or if it contains more than maxTextTrigrams distinct trigrams.
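
To make the heuristic concrete, here is a minimal sketch of three of the four checks described in that comment. It is not the actual codesearch implementation: the type `textLimits`, the function `looksLikeText`, and the whole-buffer approach are assumptions made for illustration, the limit values are left to the caller, and the distinct-trigram check is omitted.

```go
package main

import (
	"bytes"
	"unicode/utf8"
)

// textLimits is a hypothetical container for the thresholds; the field
// names echo the quoted comment, but the values are supplied by the caller
// and are not taken from codesearch.
type textLimits struct {
	maxFileLen int // largest file size, in bytes, still treated as text
	maxLineLen int // longest line, in bytes (newline excluded), still treated as text
}

// looksLikeText reports whether data passes the file-length, UTF-8, and
// line-length checks. A single failing check rejects the whole file,
// which mirrors the "skip the entire file" behavior discussed above.
// The distinct-trigram limit from the comment is not modeled here.
func looksLikeText(data []byte, lim textLimits) bool {
	if len(data) > lim.maxFileLen {
		return false // file too large
	}
	if !utf8.Valid(data) {
		return false // contains an invalid UTF-8 sequence
	}
	for _, line := range bytes.Split(data, []byte("\n")) {
		if len(line) > lim.maxLineLen {
			return false // one overly long line disqualifies the file
		}
	}
	return true
}
```

With this sketch, a call such as `looksLikeText(contents, textLimits{maxFileLen: 1 << 20, maxLineLen: 2000})` returns false as soon as any one line exceeds the limit, since the long line is treated as evidence about the file as a whole (for example, minified or binary-like data) rather than as a problem with that line alone.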