Skipping long lines

First, I'd like to thank you for the work you have put into expanding codesearch.

Thanks!

One question, though: What's the reason for skipping the entire file if a long line is encountered, instead of just ignoring the line?

This is the behavior from the original codesearch; I just made it configurable. A long line is treated as one indicator that the file is most likely not a text file. From the original comment in the code:

// A file is assumed not to be text files (and thus not indexed)
// if it contains an invalid UTF-8 sequences, if it is longer than maxFileLength
// bytes, if it contains a line longer than maxLineLen bytes,
// or if it contains more than maxTextTrigrams distinct trigrams.

junkblocker / codesearch

Skipping long lines #1