tealdeer-rs / tealdeer

A very fast implementation of tldr in Rust.
https://tealdeer-rs.github.io/tealdeer/
Apache License 2.0
4.17k stars 123 forks source link

Skip line without allocation #314

Closed qknogxxb closed 1 year ago

qknogxxb commented 1 year ago

@dbrgn Your summary is very professional.

@niklasmohrin It's very kind of you to stroke so many times, I received your kindness and encouragement. This is my first PR in github.

niklasmohrin commented 1 year ago

Oh, and @dbrgn let's not forget to squash when this is done :)

qknogxxb commented 1 year ago

@niklasmohrin

Yes, I would like to do some contributions and practices. I notice that there is a test at the end of the file, which I believe it's enough to ensure the new code is correct. And what is exactly thrown to your devnull (definately a clever name) is "=========\n", the same what <Bytes as Iterator>::find skips.

    #[test]
    fn test_first_line_new_format() {
        let input = "The Title\n=========\n\n";
        let mut lines = LineIterator::new(input.as_bytes());
        let title = lines.next().unwrap();
        assert_eq!(title, LineType::Title("The Title".to_string()));
        let empty = lines.next().unwrap();
        assert_eq!(empty, LineType::Empty);
    }
dbrgn commented 1 year ago

@qknogxxb yes, I think you're right, that test should cover the functionality.

qknogxxb commented 1 year ago

@dbrgn And thank you too for inventing this convinent tool! I can say tldr prevent me from wasting at least 20 minutes per day to look up documents.

niklasmohrin commented 1 year ago

I still think the test could be improved ... :D

I mean, it tests that the = disappear, but it is not clear whether we would have a leading \n when continuing to read (which is removed by trim_end later by coincidence of the test case). Something like

let input = "\
The Title
=======
I am important!
";

could be used to ensure the reader doesn't forget leave the newline or reads too far. In general, I would want the test to give me confidence that we don't have off-by-ones like the following would:

let mut previous = 0;
if let Err(e) = Read::bytes(&mut self.reader)
    .find(|b| {
        let res = previous == b'\n';
        previous = *b.as_ref().unwrap();
        res
    })
    .transpose()

... which also looks reasonable at first glance :^)

dbrgn commented 1 year ago

Further PRs with improvements are always welcome of course 🙂

qknogxxb commented 1 year ago

@niklasmohrin You mentioned a new word "off-by-ones". I guess this means a special overflow bug which exactly overflow one byte. Specifically in your example code, the next adjacent char ("I" in "I am important") after \n will be consumed unexpectedly. I will try to improve the test on this situation.