pest-parser / pest

The Elegant Parser
https://pest.rs
Apache License 2.0

Suggestion: Expose a getter function for LineIndex #981

Open BoolPurist opened 9 months ago

BoolPurist commented 9 months ago

What is the Idea

Pairs has a read-only, shared LineIndex field. As a user of this library, I would like to have access to it. I suggest a getter function for that field, like:

pub fn line_index(&self) -> Rc<LineIndex> {
     // .. ....
}

Are you open to this idea?

pub struct Pairs<'i, R> {
    queue: Rc<Vec<QueueableToken<'i, R>>>,
    input: &'i str,
    start: usize,
    end: usize,
    pairs_count: usize,
    line_index: Rc<LineIndex>, // <= Want that ^^
}

Reason

I am developing a parser for the scripting language of the Ion shell. Currently I only store the byte offsets at which each token starts and ends. However, for an LSP server built on this parser, I need to map these byte offsets to their corresponding line and column numbers, because the LSP protocol works with lines and columns. At the moment I get the line and column numbers from a HashMap that maps each byte offset to its line and column. Since pest has already done this work, I could use its line index information instead.
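
For comparison: a HashMap keyed by byte offset needs one entry per byte of input. A more compact approach, and the general idea behind a line index, is to record only the byte offset at which each line starts and binary-search that list. The sketch below assumes 1-based line and column numbers with columns counted in chars; `LineStarts` is a hypothetical name, and this is not pest's actual code. (LSP itself uses 0-based positions with UTF-16 columns by default, so a real server would still need to convert.)

```rust
/// Hypothetical line index: the byte offset at which each line begins.
struct LineStarts {
    starts: Vec<usize>,
}

impl LineStarts {
    /// Scan the input once, recording the offset just past every '\n'.
    fn new(input: &str) -> Self {
        let mut starts = vec![0];
        for (i, b) in input.bytes().enumerate() {
            if b == b'\n' {
                starts.push(i + 1);
            }
        }
        LineStarts { starts }
    }

    /// 1-based line containing byte offset `pos`.
    fn line(&self, pos: usize) -> usize {
        // Count of line starts <= pos; at least 1 because starts[0] == 0.
        self.starts.partition_point(|&start| start <= pos)
    }

    /// 1-based (line, column). The column counts chars, not bytes,
    /// so `pos` must lie on a char boundary of `input`.
    fn line_col(&self, input: &str, pos: usize) -> (usize, usize) {
        let line = self.line(pos);
        let line_start = self.starts[line - 1];
        let col = input[line_start..pos].chars().count() + 1;
        (line, col)
    }
}

fn main() {
    let input = "let x = 1\necho $x\n";
    let index = LineStarts::new(input);
    // One entry per line, instead of one HashMap entry per byte.
    println!("{} line starts for {} bytes", index.starts.len(), input.len());
    println!("{:?}", index.line_col(input, 15)); // position of '$' on line 2
}
```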

tomtau commented 9 months ago

LineIndex is internal only, I think (it was done differently before, so it's just an implementation detail). Maybe just returning the offsets as &[usize] from that accessor function would be better? Or are you specifically interested in LineIndex for its line_col helper function?

BoolPurist commented 9 months ago

I prefer the latter: the line_col function from LineIndex (https://github.com/pest-parser/pest/blob/master/pest/src/iterators/line_index.rs#L36).

Just getting the offsets as &[usize] would still be appreciated. However, I also need the column numbers, so I would still have to implement the logic for computing them myself. This library already solves that problem, so why not provide it to users too?
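
To illustrate what exposing only the offsets would leave to the user: a minimal sketch of the column logic on the caller's side, assuming the slice is sorted, starts with 0, and holds the byte offset where each line begins. `line_col_from_offsets` is a hypothetical helper, not part of pest.

```rust
/// Hypothetical helper: map byte offset `pos` in `input` to a 1-based
/// (line, column) pair, given only the byte offsets where lines start.
/// Assumes `offsets` is sorted, starts with 0, and `pos` is a char boundary.
fn line_col_from_offsets(offsets: &[usize], input: &str, pos: usize) -> (usize, usize) {
    // Index of the last line start that is <= pos.
    let line_idx = offsets.partition_point(|&start| start <= pos) - 1;
    // Columns count chars rather than bytes, so a multi-byte
    // character such as 'é' advances the column by one.
    let col = input[offsets[line_idx]..pos].chars().count();
    (line_idx + 1, col + 1)
}

fn main() {
    let input = "café\nbar";
    // Line starts: byte 0, and byte 6 ('é' takes two bytes, then '\n').
    let offsets = [0, 6];
    let (line, col) = line_col_from_offsets(&offsets, input, 7);
    println!("line {line}, column {col}"); // the 'a' in "bar"
}
```

The byte-to-char column conversion is the part that is easy to get wrong, which is the argument for the library exposing line_col directly.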

I know there is a way to get the line and column number from a returned token. However, that requires keeping every Pair struct around in order to look up the line and column number for a byte offset.

I think it would be nice to have a struct that provides this line_col function. This pub struct could be different from LineIndex, so that the implementation detail stays hidden. Since LineIndex is already wrapped in an Rc, it would not be hard to give this new pub struct a hidden inner LineIndex.

Here is a draft of an API I would find useful on this pub struct:

pub fn line_col(&self, input: &str, pos: usize) -> (usize, usize) {
   ...
}

pub fn line(&self, input: &str, pos: usize) -> usize {
  ...
}
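
The drafted API could sit on a small public wrapper that holds the shared index behind an `Rc`, so the internal type stays private and cloning the wrapper is cheap. The sketch below is self-contained under stated assumptions: `InternalIndex` is a stand-in for pest's private `LineIndex` (not its real code), and `LineCols` is a hypothetical name for the public struct. The `line` method keeps the drafted `input` parameter only for signature compatibility; this sketch does not need it.

```rust
use std::rc::Rc;

/// Stand-in for pest's private `LineIndex` (an assumption, not pest's
/// code): the byte offset at which each line starts.
struct InternalIndex {
    line_starts: Vec<usize>,
}

impl InternalIndex {
    fn new(input: &str) -> Self {
        let mut line_starts = vec![0];
        line_starts.extend(input.match_indices('\n').map(|(i, _)| i + 1));
        InternalIndex { line_starts }
    }
}

/// Hypothetical public wrapper: shares the index through an `Rc` without
/// exposing its type, so the internals can change later.
#[derive(Clone)]
pub struct LineCols {
    inner: Rc<InternalIndex>,
}

impl LineCols {
    fn new(input: &str) -> Self {
        LineCols { inner: Rc::new(InternalIndex::new(input)) }
    }

    /// 1-based (line, column) of byte offset `pos`; columns count chars.
    pub fn line_col(&self, input: &str, pos: usize) -> (usize, usize) {
        let line = self.line(input, pos);
        let start = self.inner.line_starts[line - 1];
        (line, input[start..pos].chars().count() + 1)
    }

    /// 1-based line of byte offset `pos` (input kept to match the draft).
    pub fn line(&self, _input: &str, pos: usize) -> usize {
        self.inner.line_starts.partition_point(|&s| s <= pos)
    }
}

fn main() {
    let input = "echo hi\nexit\n";
    let lookup = LineCols::new(input);
    // Cloning copies only the Rc pointer; both handles share one index.
    let shared = lookup.clone();
    println!("{:?}", shared.line_col(input, 9)); // the 'x' in "exit"
}
```

Returning plain `usize` values keeps the public surface independent of how the index is stored internally.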

tomtau commented 9 months ago

I recall @huacnlee was using this line/col information before, so they may have an opinion or suggestion here. In principle, it should be OK as long as the API's return types don't constrain future changes to the internal implementation.