Request for info

vijaynaidu commented 7 years ago

Hi @kschiess, thanks for the cool plugin. I'm trying to understand how to actually use Parslet for my case. Can you help me with the idea/syntax for the following cases?

CASE 1:

Input:

"Band structure "a,b" of graphite,"

Expected output:

{
  :pre=>"\"", 
  :content=> "Band structure \"a,b\" of graphite", 
  :post=> ",\""
}

CASE 2:

Input:

pp 211–220,.

Expected output:

{
  :pre=>'pp ',
  :firstpage=>211,
  :sep=>'–',
  :lastpage=>220,
  :post=>',.'
}

Thanks

kschiess commented 7 years ago

This is not how it works. This is open source. In general, when you ask for help, I would expect that you show me the code that you've written (your attempt) and tell me what you expect. I'd then point out where we have different assumptions.

I currently don't have the time to help you with this. Maybe try Stack Overflow? Also, the examples directory here has quite a bit of Parslet code for you to peruse. Good luck!

vijaynaidu commented 7 years ago

@kschiess Sorry, I apologise for my mistake :( :+1: Thanks for your reply. Sure, I'll try to get help from other sources.

Hopefully someone finds the attempt below helpful. I'm applying Parslet to parse text, and it works well for segmenting page numbers, i.e. CASE 2. But I have no idea how to do the same with CASE 1, i.e. parsing the title.

require 'parslet'
require 'pp'

page = 'pp. S170–S177.'

class PageParse < Parslet::Parser
    root(:page_exp)

    # Basic tokens; each "?" rule is the optional form of its base rule.
    rule(:space) { match('\s').repeat(1) }
    rule(:space?) { space.maybe }

    rule(:dot) { str('.').repeat(1) }
    rule(:dot?) { dot.maybe }

    rule(:comma) { str(',').repeat(1) }
    rule(:comma?) { comma.maybe }

    rule(:alphabet) { match('[A-Za-z]').repeat(1) }
    rule(:alphabet?) { alphabet.maybe }

    rule(:integer) { match('[0-9]').repeat(1) }
    rule(:integer?) { integer.maybe }

    rule(:alpha_numeric) { (alphabet | integer).repeat(1) }
    rule(:alpha_numeric?) { alpha_numeric.maybe }

    # Page labels, longest alternative first: "page", "pp", "p".
    rule(:page_label_names){ str('page') | str('pp') | str('p') }
    rule(:page_label_names?){ page_label_names.maybe }

    # Leading label such as "pp. " and trailing punctuation such as ",." or ".".
    rule(:page_label){ space? >> page_label_names? >> dot? >> space? }
    rule(:page_end_boundary){ space? >> comma? >> dot? >> space? }
    rule(:page_end_boundary?){ page_end_boundary.maybe }

    # A page number may mix letters and digits, e.g. "S170".
    rule(:page_no){ alpha_numeric }
    rule(:page_no?){ page_no.maybe }

    # Hyphen or en dash between the first and last page.
    rule(:page_separator){ str('-').repeat(1) | str('–').repeat(1) }
    rule(:page_separator?){ page_separator.maybe }

    rule(:page_content){ page_no?.as(:first_page) >> page_separator?.as(:separator) >> page_no?.as(:last_page) }

    rule(:page_exp){ page_label.maybe.as(:match_pre) >> page_content.maybe.as(:pages) >> page_end_boundary.as(:match_post) }
end

def parse(page)
    PageParse.new.parse(page)
rescue Parslet::ParseFailed => failure
    #return page
    # Print Parslet's explanation of where and why the parse failed.
    puts failure.parse_failure_cause.ascii_tree
end

pp parse(page)

kschiess commented 7 years ago

Hi,

I've taken a quick look after all. I have a hard time understanding the syntax that underlies this 'title' thing. Apparently, it nests '"' without escaping, so a parser would have to keep reading balanced '"' until it finds the last one in the document? Maybe your difficulty in parsing this comes from the underlying grammar being underdefined.

Maybe that helps?

kaspar
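To make the ambiguity concrete, here is a sketch in plain Ruby regular expressions (a swapped-in technique, not Parslet) comparing two hypothetical readings of the title syntax: the content stops at the next quote, or it runs to the final `"` of the input. Only the second reading produces the expected CASE 1 content, which is why the grammar has to say which rule applies before any parser can be written.

```ruby
# Hypothetical example: two ways to read the CASE 1 title syntax.
title = '"Band structure "a,b" of graphite,"'

# Reading 1: the content stops at the first quote after the opening one.
next_quote = title.match(/\A(")(.*?)(,?")/)

# Reading 2: the content runs until an optional comma plus the final quote;
# \z anchors the closing delimiter to the very end of the input.
last_quote = title.match(/\A(")(.*?)(,?")\z/)

p next_quote[2]  # prints "Band structure "
p last_quote[2]  # prints "Band structure \"a,b\" of graphite"
```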

kschiess / parslet

Request for info #178