qrilka / xlsx

Simple and incomplete Excel file parser/writer
MIT License
132 stars 64 forks

Parsec combinators? #15

Open alanz opened 10 years ago

alanz commented 10 years ago

I am writing some Parsec combinators for my own use on top of this.

Do you want a pull request for them when I am done? I am not sure if they belong in this library.

qrilka commented 10 years ago

I guess that's what PRs are for and they could get accepted or not or could lead to some discussion. Could I take a look at them already now?

alanz commented 10 years ago

It's a work in progress, currently in my private project. I will generate a PR when it stabilises.

qrilka commented 10 years ago

But perhaps you could share a small gist with their API? In any case, thanks.

alanz commented 10 years ago

parseSheet :: Worksheet -> [DetailLine2]
parseSheet sh = catMaybes $ map (parseRow sh) [3 .. 3]

parseRow :: Worksheet -> RowNum -> Maybe DetailLine2
parseRow sh row = r
    `debug` ("parseRow:cells= " ++ show cells)
  where
    cells = map (\col -> cellsh sh (row,col)) [1..11]
    r = case parseDl cells of
      Left err -> Nothing `debug` ( "parseRow " ++ show row ++ ":" ++ show err)
      Right dl -> Just dl

-- parse :: Stream s Identity t => Parsec s () a -> SourceName -> s -> Either ParseError a

parseDl :: [Maybe CellValue] -> Either ParseError DetailLine2
parseDl ss = parse p "source"  ss

type P a = Parsec [Maybe CellValue] () a

p :: P DetailLine2
p = pDLHeading

pDLHeading :: P DetailLine2
pDLHeading = do
  many1 pEmpty
  name <- pText
  many1 pEmpty
  pLabel "Date: Statement For: "
  pEmpty
  date <- pNumber
  return (DLHeading name date)

-- | Return the text from a cell
pText :: P T.Text
pText = tokenPrim show nextPos getMaybeText
  where
    nextPos pos _ _ = incSourceColumn pos 1

    getMaybeText mt = case mt of
      Just (CellText str) -> Just (T.fromStrict str)
      _                   -> Nothing

-- | Return the value of a cell
pNumber :: P Rational
pNumber = tokenPrim show nextPos getMaybeNumber
  where
    nextPos pos _ _ = incSourceColumn pos 1

    getMaybeNumber mt = case mt of
      Just (CellDouble d) -> Just (double2Rational d)
      _                   -> Nothing

-- | Parse an empty cell
pEmpty :: P ()
pEmpty = tokenPrim show nextPos getMaybeCell
  where
    nextPos pos _ _ = incSourceColumn pos 1

    getMaybeCell mt = case mt of
      Just _ -> Nothing
      _      -> Just ()

-- | Match a cell with a specific label
pLabel :: T.Text -> P ()
pLabel label = tokenPrim show nextPos matchText
  where
    nextPos pos _ _ = incSourceColumn pos 1

    matchText mt = case mt of
      Just (CellText str) -> if label == T.fromStrict str
                               then Just ()
                               else Nothing
      _                   -> Nothing
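The four cell parsers above share the same `tokenPrim show nextPos` boilerplate. As a minimal sketch of how that could be factored out, the block below introduces a hypothetical helper `cell` and a local stand-in `CellValue` type (an assumption, so the example compiles without xlsx or the author's `DetailLine2`, `debug`, and `double2Rational`, none of which are shown in the thread). It is not the code from the author's project.

```haskell
import Text.Parsec (ParseError, Parsec, many1, parse, tokenPrim)
import Text.Parsec.Pos (incSourceColumn)

-- Hypothetical stand-in for xlsx's CellValue, so the sketch is self-contained.
data CellValue = CellText String | CellDouble Double
  deriving (Eq, Show)

-- A row is a list of optional cells; Nothing means the cell is empty.
type P a = Parsec [Maybe CellValue] () a

-- | Consume exactly one cell; succeed only if the function returns 'Just'.
--   (Hypothetical helper, not part of parsec or xlsx.)
cell :: (Maybe CellValue -> Maybe a) -> P a
cell = tokenPrim show nextPos
  where
    -- Each token is one cell, so advance the column by one.
    nextPos pos _ _ = incSourceColumn pos 1

-- | Return the text from a cell
pText :: P String
pText = cell $ \c -> case c of
  Just (CellText s) -> Just s
  _                 -> Nothing

-- | Return the numeric value of a cell
pNumber :: P Double
pNumber = cell $ \c -> case c of
  Just (CellDouble d) -> Just d
  _                   -> Nothing

-- | Parse an empty cell
pEmpty :: P ()
pEmpty = cell $ \c -> case c of
  Nothing -> Just ()
  Just _  -> Nothing

-- | Match a cell with a specific label
pLabel :: String -> P ()
pLabel label = cell $ \c -> case c of
  Just (CellText s) | s == label -> Just ()
  _                              -> Nothing

-- | Same shape as pDLHeading above, returning a plain tuple.
pHeading :: P (String, Double)
pHeading = do
  _ <- many1 pEmpty
  name <- pText
  _ <- many1 pEmpty
  pLabel "Date: Statement For: "
  pEmpty
  date <- pNumber
  return (name, date)

parseHeadingRow :: [Maybe CellValue] -> Either ParseError (String, Double)
parseHeadingRow = parse pHeading "row"
```

Because `tokenPrim` fails without consuming input when the cell does not match, `many1 pEmpty` stops cleanly at the first non-empty cell and no `try` is needed here.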
qrilka commented 10 years ago

GitHub is a bit strange: I guess it did not apply Markdown to your email. I think your message should look like this:


parseSheet :: Worksheet -> [DetailLine2]
parseSheet sh = catMaybes $ map (parseRow sh) [3 .. 3]

parseRow :: Worksheet -> RowNum -> Maybe DetailLine2
parseRow sh row = r
    `debug` ("parseRow:cells= " ++ show cells)
  where
    cells = map (\col -> cellsh sh (row,col)) [1..11]
    r = case parseDl cells of
      Left err -> Nothing `debug` ( "parseRow " ++ show row ++ ":" ++ show err)
      Right dl -> Just dl

-- parse :: Stream s Identity t => Parsec s () a -> SourceName -> s -> Either ParseError a

parseDl :: [Maybe CellValue] -> Either ParseError DetailLine2
parseDl ss = parse p "source"  ss

type P a = Parsec [Maybe CellValue] () a

p :: P DetailLine2
p = pDLHeading

pDLHeading :: P DetailLine2
pDLHeading = do
  many1 pEmpty
  name <- pText
  many1 pEmpty
  pLabel "Date: Statement For: "
  pEmpty
  date <- pNumber
  return (DLHeading name date)

-- | Return the text from a cell
pText :: P T.Text
pText = tokenPrim show nextPos getMaybeText
  where
    nextPos pos _ _ = incSourceColumn pos 1

    getMaybeText mt = case mt of
      Just (CellText str) -> Just (T.fromStrict str)
      _                   -> Nothing

-- | Return the value of a cell
pNumber :: P Rational
pNumber = tokenPrim show nextPos getMaybeNumber
  where
    nextPos pos _ _ = incSourceColumn pos 1

    getMaybeNumber mt = case mt of
      Just (CellDouble d) -> Just (double2Rational d)
      _                   -> Nothing

-- | Parse an empty cell
pEmpty :: P ()
pEmpty = tokenPrim show nextPos getMaybeCell
  where
    nextPos pos _ _ = incSourceColumn pos 1

    getMaybeCell mt = case mt of
      Just _ -> Nothing
      _      -> Just ()

-- | Match a cell with a specific label
pLabel :: T.Text -> P ()
pLabel label = tokenPrim show nextPos matchText
  where
    nextPos pos _ _ = incSourceColumn pos 1

    matchText mt = case mt of
      Just (CellText str) -> if label == T.fromStrict str
                               then Just ()
                               else Nothing
      _                   -> Nothing
alanz commented 10 years ago

yep

olafklinke commented 5 months ago

Sorry to revive this already quite old issue, but what is the preferred way of parsing spreadsheets? Obviously, the traditional stream-based parsers are a bit limited, since xlsx already provides us with a nice CellMap we can traverse at will. If for some reason we were to shoe-horn a Worksheet into a stream-based parser, then the first problem is to define a suitable stream type. My first attempt is something like

data XlsToken = EndOfRow | C Cell
fromSheet :: CellMap -> [XlsToken]
instance Stream [XlsToken] where

(See also this comment.)

Forgoing the stream approach, we could use ReaderT CellMap (Either ParseError) as the parser monad, which allows us to jump freely across the sheet as we see fit. The only compelling reason to use the stream-based approach is that we can build on libraries with excellent error reporting, like Megaparsec.
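To make the reader-based idea concrete, here is a minimal sketch under stated assumptions: `CellValue` and `CellMap` are simplified local stand-ins rather than the xlsx types, `SheetError` is a hypothetical error type used instead of a parser library's `ParseError`, and the helper names (`cellAt`, `textAt`, `numberAt`) are invented for illustration.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, asks, runReaderT)
import qualified Data.Map as Map

-- Simplified stand-ins for the xlsx types (assumptions for this sketch).
data CellValue = CellText String | CellDouble Double
  deriving (Eq, Show)

type Pos = (Int, Int) -- (row, column)

type CellMap = Map.Map Pos CellValue

-- Hypothetical error type; a real version might carry richer context.
data SheetError
  = MissingCell Pos
  | WrongType Pos String
  deriving (Eq, Show)

-- The whole sheet is the environment, so any cell can be read at any time.
type SheetParser = ReaderT CellMap (Either SheetError)

-- | Look up a cell by position, failing if it is absent.
cellAt :: Pos -> SheetParser CellValue
cellAt pos = do
  found <- asks (Map.lookup pos)
  lift (maybe (Left (MissingCell pos)) Right found)

-- | Require a text cell at the given position.
textAt :: Pos -> SheetParser String
textAt pos = do
  v <- cellAt pos
  case v of
    CellText s -> return s
    _          -> lift (Left (WrongType pos "text"))

-- | Require a numeric cell at the given position.
numberAt :: Pos -> SheetParser Double
numberAt pos = do
  v <- cellAt pos
  case v of
    CellDouble d -> return d
    _            -> lift (Left (WrongType pos "number"))

-- | Read a name and a date from two fixed positions in row 3.
heading :: SheetParser (String, Double)
heading = (,) <$> textAt (3, 2) <*> numberAt (3, 6)

runSheet :: SheetParser a -> CellMap -> Either SheetError a
runSheet = runReaderT
```

The trade-off is the one described above: random access is trivial, but error messages are only as good as the hand-written error type.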

If stream-based spreadsheet parsing turns out to be of general interest, I could release an xlsx-megaparsec library. I think this has no place in the xlsx library itself.
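For the stream-based variant, one possible shape for the `fromSheet` function sketched in the comment is shown below. This is only an illustration: `Cell` and `CellMap` are simplified local stand-ins for the xlsx types, and the flattening drops column positions and empty cells, which a real implementation would probably need to preserve for useful error locations.

```haskell
import Data.Function (on)
import Data.List (groupBy)
import qualified Data.Map as Map

-- Simplified stand-in for xlsx's Cell (assumption for this sketch).
newtype Cell = Cell String
  deriving (Eq, Show)

type CellMap = Map.Map (Int, Int) Cell -- keyed by (row, column)

data XlsToken = EndOfRow | C Cell
  deriving (Eq, Show)

-- | Flatten a sheet into a token list, row by row, ending each row
--   with an 'EndOfRow' marker.
fromSheet :: CellMap -> [XlsToken]
fromSheet cells = concatMap rowTokens rows
  where
    -- Map.toAscList sorts keys by (row, column), so the cells of one row
    -- are adjacent and already in column order.
    rows = groupBy ((==) `on` (fst . fst)) (Map.toAscList cells)
    rowTokens row = map (C . snd) row ++ [EndOfRow]
```

With Parsec, a plain list of tokens can already serve as the input stream, so the resulting `[XlsToken]` could be fed to parsers built with `tokenPrim` without writing a new `Stream` instance.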