This package has been (kind of) deprecated. My continued work now lies in the csv-conduit package, since conduit ended up creating a pretty large network of libraries to interact with. It plugs easily into other conduits, which makes it possible, for example, to parse incrementally over the network, or to read a CSV file and push results into a Chan incrementally.
CSV files are the de facto standard in many cases of data transfer, particularly when dealing with enterprise applications or disparate database systems.
While there are a number of CSV libraries in Haskell, at the time of this project's start in 2010 there wasn't one that provided all of the following:
This library is an attempt to close these gaps.
csv-enumerator is an enumerator-based CSV parsing library that is easy to use, flexible and fast. Furthermore, it provides ways to operate in constant space, which is absolutely critical in many real-world use cases.
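To illustrate what constant-space operation means in practice, here is a minimal sketch that uses only `base` and `bytestring`, not this library's enumerator machinery. It folds strictly over the lines of a file so that only the current line and the accumulator are alive at any time. The names `foldLines` and `countRows` are hypothetical, and the sketch is line-based: it does not handle quoted fields that contain newlines, which a real CSV parser must.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as B
import System.IO (Handle, IOMode (ReadMode), hIsEOF, withFile)

-- | Strict left fold over the lines of a handle (hypothetical helper).
-- Each line is read, folded into the accumulator, and then dropped,
-- so memory use does not grow with the size of the input.
foldLines :: (a -> B.ByteString -> a) -> a -> Handle -> IO a
foldLines step = go
  where
    go !acc h = do
      eof <- hIsEOF h
      if eof
        then return acc
        else do
          line <- B.hGetLine h
          go (step acc line) h

-- | Count the non-empty lines of a file without loading it into memory.
countRows :: FilePath -> IO Int
countRows path = withFile path ReadMode (foldLines step 0)
  where
    step n line = if B.null line then n else n + 1
```

The bang pattern on the accumulator forces each intermediate result, which avoids building a chain of thunks over a large file.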
The API is quite well documented and I would encourage you to keep it handy.
While fast operation is of concern, I have so far cared more about correct operation and a flexible API. Please let me know if you notice any performance regressions or optimization opportunities.
{-# LANGUAGE OverloadedStrings #-}
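import Data.CSV.Enumerator
import qualified Data.ByteString.Char8 as B
import Data.Char (isSpace)
import qualified Data.Map as M
import Data.Map ((!))
-- Naive whitespace stripper: drops leading and trailing whitespace
strip :: B.ByteString -> B.ByteString
strip = B.reverse . B.dropWhile isSpace . B.reverse . B.dropWhile isSpace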
-- A function that takes a row and "emits" zero or more rows as output.
processRow :: MapRow -> [MapRow]
processRow row = [M.insert "Column1" fixedCol row]
  where fixedCol = strip (row ! "Column1")
main = mapCSVFile "InputFile.csv" defCSVSettings processRow "OutputFile.csv"
and we are done.
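Because a row function returns a list, it can also drop a row (by returning an empty list) or expand it into several rows. The sketch below shows both cases with pure functions only. It assumes `MapRow` is a map from column name to field value and defines a local alias so the block stands on its own. The column names `Status` and `Tags` are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as B
import qualified Data.Map as M

-- Assumption: a local stand-in for the library's MapRow type
-- (column name -> field value).
type MapRow = M.Map B.ByteString B.ByteString

-- Emit zero rows: drop any row whose "Status" column is "inactive".
dropInactive :: MapRow -> [MapRow]
dropInactive row
  | M.lookup "Status" row == Just "inactive" = []
  | otherwise = [row]

-- Emit several rows: split a ';'-separated "Tags" column into one
-- output row per tag. Rows with a missing or empty "Tags" column
-- pass through unchanged.
explodeTags :: MapRow -> [MapRow]
explodeTags row =
  case M.lookup "Tags" row of
    Just tags | not (B.null tags) ->
      [M.insert "Tags" t row | t <- B.split ';' tags]
    _ -> [row]
```

Functions of this shape should be usable in place of `processRow` in the example above, and can be chained with `concatMap`, for instance `concatMap explodeTags . dropInactive`.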
Further examples to be provided at a later time.
Any and all kinds of help are much appreciated!