marklister / product-collections

A very simple, strongly typed, scala framework for tabular data. A collection of tuples. A strongly typed scala csv reader and writer. A lightweight idiomatic dataframe / datatable alternative.
BSD 2-Clause "Simplified" License

map-like access support #31

Closed antonkulaga closed 9 years ago

antonkulaga commented 9 years ago

As many CSV files have headers, it would be nice to have a version of CollSeq with header support, where I could get all the headers, get any column by header name (not only by number), and get any cell by header name and row number.

marklister commented 9 years ago

Yeah, it would be a logical thing to provide, and I've thought it through before, but I've not seen a way to preserve the types for a column. In other words, you'd wind up with a Seq[Any] after retrieving the column by header.

If you can think of a way around this limitation I'd be keen to implement the feature.
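To make the limitation concrete, here is a minimal sketch in plain Scala. It does not use product-collections; `ColumnTypeSketch`, `rows`, `headers`, and `columnByName` are hypothetical names chosen for illustration. Access by position keeps the static column type, while access by a header name that is only known at run time can only return `Seq[Any]`.

```scala
// Hypothetical sketch in plain Scala; not part of product-collections.
object ColumnTypeSketch {
  // Rows are plain tuples, so each position has its own static type.
  val rows: Seq[(Int, String, Double)] = Seq((1, "x", 1.5), (2, "y", 2.5))
  val headers: Seq[String] = Seq("id", "label", "score")

  // By position: the compiler knows the first column is Int.
  val ids: Seq[Int] = rows.map(_._1)

  // By name: the index is only known at run time, so the element type
  // falls back to the common supertype, Any.
  def columnByName(name: String): Seq[Any] = {
    val i = headers.indexOf(name)
    require(i >= 0, s"unknown header: $name")
    rows.map(_.productElement(i))
  }

  def main(args: Array[String]): Unit = {
    println(ids)
    println(columnByName("label"))
  }
}
```

The second lookup compiles for any header string, which is exactly why the compiler cannot pick a more specific element type.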

antonkulaga commented 9 years ago

What about introducing a class that extends CollSeq and has val headers: Seq[String] and some extra methods?
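A rough sketch of what such a class could look like, under stated assumptions: it wraps a plain `Seq` of tuples instead of extending the library's CollSeq, and `WithHeaders`, `column`, `cell`, and `WithHeadersDemo` are hypothetical names, not existing API. Columns still come back untyped, as noted earlier in the thread.

```scala
// Hypothetical names throughout; this wraps a plain Seq rather than
// extending the library's CollSeq.
final case class WithHeaders[P <: Product](headers: Seq[String], rows: Seq[P]) {
  // Resolve a header name to its column index, failing loudly if absent.
  private def indexOf(name: String): Int = {
    val i = headers.indexOf(name)
    require(i >= 0, s"unknown header: $name")
    i
  }

  // A whole column by header name (untyped: the name is a run-time value).
  def column(name: String): Seq[Any] = {
    val i = indexOf(name)
    rows.map(_.productElement(i))
  }

  // A single cell by header name and zero-based row number.
  def cell(name: String, row: Int): Any =
    rows(row).productElement(indexOf(name))
}

object WithHeadersDemo {
  val table = WithHeaders(Seq("a", "b", "c"), Seq((1, 2, 3), (4, 5, 6)))

  def main(args: Array[String]): Unit = {
    println(table.headers)
    println(table.column("b"))
    println(table.cell("c", 1))
  }
}
```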

marklister commented 9 years ago

Yeah, that's easily doable...

On 19 Apr 2015 10:41, Anton Kulaga wrote: "What about introducing a class that extends CollSeq and has val headers: Seq[String] and some extra methods?"

marklister commented 9 years ago

Once again, though, if one wants to return a column, the interface is going to have to be Map[String, Any]. I can probably do this in CollSeq rather than CollSeqN, which makes things quite simple.

marklister commented 9 years ago

I mean Seq[Any] of course...

antonkulaga commented 9 years ago

Yes, the issue is not simple. I think it can be discussed on Gitter, as I see several alternative ways to solve it.

marklister commented 9 years ago

Maybe tomorrow. I'm off to lunch shortly, which in this country invariably involves the drinking of wine...

antonkulaga commented 9 years ago

I do not like Seq[Any]. I think we can get more by using case classes together with macro annotations. So, if you know the structure of the CSV file, then instead of CollSeq[String,String...ntype] you can create a case class. Here is a PR that does this: https://github.com/marklister/product-collections/pull/32
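As an illustration of the case-class idea only, here is a hand-written sketch in plain Scala. It does not reproduce the macro annotation from the PR; `Reading`, `CaseClassRows`, and `parse` are hypothetical names, and the parser is deliberately naive (no quoting or escaping). The point is that a field accessed by name keeps its static type.

```scala
// Hand-written stand-in for what a macro might generate; names are hypothetical.
final case class Reading(a: Int, b: Int, c: Int)

object CaseClassRows {
  // Naive parser for the sketch: skips the header line, splits on commas,
  // and has no support for quoted or escaped fields.
  def parse(csv: String): Seq[Reading] =
    csv.split("\n").toList.drop(1).map(_.trim).filter(_.nonEmpty).map { line =>
      line.split(",").map(_.trim.toInt) match {
        case Array(a, b, c) => Reading(a, b, c)
        case other =>
          throw new IllegalArgumentException(s"expected 3 fields, got ${other.length}")
      }
    }

  val rows: Seq[Reading] = parse("a,b,c\n1,2,3\n4,5,6")

  // Field access is by name and the result is Seq[Int], not Seq[Any].
  val bs: Seq[Int] = rows.map(_.b)

  def main(args: Array[String]): Unit = {
    println(rows)
    println(bs)
  }
}
```

The trade-off is that the column names are fixed at compile time, so this only helps when the structure of the file is known in advance.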

marklister commented 9 years ago

I'm gonna separate this issue into two. Issue one deals with header support in CollSeq. I've put together a feature branch to deal with this: collseq-headers

The macro approach to a Seq of case classes (#32) needs its own issue, I think.

I've done some preliminary work on collseq-headers:

scala> val csv ="""a,b,c
     | 1,2,3
     | 4,5,6"""
csv: String =
a,b,c
1,2,3
4,5,6

scala> CsvParser[Int,Int,Int].parse(new java.io.StringReader(csv),hasHeader=true)
res0: com.github.marklister.collections.immutable.CollSeq3[Int,Int,Int] =
         a,b,c
CollSeq((1,2,3),
        (4,5,6))

scala> res0.collMap("b")
res1: Seq[Any] = List(2, 5)

scala> res0.collMap
res2: Map[String,Seq[Any]] = Map(a -> List(1, 4), b -> List(2, 5), c -> List(3, 6))

scala> res0.collMap("b")
res3: Seq[Any] = List(2, 5)

scala> res0.collMap("c")
res4: Seq[Any] = List(3, 6)
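Since the columns come back as Seq[Any], a caller who knows the expected type has to narrow it at run time. A minimal sketch of one way to do that in plain Scala follows. The map is built by hand to mirror the output above, and `NarrowColumn` and `asInts` are hypothetical helpers, not part of the collseq-headers branch.

```scala
object NarrowColumn {
  // Stand-in for the Map[String, Seq[Any]] shown in the session above;
  // built by hand here rather than produced by the library.
  val collMap: Map[String, Seq[Any]] =
    Map("a" -> List(1, 4), "b" -> List(2, 5), "c" -> List(3, 6))

  // Returns Some(ints) only if every element is an Int, otherwise None.
  def asInts(column: Seq[Any]): Option[Seq[Int]] = {
    val ints = column.collect { case i: Int => i }
    if (ints.length == column.length) Some(ints) else None
  }

  // None if the header is missing or the column is not all Ints.
  val b: Option[Seq[Int]] = collMap.get("b").flatMap(asInts)

  def main(args: Array[String]): Unit =
    println(b)
}
```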