techascent / tech.ml.dataset

A Clojure high performance data processing system
Eclipse Public License 1.0

not all comment lines are recognized as comments #400

Closed by daslu 8 months ago

daslu commented 8 months ago

This is probably related to the underlying parsing library.

Here is an example. It looks like every second comment line is not recognized as a comment.

 $ clj -Sdeps '{:deps {techascent/tech.ml.dataset {:mvn/version "7.027"}}}'
Clojure 1.11.1

(require 'tech.v3.dataset)

(spit "/tmp/data.csv"
      "# header comment A
# header comment B
# header comment C
# header comment D
x,y
# body comment A
# body comment B
# body comment C
# body comment D
1,2
3,4
5,6")

(tech.v3.dataset/->dataset "/tmp/data.csv")
/tmp/data.csv [7 2]:

| # header comment B | column-1 |
|--------------------|----------|
| # header comment D |          |
|                  x |        y |
|   # body comment B |          |
|   # body comment D |          |
|                  1 |        2 |
|                  3 |        4 |
|                  5 |        6 |

cnuernber commented 8 months ago

That was fixed in charred one version ago, but tech.io hasn't been updated to use it.

cnuernber commented 8 months ago

Thanks for filing, by the way. It is definitely a real bug; it was just fixed in a sub-library.

daslu commented 8 months ago

Thanks!

daslu commented 8 months ago

Indeed, adding com.cnuernber/charred {:mvn/version "1.034"} prevents the problem.
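
For anyone hitting the same thing, here is a minimal sketch of that workaround as a deps.edn map. It assumes the project is managed with the Clojure CLI (tools.deps), where a top-level dependency takes precedence over the version pulled in transitively. The two versions are the ones mentioned in this thread; nothing else is implied about which later releases include the fix.

```clojure
;; deps.edn (sketch): pin charred explicitly next to tech.ml.dataset.
;; Assumption: a top-level entry overrides the older charred version
;; that would otherwise arrive as a transitive dependency.
{:deps {techascent/tech.ml.dataset {:mvn/version "7.027"}
        com.cnuernber/charred      {:mvn/version "1.034"}}}
```

The same map can be passed inline to `clj -Sdeps '...'` in place of the one used in the original reproduction, after which the `->dataset` call on /tmp/data.csv can be re-run to check that the comment lines are skipped.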