typelevel / frameless

Expressive types for Spark.
Apache License 2.0

Test Frameless key functionality using real-world datasets (CSV, JSON, Parquet) #280

Open imarios opened 6 years ago

imarios commented 6 years ago

Currently we don't have enough tests that work with real data. We also don't have enough tests that exercise the readers and the encoding for common formats like CSV, JSON, Parquet, etc.

SemanticBeeng commented 6 years ago

Looking to use frameless and interested to validate as well. Please provide more input on #1 "real data": public data sets or synthetic? #2 is performance testing in scope as well? #3 how would you like to manage the data sets used for test? (separate repo, etc)

OlivierBlanvillain commented 6 years ago
  1. "real data": public data sets or synthetic?

Doesn't really matter for me, I guess it would be easier with real data?

  2. is performance testing in scope as well?

It would be nice to validate, but from what I could observe so far, we are on par with Spark everywhere.

  3. how would you like to manage the data sets used for test? (separate repo, etc)

Monorepo all the way :heart: