sunchao / parquet-rs

Apache Parquet implementation in Rust
Apache License 2.0

Add usage examples #14

sadikovi closed 7 years ago

sadikovi commented 7 years ago

This PR adds examples for extracting file metadata and reading data.

I added a Parquet file that contains example data (primitive columns, struct fields, and list fields) to show different usages. It does not contain null values. The schema and the actual data are below:

message spark_schema {
  required int32 a;
  required int64 b;
  optional binary c (UTF8);
  required group d {
    required int32 a;
    required int64 b;
    optional binary c (UTF8);
  }
  required group e (LIST) {
    repeated group list {
      required int32 element;
    }
  }
}

+----+---+---+------------+------------+
|   a|  b|  c|           d|           e|
+----+---+---+------------+------------+
|   1|  2|abc|   [1,2,abc]|      [1, 1]|
|  10|  3|def|  [10,3,def]|    [10, 10]|
| 100|  4|ghi| [100,4,ghi]|  [100, 100]|
|1000|  4|jkl|[1000,4,jkl]|[1000, 1000]|
+----+---+---+------------+------------+
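
As a rough illustration (this is not the code from the PR), the schema and rows above could be modelled with plain Rust types along these lines. The names `Record`, `D`, `example_rows`, and `format_row` are hypothetical and are not part of parquet-rs; the sketch uses only the standard library and hard-codes the four rows from the table instead of reading the file.

```rust
// Hypothetical Rust types mirroring the `spark_schema` message above.
// Illustrative only: none of these names exist in parquet-rs.

/// Mirrors `required group d { ... }`.
#[derive(Debug, Clone, PartialEq)]
struct D {
    a: i32,            // required int32
    b: i64,            // required int64
    c: Option<String>, // optional binary (UTF8)
}

/// Mirrors one top-level record of `spark_schema`.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    a: i32,            // required int32
    b: i64,            // required int64
    c: Option<String>, // optional binary (UTF8)
    d: D,              // required group
    e: Vec<i32>,       // required group (LIST) of required int32 elements
}

/// Builds the four rows shown in the table above (hard-coded, no file I/O).
fn example_rows() -> Vec<Record> {
    let raw: [(i32, i64, &str); 4] =
        [(1, 2, "abc"), (10, 3, "def"), (100, 4, "ghi"), (1000, 4, "jkl")];
    raw.iter()
        .map(|&(a, b, c)| Record {
            a,
            b,
            c: Some(c.to_string()),
            // In the example data, `d` repeats the top-level values.
            d: D { a, b, c: Some(c.to_string()) },
            // In the example data, `e` holds the value of `a` twice.
            e: vec![a, a],
        })
        .collect()
}

/// Formats a record as pipe-separated cells, similar to the table rows.
fn format_row(r: &Record) -> String {
    let c = r.c.as_deref().unwrap_or("null");
    let dc = r.d.c.as_deref().unwrap_or("null");
    let e: Vec<String> = r.e.iter().map(|v| v.to_string()).collect();
    format!(
        "{}|{}|{}|[{},{},{}]|[{}]",
        r.a, r.b, c, r.d.a, r.d.b, dc, e.join(", ")
    )
}

fn main() {
    for row in example_rows() {
        println!("{}", format_row(&row));
    }
}
```

The `optional` columns map to `Option<String>` even though the example file contains no nulls, so the types follow the schema and not just this particular data set.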

I also updated the README with links to the source files and the commands to run them.

sadikovi commented 7 years ago

@sunchao could you have a look at this PR?

This is a first pass at usage examples, based on my attempts to read a file with the library. Could you let me know whether the code is okay, or whether I am doing something wrong when reading a file?

I would also like your opinion on whether the code should go directly into the README, and whether we should keep a separate binary example Parquet file.

Thanks!

coveralls commented 7 years ago

Coverage decreased (-2.0%) to 82.246% when pulling a347c35ab0aafeef794afd2b92f8a314cd1b97b3 on sadikovi:usage-example into 0d09371de45d958838fe41380af11c7eca49bb85 on sunchao:master.

coveralls commented 7 years ago

Coverage decreased (-1.9%) to 82.308% when pulling 0f873b9d9a711f639ab9ba3366d8286bfc75d543 on sadikovi:usage-example into 0d09371de45d958838fe41380af11c7eca49bb85 on sunchao:master.

sadikovi commented 7 years ago

@sunchao I updated the PR; could you review it again? Thanks.

coveralls commented 7 years ago

Coverage decreased (-1.6%) to 82.597% when pulling 0fc8b22a2d420ac34c3e2b41e63751a226f0c44d on sadikovi:usage-example into 0d09371de45d958838fe41380af11c7eca49bb85 on sunchao:master.

sadikovi commented 7 years ago

I think we should invest time in building a high-level read API rather than merging this PR. Once that is done, we could add a small README and publish API docs (schema building and reading). Let me know what you think.

sadikovi commented 7 years ago

@sunchao thanks for the review! I will address your comments and fix this in a couple of days. I need to learn more about parsing in parquet-cpp and parquet-mr.

sadikovi commented 7 years ago

I am going to close this for now and will revisit it later once I have patched several things.