Closed sadikovi closed 7 years ago
@sunchao this is something I thought would be useful to have. Could you review this pull request? Thanks!
Thanks @sadikovi. This PR looks great! At a high level, I wonder whether there is a standard format for Parquet message strings that this parser expects?
@sunchao thank you for taking a look at this PR.
I do not know if there is an explicit standard. I think it is okay as long as the string uses a JSON-like format that this code can parse. It is modelled after MessageTypeParser in parquet-mr.
The reason I added it is that I was planning to get a schema as a string, parse it, and then check whether the parsed schema is a subtype of the full Parquet schema.
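A minimal sketch of such a check, reading "subtype" as a projection: every field in the parsed schema must exist in the full schema with the same name and physical type. The `Node` enum and the `prim`, `group`, and `is_subset` helpers are hypothetical stand-ins for illustration, not the crate's `Type` API, and repetition and logical types are ignored.

```rust
// Simplified stand-in for a schema node; NOT parquet::schema::types::Type.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Primitive { name: String, physical: String },
    Group { name: String, fields: Vec<Node> },
}

impl Node {
    fn name(&self) -> &str {
        match self {
            Node::Primitive { name, .. } | Node::Group { name, .. } => name.as_str(),
        }
    }
}

// Hypothetical constructors to keep the example short.
fn prim(name: &str, physical: &str) -> Node {
    Node::Primitive { name: name.to_string(), physical: physical.to_string() }
}

fn group(name: &str, fields: Vec<Node>) -> Node {
    Node::Group { name: name.to_string(), fields }
}

// Returns true if `sub` is a projection of `full`: names match, primitives
// have the same physical type, and every child of a group in `sub` has a
// matching child in the corresponding group of `full`.
fn is_subset(sub: &Node, full: &Node) -> bool {
    match (sub, full) {
        (
            Node::Primitive { name: n1, physical: p1 },
            Node::Primitive { name: n2, physical: p2 },
        ) => n1 == n2 && p1 == p2,
        (
            Node::Group { name: n1, fields: f1 },
            Node::Group { name: n2, fields: f2 },
        ) => {
            n1 == n2
                && f1.iter().all(|child| {
                    f2.iter()
                        .find(|candidate| candidate.name() == child.name())
                        .map_or(false, |candidate| is_subset(child, candidate))
                })
        }
        // A primitive never matches a group, and vice versa.
        _ => false,
    }
}
```

With a parser in place, both schemas could be built from strings instead of by hand before being compared this way.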
Are you aware of any standard for schema strings?
@sadikovi I'm not aware of any standard on this, but it seems parquet-cpp and parquet-mr use the same format. I also noticed that the printer for parquet-rs has some differences; for example, it doesn't print the length for fixed_len_byte_array, which we may need to fix.
Patch looks good. Merged. Thanks @sadikovi !
Thanks for merging, I will have a look at the printer issue.
This PR introduces a schema parser that converts the string representation of a message type into an instance of `parquet::schema::types::Type`.
API: `parse_message_type<'a>(message_type: &'a str) -> Result<Type>`, where `message_type` is a string (single-line or multi-line) that represents a Parquet schema. Usage is below:
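A self-contained sketch of the call shape, assuming a flat schema of primitive fields. The `Field` and `Message` types and this `parse_message_type` are simplified stand-ins written for illustration; the function added in this PR returns `parquet::schema::types::Type`, supports nested groups and logical types, and may treat letter case differently.

```rust
// Simplified stand-in for one primitive column; NOT the crate's `Type`.
#[derive(Debug, PartialEq)]
struct Field {
    repetition: String,
    physical: String,
    name: String,
}

// Simplified stand-in for a parsed message.
#[derive(Debug, PartialEq)]
struct Message {
    name: String,
    fields: Vec<Field>,
}

// Hypothetical stand-in parser: accepts only
// `message <name> { <repetition> <physical> <field>; ... }`.
fn parse_message_type(message_type: &str) -> Result<Message, String> {
    // Pad structural characters so whitespace splitting yields clean tokens.
    let padded = message_type
        .replace('{', " { ")
        .replace('}', " } ")
        .replace(';', " ; ");
    let mut tokens = padded.split_whitespace();

    match tokens.next() {
        Some("message") => {}
        other => return Err(format!("expected 'message', found {:?}", other)),
    }
    let name = tokens.next().ok_or("missing message name")?.to_string();
    if tokens.next() != Some("{") {
        return Err("expected '{' after message name".to_string());
    }

    let mut fields = Vec::new();
    loop {
        match tokens.next() {
            Some("}") => break,
            Some(rep) if matches!(rep, "required" | "optional" | "repeated") => {
                let physical = tokens.next().ok_or("missing physical type")?;
                let field_name = tokens.next().ok_or("missing field name")?;
                if tokens.next() != Some(";") {
                    return Err(format!("expected ';' after field '{}'", field_name));
                }
                fields.push(Field {
                    repetition: rep.to_string(),
                    physical: physical.to_string(),
                    name: field_name.to_string(),
                });
            }
            Some(other) => return Err(format!("unknown repetition '{}'", other)),
            None => return Err("unexpected end of input".to_string()),
        }
    }
    Ok(Message { name, fields })
}

// Example call with a multi-line schema string.
fn example() -> Result<Message, String> {
    let schema = "
        message schema {
            required int32 id;
            optional binary name;
        }
    ";
    parse_message_type(schema)
}
```

The same string on a single line parses identically, since the stand-in only looks at whitespace-separated tokens.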
This class is created as a `Printer` counterpart. I also had to implement the `FromStr` trait for physical types, logical types, and repetition to reduce the amount of code needed for parsing.
Added unit tests and also tested manually on some nested and plain schemas.
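To illustrate why `FromStr` shortens the parsing code, here is a minimal sketch for repetition only. The `Repetition` enum below is a local stand-in, and the `String` error type and case-insensitive matching are assumptions; the crate's own implementation may differ on both.

```rust
use std::str::FromStr;

// Local stand-in for the repetition enum; the variant names mirror Parquet's
// REQUIRED / OPTIONAL / REPEATED.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Repetition {
    Required,
    Optional,
    Repeated,
}

impl FromStr for Repetition {
    // Assumption: a plain String error keeps the sketch dependency-free.
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Normalize case so "required" and "REQUIRED" both parse.
        match s.to_ascii_uppercase().as_str() {
            "REQUIRED" => Ok(Repetition::Required),
            "OPTIONAL" => Ok(Repetition::Optional),
            "REPEATED" => Ok(Repetition::Repeated),
            other => Err(format!("invalid repetition: {}", other)),
        }
    }
}
```

With the trait in place, a parser can turn a token into a typed value with `token.parse::<Repetition>()` instead of repeating the match at every call site.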