Open selitvin opened 4 years ago
Base: 82.88% // Head: 85.99% // Increases project coverage by +3.11%
Coverage data is based on head (3fe68d4) compared to base (83a02df). Patch coverage: 100.00% of modified lines in pull request are covered.
When writing data into a petastorm dataset, before a pyspark sql.Row object is created, fields containing data that is not natively supported by the Parquet format, such as numpy arrays, are serialized into byte arrays. Images may be compressed using png or jpeg compression.
Serializing fields on a thread pool speeds up this process in some cases (e.g. a row contains multiple images).
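The idea can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `encode_field`, `decode_field`, and `encode_row` are hypothetical names, and `pickle` plus `zlib` stand in for the real codecs (png/jpeg compression, numpy serialization). Threads are only expected to help when the encoder spends its time in native code that releases the GIL, which is assumed here for the compression step.

```python
import pickle
import zlib
from concurrent.futures import ThreadPoolExecutor


def encode_field(value):
    """Hypothetical stand-in for a codec: serialize and compress one field to bytes."""
    return zlib.compress(pickle.dumps(value))


def decode_field(blob):
    """Inverse of encode_field, used here only to check the round trip."""
    return pickle.loads(zlib.decompress(blob))


def encode_row(row, pool=None):
    """Encode every field of `row` (a dict), serially or on a thread pool."""
    names = list(row)
    if pool is None:
        encoded = [encode_field(row[name]) for name in names]
    else:
        # Executor.map yields results in input order, so names and
        # encoded values stay aligned.
        encoded = list(pool.map(encode_field, (row[name] for name in names)))
    return dict(zip(names, encoded))


# A row with two large binary fields, standing in for two images.
row = {
    "id": 7,
    "image_left": bytes(100_000),
    "image_right": bytes(range(256)) * 400,
}

serial = encode_row(row)
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = encode_row(row, pool)
```

Both paths produce the same encoded row; the pool only changes how the per-field work is scheduled. For rows with a single small field, the overhead of dispatching to the pool may outweigh any gain, which is consistent with the "in some cases" caveat above.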