projectglow / glow

An open-source toolkit for large-scale genomic analysis
https://projectglow.io
Apache License 2.0

update pipe transformer docs to include quarantine #461

Closed williambrandler closed 2 years ago

williambrandler commented 2 years ago

Signed-off-by: William Brandler <william.brandler@databricks.com>

What changes are proposed in this pull request?

Adds an example to the pipe transformer docs showing how to quarantine corrupted data. Without quarantining, a single edge case can cause a Spark job to fail.
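
For readers unfamiliar with the idea, the quarantine pattern can be sketched in plain Python. This is only an illustration under stated assumptions, not Glow's implementation or API: the helper `pipe_with_quarantine`, the command `CMD`, and the "CORRUPT" marker are all hypothetical. Each chunk of input is piped through an external command, and a chunk whose command exits non-zero is set aside instead of aborting the whole run.

```python
import subprocess
import sys


def pipe_with_quarantine(partitions, cmd):
    """Pipe each partition's text through cmd (hypothetical helper).

    A partition whose command exits non-zero is recorded in the
    quarantine list together with its index, rather than raising.
    """
    outputs, quarantined = [], []
    for index, text in enumerate(partitions):
        result = subprocess.run(cmd, input=text, capture_output=True, text=True)
        if result.returncode == 0:
            outputs.append(result.stdout)
        else:
            quarantined.append((index, text))
    return outputs, quarantined


# Hypothetical stand-in for a real tool: uppercases stdin, but exits
# with status 1 when the input contains the marker "CORRUPT".
CMD = [
    sys.executable,
    "-c",
    "import sys; d = sys.stdin.read(); "
    "sys.exit(1) if 'CORRUPT' in d else sys.stdout.write(d.upper())",
]

partitions = ["chr1\t100\n", "CORRUPT row\n", "chr2\t200\n"]
outputs, quarantined = pipe_with_quarantine(partitions, CMD)
```

In this sketch the two healthy partitions are processed normally and the corrupted one ends up in `quarantined`, so it can be inspected later without failing the job.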

This coincides with the release of Glow v1.1.2.

One issue discovered during testing is that the quarantine table contains all the data. A simple solution is to join it back to the pipe transformer output DataFrame, but this makes the quarantine table unnecessary.

How is this patch tested?

(Details)

codecov[bot] commented 2 years ago

Codecov Report

Merging #461 (b22d3dc) into master (be635c7) will not change coverage. The diff coverage is n/a.


@@           Coverage Diff           @@
##           master     #461   +/-   ##
=======================================
  Coverage   93.66%   93.66%           
=======================================
  Files          95       95           
  Lines        4849     4849           
  Branches      463      463           
=======================================
  Hits         4542     4542           
  Misses        307      307           
