projectglow / glow

An open-source toolkit for large-scale genomic analysis
https://projectglow.io
Apache License 2.0
262 stars, 106 forks

Improve INFO field splitting logic; make variant splitting plan more compact #654

Closed henrydavidge closed 3 months ago

henrydavidge commented 3 months ago

What changes are proposed in this pull request?

As reported in #537, the split_multiallelics transformer splits unbounded INFO fields in an unexpected way. After this PR, we:

In addition, I replaced the looped calls to withColumn with batched calls to withColumns. Calling withColumn many times is not recommended as it can result in very large plans: https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.DataFrame.withColumn.html
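As a rough illustration of the batching change, the sketch below contrasts the two call patterns. It is not the code from this PR: `add_columns_looped`, `add_columns_batched`, and the `CallCounter` stand-in are hypothetical names, and `df` is assumed to be a `pyspark.sql.DataFrame` on PySpark 3.3 or later, where `DataFrame.withColumns` is available. The stand-in only counts how many projections each pattern would request, so the example runs without a Spark session.

```python
from typing import Any, Dict


def add_columns_looped(df: Any, new_cols: Dict[str, Any]) -> Any:
    """Add columns one at a time; each withColumn call adds a projection."""
    for name, expr in new_cols.items():
        df = df.withColumn(name, expr)
    return df


def add_columns_batched(df: Any, new_cols: Dict[str, Any]) -> Any:
    """Add all columns in one withColumns call, i.e. a single projection."""
    return df.withColumns(new_cols)


class CallCounter:
    """Hypothetical stand-in for a DataFrame that only counts projections."""

    def __init__(self, projections: int = 0) -> None:
        self.projections = projections

    def withColumn(self, name: str, expr: Any) -> "CallCounter":
        return CallCounter(self.projections + 1)

    def withColumns(self, cols: Dict[str, Any]) -> "CallCounter":
        return CallCounter(self.projections + 1)


# Placeholder expressions; with Spark these would be Column objects.
cols = {f"INFO_{i}": None for i in range(50)}
looped = add_columns_looped(CallCounter(), cols).projections
batched = add_columns_batched(CallCounter(), cols).projections
print(looped, batched)
```

With a real DataFrame, the same two helpers should work unchanged if `new_cols` maps column names to `Column` expressions. One trade-off to keep in mind: as far as I know, expressions passed in a single `withColumns` call are resolved against the input DataFrame, so they cannot refer to one another.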

How is this patch tested?

(Describe any other testing)

To run Spark 4.0 tests, add [SPARK4] to the pull request title.

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 92.36%. Comparing base (4f9d314) to head (6a2ef4f).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #654   +/-   ##
=======================================
  Coverage   92.36%   92.36%
=======================================
  Files         126      126
  Lines        7372     7377    +5
  Branches      633      628    -5
=======================================
+ Hits         6809     6814    +5
  Misses        563      563
```

| [Flag](https://app.codecov.io/gh/projectglow/glow/pull/654/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=projectglow) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/projectglow/glow/pull/654/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=projectglow) | `92.36% <100.00%> (+<0.01%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=projectglow#carryforward-flags-in-the-pull-request-comment) to find out more.
