ncss-tech / aqp

Algorithms for Quantitative Pedology
http://ncss-tech.github.io/aqp/
54 stars 14 forks source link

`flagOverlappingHz()` performance improvements #308

Closed brownag closed 7 months ago

brownag commented 8 months ago

flagOverlappingHz() is called by fillHzGaps() which is used by several higher level functions including:

The current implementation of flagOverlappingHz() on master branch is slow, based on iteration over single profile SPCs, and complex operations on each element... it is the major bottleneck in fillHzGaps().

This PR improves processing time by approximately 4 orders of magnitude with a 10,000 profile SoilProfileCollection. It also allows for flagging overlap in horizons that have missing depths (a TODO item), improves tests, and ensures that imperfectly overlapping horizons are not flagged.

library(aqp)
#> This is aqp 2.0.3

x <- data.table::rbindlist(lapply(1:10000, random_profile, SPC = FALSE))
depths(x) <- id ~ top + bottom

# flag overlap in 10k random profiles (that have no overlap)
system.time(flagOverlappingHz(x))

## BEFORE
#>   user  system elapsed 
#> 98.309   0.483  98.877 

## AFTER
#>  user  system elapsed 
#> 0.002   0.006   0.013 
brownag commented 8 months ago

On second thought, iteration may be necessary here (i.e. https://github.com/ncss-tech/aqp/pull/308/commits/8f5e02185caab75648aeb09cfc4cd93e3ac70464 may need to be reverted/altered), or a few more checks and tests for the multiprofile case to be properly covered. Will add a test to this that ensures logic is corrected.

EDIT: multiple profile tests and edge cases added in https://github.com/ncss-tech/aqp/pull/308/commits/7aa9232f98b56b0e6c12aef37b90cc6bee15f4d7 and test failures corrected with https://github.com/ncss-tech/aqp/pull/308/commits/ab1f61f4beb86e476ece521195a0958afea891c4 (no iteration needed)

dylanbeaudette commented 8 months ago

Nice improvements and clever approach to NA depths. Thanks for working on this.

brownag commented 8 months ago

Nice improvements and clever approach to NA depths. Thanks for working on this.

Thanks!

I encountered it trying fillHzGaps() on ~13k KSSL pedons via fetchLDM() and was surprised how long it took, then realized that there were some notes on need to optimize it. I guess I have never noticed issues with smaller SPCs, but is pretty apparent when working with the LDM snapshot.

dylanbeaudette commented 8 months ago

Agreed, that bottleneck had to go. Thanks for tackling it.

brownag commented 7 months ago

Is this OK to merge now?