jblindsay / whitebox-tools

An advanced geospatial data analysis platform
https://www.whiteboxgeo.com/
MIT License
967 stars 161 forks source link

Arc::try_unwrap fails intermittently during breach_depressions_least_cost #407

Open jontwo opened 6 months ago

jontwo commented 6 months ago

I don't have a good repro case for this, but one of our system tests is failing intermittently. I managed to get a stacktrace out, but it doesn't really tell us anything, just the line number where it failed.

INFO     farm:d8_pointer.py:878 *****************************************
INFO     farm:d8_pointer.py:878 * Welcome to BreachDepressionsLeastCost *
INFO     farm:d8_pointer.py:878 * Powered by WhiteboxTools              *
INFO     farm:d8_pointer.py:878 * www.whiteboxgeo.com                   *
INFO     farm:d8_pointer.py:878 *****************************************
INFO     farm:d8_pointer.py:878 Reading data...
INFO     farm:d8_pointer.py:878 Num. solved pits: 168
INFO     farm:d8_pointer.py:878 Num. unsolved pits: 1
INFO     farm:d8_pointer.py:878 Filling remaining depressions...
INFO     farm:d8_pointer.py:878 thread 'main' panicked at 'Error unwrapping 'output'', whitebox-tools-app/src/tools/hydro_analysis/breach_depressions_least_cost.rs:657:27
INFO     farm:d8_pointer.py:878 stack backtrace:
INFO     farm:d8_pointer.py:878 0:     0x562e60e0b16c - std::backtrace_rs::backtrace::libunwind::trace::h3fea1eb2e0ba2ac9
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/../../backtrace/src/backtrace/libunwind.rs:90:5
INFO     farm:d8_pointer.py:878 1:     0x562e60e0b16c - std::backtrace_rs::backtrace::trace_unsynchronized::h849d83492cbc0d59
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
INFO     farm:d8_pointer.py:878 2:     0x562e60e0b16c - std::sys_common::backtrace::_print_fmt::he3179d37290f23d3
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/sys_common/backtrace.rs:67:5
INFO     farm:d8_pointer.py:878 3:     0x562e60e0b16c - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h140f6925cad14324
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/sys_common/backtrace.rs:46:22
INFO     farm:d8_pointer.py:878 4:     0x562e60e324bc - core::fmt::write::h31b9cd1bedd7ea38
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/core/src/fmt/mod.rs:1150:17
INFO     farm:d8_pointer.py:878 5:     0x562e60e07515 - std::io::Write::write_fmt::h1fdf66f83f70913e
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/io/mod.rs:1667:15
INFO     farm:d8_pointer.py:878 6:     0x562e60e0cce0 - std::sys_common::backtrace::_print::he7ac492cd19c3189
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/sys_common/backtrace.rs:49:5
INFO     farm:d8_pointer.py:878 7:     0x562e60e0cce0 - std::sys_common::backtrace::print::hba20f8920229d8e8
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/sys_common/backtrace.rs:36:9
INFO     farm:d8_pointer.py:878 8:     0x562e60e0cce0 - std::panicking::default_hook::{{closure}}::h714d63979ae18678
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panicking.rs:210:50
INFO     farm:d8_pointer.py:878 9:     0x562e60e0c897 - std::panicking::default_hook::hf1afb64e69563ca8
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panicking.rs:227:9
INFO     farm:d8_pointer.py:878 10:     0x562e60e0d394 - std::panicking::rust_panic_with_hook::h02231a501e274a13
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panicking.rs:624:17
INFO     farm:d8_pointer.py:878 11:     0x562e60e0ce42 - std::panicking::begin_panic_handler::{{closure}}::h5b5e738edf361af7
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panicking.rs:519:13
INFO     farm:d8_pointer.py:878 12:     0x562e60e0b634 - std::sys_common::backtrace::__rust_end_short_backtrace::h601a115def7987b1
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/sys_common/backtrace.rs:141:18
INFO     farm:d8_pointer.py:878 13:     0x562e60e0cdd9 - rust_begin_unwind
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panicking.rs:517:5
INFO     farm:d8_pointer.py:878 14:     0x562e5ff1f961 - core::panicking::panic_fmt::h7a58c8fffc5559a4
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/core/src/panicking.rs:101:14
INFO     farm:d8_pointer.py:878 15:     0x562e6078cdfd - <whitebox_tools::tools::hydro_analysis::breach_depressions_least_cost::BreachDepressionsLeastCost as whitebox_tools::tools::WhiteboxTool>::run::h81c67dd42e84ab9e
INFO     farm:d8_pointer.py:878 16:     0x562e6069e8d2 - whitebox_tools::tools::ToolManager::run_tool::h53a77cdd713cd574
INFO     farm:d8_pointer.py:878 17:     0x562e607748b5 - whitebox_tools::run::h7f1fd320ded50c56
INFO     farm:d8_pointer.py:878 18:     0x562e6076d7fa - whitebox_tools::main::h803c24de18fe5847
INFO     farm:d8_pointer.py:878 19:     0x562e60af40c3 - std::sys_common::backtrace::__rust_begin_short_backtrace::h3026df5b86c4c0c7
INFO     farm:d8_pointer.py:878 20:     0x562e605ad7e9 - std::rt::lang_start::{{closure}}::h7ae49518c4aa84f7
INFO     farm:d8_pointer.py:878 21:     0x562e60e0d99a - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h2790017aba790142
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/core/src/ops/function.rs:259:13
INFO     farm:d8_pointer.py:878 22:     0x562e60e0d99a - std::panicking::try::do_call::hd5d0fbb7d2d2d85d
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panicking.rs:403:40
INFO     farm:d8_pointer.py:878 23:     0x562e60e0d99a - std::panicking::try::h675520ee37b0fdf7
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panicking.rs:367:19
INFO     farm:d8_pointer.py:878 24:     0x562e60e0d99a - std::panic::catch_unwind::h803430ea0284ce79
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panic.rs:129:14
INFO     farm:d8_pointer.py:878 25:     0x562e60e0d99a - std::rt::lang_start_internal::{{closure}}::h3a398a8154de3106
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/rt.rs:45:48
INFO     farm:d8_pointer.py:878 26:     0x562e60e0d99a - std::panicking::try::do_call::hf60f106700df94b2
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panicking.rs:403:40
INFO     farm:d8_pointer.py:878 27:     0x562e60e0d99a - std::panicking::try::hb2022d2bc87a9867
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panicking.rs:367:19
INFO     farm:d8_pointer.py:878 28:     0x562e60e0d99a - std::panic::catch_unwind::hbf801c9d61f0c2fb
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panic.rs:129:14
INFO     farm:d8_pointer.py:878 29:     0x562e60e0d99a - std::rt::lang_start_internal::hdd488b91dc742b96
INFO     farm:d8_pointer.py:878 at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/rt.rs:45:20
INFO     farm:d8_pointer.py:878 30:     0x562e60776d92 - main
INFO     farm:d8_pointer.py:878 31:     0x7f27edb1324a - <unknown>
INFO     farm:d8_pointer.py:878 32:     0x7f27edb13305 - __libc_start_main
INFO     farm:d8_pointer.py:878 33:     0x562e5ff20134 - <unknown>
INFO     farm:d8_pointer.py:878 34:                0x0 - <unknown>

The line referenced above is this:

output = match Arc::try_unwrap(output2) {

I'm assuming this is a race condition as 95% of the time the test passes. Is there anyway to add a delay or a retry so it doesn't error straight away? We might try and implement that in our Python code to see if we can make the test more stable.

erakhmatulina commented 3 months ago

having the same issue

jblindsay commented 3 months ago

Are you able to share your data with me for testing? Otherwise this is a very difficult bug to identify the cause of. Also, how are you using WhiteboxTools, e.g. are you using a particular frontend or perhaps the R or Python APIs?

jontwo commented 1 month ago

bdlc_repro.tar.gz

We're using the Python frontend. I've attached a repro case that should show the issue - the Python script is pretty quick to run, but you need to run it a few times to see it fail. My bash-fu isn't great so the run.sh script doesn't actually exit when it fails, but if you pipe the output into a log, run it for a couple of minutes, then cancel it, you can search the logs for "panic". It comes up a few times.

Example:

...
Finding pits: 98%
Finding pits: 99%
Finding pits: 100%
Breaching: 0%
Breaching: 25%
Breaching: 50%
Breaching: 75%
Breaching: 100%
Num. solved pits: 4
Num. unsolved pits: 1
Filling remaining depressions...
Finding pit cells: 12%
Finding pit cells: 25%
Finding pit cells: 37%
Finding pit cells: 50%
Finding pit cells: 62%
Finding pit cells: 75%
Finding pit cells: 87%
Finding pit cells: 100%
thread 'main' panicked at whitebox-tools-app/src/tools/hydro_analysis/breach_depressions_least_cost.rs:657:27:
Error unwrapping 'output'
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
./whitebox_tools --run="BreachDepressionsLeastCost" --dem='/home/jontwo/bdlc_repro/data/burned_dem.tif' --output='/tmp/d8_repro0/fixed_dem_streams.tif' --dist='10' --fill -v --compress_rasters=True

At the moment I'm running whitebox 2.3.5 in Python 3.12, but we've had this fail in CI for a few different versions.