utahplt / TrackedFloats.jl

Julia library for tracking floating-point errors through a program
https://juliahub.com/ui/Packages/General/TrackedFloats
MIT License

injection unsuccessful & slow in advection2d #10

Closed bennn closed 1 year ago

bennn commented 1 year ago

What program did you run?

#=
2D advection using higher order FV or structured or unstructured mesh
=#

### If the Finch package has already been added, use this line #########
using Finch # Note: to add the package, first do: ]add "https://github.com/paralab/Finch.git"

using Dates
using FloatTracker: TrackedFloat64, write_log_to_file, set_inject_nan, set_logger, set_exclude_stacktrace

set_exclude_stacktrace([:prop])
set_logger(filename="inj-adv2d", buffersize=20, cstg=true, cstgArgs=false, cstgLineNum=true)
fns = [] ##[FunctionRef(:run_simulation, Symbol("nbody_simulation_result.jl"))]
libs = [] ##["NBodySimulator", "OrdinaryDiffEq"]
now_str = Dates.format(now(), "yyyymmddHHMMss")
recording_file = "rand-adv2d-recording_$now_str"
println("Recording to $recording_file...")

### If not, use these four lines (working from the examples directory) ###
# if !@isdefined(Finch)
#     include("../Finch.jl");
#     using .Finch
# end
##########################################################################

initFinch("inj-adv2d", TrackedFloat64);

# Configuration setup
domain(2)
solverType(FV)

timeStepper(EULER_EXPLICIT,cfl=20000)
# NaN-making timestepper

########
set_inject_nan(true, 1000, 1, fns, libs, record=recording_file)
########

println("begin MESH")
# a uniform grid of quads on a 0.1 x 0.3 rectangle domain
mesh(QUADMESH, elsperdim=[15, 45], bids=4, interval=[0, 0.1, 0, 0.3])
println("end MESH")

# Variables and BCs
u = variable("u", location=CELL)
boundary(u, 1, FLUX, "(abs(y-0.06) < 0.033 && sin(3*pi*t)>0) ? 1 : 0") # x=0
boundary(u, 2, NO_BC) # x=0.1
boundary(u, 3, NO_BC) # y=0
boundary(u, 4, NO_BC) # y=0.3

# Time interval and initial condition
T = 1.3;
# T=10
timeInterval(T)
initial(u, "0")

# Coefficients
coefficient("a", ["0.1*cos(pi*x/2/0.1)","0.3*sin(pi*x/2/0.1)"], type=VECTOR) # advection velocity
coefficient("s", ["0.1 * sin(pi*x)^4 * sin(pi*y)^4"]) # source

# The "upwind" function applies upwinding to the term (a.n)*u with flow velocity a.
# The optional third parameter is for tuning. Default upwind = 0, central = 1. Choose something between these.
conservationForm(u, "s + surface(upwind(a,u))");

println("begin SOLVE")
solve(u)
println("end SOLVE")

finalizeFinch()

write_log_to_file()

With the following changes to FloatTracker.jl

diff --git a/src/Injector.jl b/src/Injector.jl
index 9a80490..b8c0172 100644
--- a/src/Injector.jl
+++ b/src/Injector.jl
@@ -101,6 +101,7 @@ function should_inject(i::Injector)::Bool
   end

   if i.active && i.ninject > 0 && rand(1:i.odds) == 1
+    println("inject: active, odds match, ctr $(i.place_counter)")
     st = stacktrace()
     if i.record !== ""
       # We're recording this
@@ -109,6 +110,8 @@ function should_inject(i::Injector)::Bool
         fh = open(i.record, "a")
         println(fh, "$(i.place_counter), $(frame_file(drop_ft_frames(st)[1]))")
         close(fh)
+      else
+        println("IFAIL place counter $(i.place_counter)")
       end
       return did_injectp
     else
@@ -188,9 +191,18 @@ function injectable_region(i::Injector, raw_frames::StackTraces.StackTrace)::Boo
   end

   # Default: don't inject
+  println("dont inject $(frame_library(frames[1]))")
+  pp_frames(frames)
+  println("")
   return false
 end

+function pp_frames(frames::StackTraces.StackTrace)
+  for f::StackTraces.StackFrame in frames
+    println("$(f.func) at $(f.file):$(f.line)")
+  end
+end
+
 getmodule(_) = nothing
 getmodule(x::Base.StackTraces.StackFrame) = getmodule(x.linfo)
 getmodule(x::Core.MethodInstance) = getmodule(x.def)

What did you expect to happen?

  1. Injector quickly finds a place to add a NaN
  2. Injector stops checking stack frames from that point onward
  3. Something fun happens because of the NaN. (Output logs appear, program crashes, etc.)

What happened?

Injector never succeeded.

It failed because frame_library(frames[1]) always returned nothing, so injectable_region fell through to its default of not injecting.

Here's the first stack trace that got pretty printed:

% julia inj-adv2d-fv.jl
Recording to rand-adv2d-recording_202304201600785...
begin MESH
inject: active, odds match, ctr 806
dont inject nothing
run_or_inject at /home/ben/code/uu/fpx/FloatTracker.jl/src/TrackedFloat.jl:38
* at /home/ben/code/uu/fpx/FloatTracker.jl/src/TrackedFloat.jl:113
_generic_matmatmul! at /home/ben/code/julia/julia-1.8.0/usr/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:876
generic_matmatmul! at /home/ben/code/julia/julia-1.8.0/usr/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:844
mul! at /home/ben/code/julia/julia-1.8.0/usr/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:303
mul! at /home/ben/code/julia/julia-1.8.0/usr/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:276
* at /home/ben/code/julia/julia-1.8.0/usr/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:141
build_refel at /home/ben/code/julia/Finch/src/refel.jl:340
#grid_from_mesh#21 at /home/ben/code/julia/Finch/src/grid.jl:159
grid_from_mesh##kw at /home/ben/code/julia/Finch/src/grid.jl:80
macro expansion at /home/ben/code/julia/Finch/src/Finch.jl:252
macro expansion at /home/ben/.julia/packages/TimerOutputs/4yHI4/src/TimerOutput.jl:237
#add_mesh#134 at /home/ben/code/julia/Finch/src/Finch.jl:145
add_mesh##kw at /home/ben/code/julia/Finch/src/Finch.jl:142
macro expansion at /home/ben/code/julia/Finch/src/finch_interface.jl:398
macro expansion at /home/ben/.julia/packages/TimerOutputs/4yHI4/src/TimerOutput.jl:237
#mesh#81 at /home/ben/code/julia/Finch/src/finch_interface.jl:343
mesh##kw at /home/ben/code/julia/Finch/src/finch_interface.jl:341
top-level scope at /home/ben/code/uu/fpx/examples/examples/finch/inj-adv2d-fv.jl:41
eval at ./boot.jl:368
include_string at ./loading.jl:1428
_include at ./loading.jl:1488
include at ./Base.jl:419
exec_options at ./client.jl:303
_start at ./client.jl:522

Other comments

We have 2 problems:

  1. Injector never succeeds
  2. Each failed attempt is slow, because the injector has to check the stack trace every time. Can we use a cache or a backoff strategy to reduce the slowdown? (My quick fix was to change the odds from 1/1 to 1/1000.)
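One way the backoff idea could look, as a rough sketch only: after each failed check, skip a growing number of subsequent attempts before paying for another stack walk. `Backoff`, `try_with_backoff!`, and `always_fails` are hypothetical names made up for this illustration; none of them exist in FloatTracker, and `always_fails` merely stands in for the expensive stack inspection.

```julia
# Hypothetical sketch of exponential backoff around an expensive check.
# None of these names are part of FloatTracker.

mutable struct Backoff
    skip::Int        # attempts still to skip before the next real check
    window::Int      # how many attempts to skip after the next failure
    max_window::Int  # upper bound on the window
end
Backoff(; max_window=1024) = Backoff(0, 1, max_window)

# Run `check()` only when we are not currently backing off.
function try_with_backoff!(check::Function, b::Backoff)::Bool
    if b.skip > 0
        b.skip -= 1
        return false
    end
    if check()
        b.window = 1          # success: reset the window
        return true
    end
    b.skip = b.window         # failure: sit out `window` attempts
    b.window = min(2 * b.window, b.max_window)
    return false
end

# Stand-in for the expensive stack walk; it always fails and counts its calls.
const N_CHECKS = Ref(0)
function always_fails()::Bool
    N_CHECKS[] += 1
    return false
end

b = Backoff()
for _ in 1:1000
    try_with_backoff!(always_fails, b)
end
println("expensive checks performed in 1000 attempts: ", N_CHECKS[])
```

The trade-off is that backoff can also skip attempts that would have succeeded, so it changes which operations are eligible for injection; lowering the odds, as in the quick fix above, has a similar effect.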

I didn't install FloatTracker using . as the active project. I can go back and do that if it's the easiest fix.

ashton314 commented 1 year ago

Yeah, I just tested it and our admittedly hacky frame_library() function is having a hard time. I tried extracting the lib name in the REPL for the filenames on those traces and this is what I got:

the_rx = r".julia[\\/](packages|dev|scratchspaces)[\\/]([a-zA-Z][a-zA-Z0-9_.-]*)[\\/]"
match(the_rx, "/home/ben/code/julia/julia-1.8.0/usr/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl") # => returns nothing
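To make the failure concrete, here is a small sketch that applies the same regular expression to three paths taken from the stack trace above. `lib_from_path` is an illustrative helper, not FloatTracker's `frame_library`, and `stdlib_rx` is only an assumed example of how standard-library paths might be recognized; nothing in the thread says the package does this.

```julia
# Regex copied from the comment above.
the_rx = r".julia[\\/](packages|dev|scratchspaces)[\\/]([a-zA-Z][a-zA-Z0-9_.-]*)[\\/]"

# Assumption: an extra pattern for stdlib paths such as .../stdlib/v1.8/LinearAlgebra/...
stdlib_rx = r"[\\/]stdlib[\\/]v[0-9.]+[\\/]([a-zA-Z][a-zA-Z0-9_.-]*)[\\/]"

# Illustrative helper: return the library name, or `nothing` if no pattern matches.
function lib_from_path(path::AbstractString; try_stdlib::Bool=false)
    m = match(the_rx, path)
    m !== nothing && return String(m.captures[2])
    if try_stdlib
        m = match(stdlib_rx, path)
        m !== nothing && return String(m.captures[1])
    end
    return nothing
end

paths = [
    "/home/ben/.julia/packages/TimerOutputs/4yHI4/src/TimerOutput.jl",
    "/home/ben/code/julia/julia-1.8.0/usr/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl",
    "/home/ben/code/julia/Finch/src/refel.jl",
]

for p in paths
    println(repr(lib_from_path(p)), "  <-  ", p)
end
```

Only the path under `.julia/packages` yields a library name with the original pattern; the stdlib path and the local Finch checkout both give `nothing`, which matches the "dont inject nothing" output in the report.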

As soon as an injection fires (with or without recording), the should_inject function will bail out fast on every later call, because i.ninject > 0 will evaluate to false.
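The short-circuit behavior can be sketched with a toy model. `ToyInjector` and `toy_should_inject` are simplified stand-ins whose field names follow the diff above; this is not the real FloatTracker code, and the counter `STACK_WALKS` only marks where the expensive `stacktrace()` call and region check would sit.

```julia
# Toy model of the guard in should_inject; not the real FloatTracker struct.
mutable struct ToyInjector
    active::Bool
    odds::Int
    ninject::Int   # injections still allowed
end

const STACK_WALKS = Ref(0)

function toy_should_inject(i::ToyInjector)::Bool
    # && short-circuits left to right, so once ninject reaches 0
    # neither rand nor the stack walk below is evaluated.
    if i.active && i.ninject > 0 && rand(1:i.odds) == 1
        STACK_WALKS[] += 1   # stands in for stacktrace() plus the region check
        i.ninject -= 1       # assume the injection succeeded
        return true
    end
    return false
end

inj = ToyInjector(true, 1, 1)   # odds of 1/1, a single injection allowed
results = [toy_should_inject(inj) for _ in 1:5]
println(results, "  stack walks: ", STACK_WALKS[])
```

In the run reported above the region check never succeeded, so the counter presumably never reached zero and every call kept paying for the stack walk.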

ashton314 commented 1 year ago

If you can try a more "standard" install and it works, then I think we'd best just let the issue rest for a while. Otherwise, a little investment in making this function Do a Better Thing might be worth it.

bennn commented 1 year ago

It's better after following the instructions in the readme here.

After installing that way, julia --project=. examples/finch/inj-adv2d.jl ran quickly and successfully injected one NaN.

Now, to keep injecting and hunt for bugs!

ashton314 commented 1 year ago

> It's better after following the instructions in the readme here.

I'm delighted to hear that the instructions were sufficient to get this up and running! That's good for reproducibility!