in-phase / ph-core

A reimagining of scientific computing with the speed, typing, and flexibility of Crystal.
https://phase.blog/
MIT License
1 stars 0 forks source link

Restrinct most uses of Indexable to Array #13

Open shinzlet opened 2 years ago

shinzlet commented 2 years ago

We had initially used Indexable in a lot of places (e.g, MultiIndexable#unsafe_fetch_element takes Indexable as a coord) in order to allow future flexibility. However I think that this puts unneccessary work on the compiler, leads to complex generics issues, and isn't useful or consistent at the moment. The main user-facing getters and setters should use Indexable for maximum flexibility, but internal code should switch to Array for consistency.

shinzlet commented 2 years ago

This isn't neccessarily what we want to do, upon further thought. The problems that most commonly occur are things like "can't do foo[2] where foo is Indexable, because Tuple(Int32) is an Indexable and foo[2] is out of bounds for it" - i.e. some builtin types have issues caused by index bounds checking, and this pollutes 'Indexable'. (Note that these are only compiler issues, we have the runtime bounds checking and structure needed to ensure that foo[2] is always valid, but the compiler can't tell that).

One possible solution to this is to harden all of our container use, and make it so that only Array is allowed to be a coordinate. That way, nothing can get mad at compile time when indexing. This causes user facing annoyances though; what if they have a tuple they want to use as a coordinate? We would need to copy user provided coordinates into an Array, which isn't ideal.

Another solution I'm exploring is to make it so that the user is allowed to pass in anything, so long as Phase internally wraps it in a ReadonlyWrapper before use. ReadonlyWrapper only accesses via unsafe_fetch, which is not subject to the index checking problems that I describe, and it ensures that Phase can't modify the user provided coordinate objects - which, i think, is good. Gonna do some benchmarking and think about it more.

shinzlet commented 2 years ago

I ran a little benchmark to see how that idea holds up:

require "./src/readonly_wrapper.cr"
require "benchmark"

module Phase
    arr = [1, 2, 3, 4, 5, 6, 7]
    Benchmark.ips do |x|
        x.report("raw") do
            arr.sum
        end

        x.report("ro") do
            ReadonlyWrapper.new(arr).sum
        end
    end
end

The point of this benchmark is to measure the latency added just by constructing the readonly wrapper. Unfortunately, it seems large (1.8x slower on debug, 4.42x slower on release). Because accessing an array is such a critical path in Phase, I think this rules out the "wrap all user inputs in a readonly wrapper" idea.

However, I think it's still possible to prevent the errors I'm describing by only using unsafe_fetch on the coordinates. I'm not sure exactly how appropriate this is - if a user passes in some sort of strange container that changes size between accesses or is mutable from another thread, then the unsafe accesses would cause soundness issues. However, I find that to be pretty farfetched, and definitely a user error more than a design issue.

shinzlet commented 2 years ago

I wanted to see how performant readonly wrappers were in general (ignoring construction time). It seems that, at least in release mode, the compiler is able to totally crush the performance difference:

require "./src/readonly_wrapper.cr"
require "benchmark"

module Phase
    arr = [1, 2, 3, 4, 5, 6, 7]
        wrapper = ReadonlyWrapper.new(arr)
    Benchmark.ips do |x|
        x.report("raw") do
            arr.sum
        end

        x.report("ro") do
            wrapper.sum
        end
    end
end
raw 224.51M (  4.45ns) (± 8.59%)  0.0B/op   1.02× slower
 ro 229.19M (  4.36ns) (±13.57%)  0.0B/op        fastest

The readonly wrapper sum seems to be consistently sightly faster? I have no idea why but I imagine it's not a consistent improvement.

Benchmarks are not usually used outside of release mode, but Emily and I tend to check the debug build speeds as well (most scientific code is run as a one shot, so it does make a difference for us):

Warning: benchmarking without the `--release` flag won't yield useful results
raw  14.54M ( 68.79ns) (±12.56%)  0.0B/op        fastest
 ro  10.99M ( 91.03ns) (±10.15%)  0.0B/op   1.32× slower

This is totally acceptable as well.

Conclusion: ReadonlyWrappers are basically transparent when it comes to performance, but constructing them is a big time sink. This is something that can likely be worked around, though, and I expect they're worth the hassle to avoid cloning coordinates and trying to keep users from mutating arrays.