willow-ahrens / Finch.jl

Sparse tensors in Julia and more! A data-structure-driven array programming language.
http://willowahrens.io/Finch.jl/
MIT License
158 stars · 15 forks

API: Add `fill_value` and `fill_value_type` config argument to `Base.similar` #438

Closed · mtsokol closed 6 months ago

mtsokol commented 6 months ago

Hi @willow-ahrens,

This PR adds a fill_value and fill_value_type configuration argument to Base.similar, to match https://docs.julialang.org/en/v1/base/arrays/#Base.similar.

Right now, changing the data type can be done with:

tns = ... # let's assume it's Int64
new_tns = copyto!(similar(tns, Finch.FillValueConfig(0, Float64)), tns)

Let me know if a tuple would be better than a dedicated struct, but I think the struct is more descriptive.
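
For reference, a minimal sketch of what such a dedicated struct could look like, assuming it simply pairs the new fill value with the requested element type (the field names are illustrative, not the actual definition in this PR):

# Illustrative sketch only; the real FillValueConfig in this PR may differ.
struct FillValueConfig{T}
    fill_value::T   # the new fill (background) value, e.g. 0
    eltype::Type    # the requested element type, e.g. Float64
end

cfg = FillValueConfig(0, Float64)  # matches the usage in the snippet above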

codecov[bot] commented 6 months ago

Codecov Report

Attention: Patch coverage is 88.46154% with 3 lines in your changes missing coverage. Please review.

Project coverage is 76.76%. Comparing base (598f225) to head (9cd7f7d).

Files                                         Patch %   Lines
src/tensors/levels/patternlevels.jl           0.00%     1 Missing
src/tensors/levels/repeatrlelevels.jl         0.00%     1 Missing
src/tensors/levels/sparsetrianglelevels.jl    0.00%     1 Missing

Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #438      +/-   ##
==========================================
+ Coverage   76.63%   76.76%   +0.13%
==========================================
  Files          89       89
  Lines        8268     8273       +5
==========================================
+ Hits         6336     6351      +15
+ Misses       1932     1922      -10

willow-ahrens commented 6 months ago

I think the tuple is not a good fit because similar(zeros(5, 5), (5, 5)) is a valid way to specify sizes. Let's add a fill_value kwarg; I think that is the best approach here.
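
For illustration, roughly what the kwarg approach could look like on a toy wrapper type (a sketch under the assumption that the element type and dims stay positional; this is not Finch's actual similar implementation):

# Hypothetical toy type; Finch's real similar methods may look different.
struct ToyTensor{T,N}
    data::Array{T,N}
    fill_value::T
end

# dims stay positional, so they can never be confused with the fill value.
function Base.similar(t::ToyTensor, ::Type{S}, dims::Dims; fill_value=zero(S)) where {S}
    ToyTensor(Array{S}(undef, dims), convert(S, fill_value))
end

# e.g. similar(t, Float64, (5, 5); fill_value = 1.0)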

willow-ahrens commented 6 months ago

Also, not in this PR of course, but what do you think of changing the term default in Finch to fill_value?

mtsokol commented 6 months ago

> Also, not in this PR of course, but what do you think of changing the term default in Finch to fill_value?

I like this idea! fill_value is more precise in my opinion.

mtsokol commented 6 months ago

> I think the tuple is not a good fit because similar(zeros(5, 5), (5, 5)) is a valid way to specify sizes. Let's add a fill_value kwarg; I think that is the best approach here.

@willow-ahrens Do you mean keyword only argument fill_value that accepts a tuple or a dedicated struct as in this PR?

similar(A, (3, 3), fill_value=FillValueConfig(0, Int32))
similar(A, (3, 3), fill_value=(0, Int32))
willow-ahrens commented 6 months ago

similar(A, Int32, (3, 3), fill_value=0)

willow-ahrens commented 6 months ago

Ah, but it's a little weird to write similar(A, 5, 5, fill_value=0) and expect the method to pick up the element type from fill_value, i.e. to effectively call similar(A, Int, 5, 5, fill_value=0).

willow-ahrens commented 6 months ago

Should we just support similar(A, dtype, ...) and similar(A, (fill_value, dtype), ...) and nothing else?

willow-ahrens commented 6 months ago

Open to opinions on that. I think the Config struct is too high-powered, so I'm happy with either the fill_value kwarg or the two styles with the tuple I mentioned above.
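
For concreteness, the two styles from the comment above would look something like this (illustrative calls only; the exact surface is still being decided in this thread, and the tensor constructor follows the usual Finch pattern):

using Finch

# A small sparse matrix in a Finch format (constructor style as in the Finch README).
A = Tensor(Dense(SparseList(Element(0.0))), [0.0 1.0 0.0; 2.0 0.0 0.0; 0.0 0.0 3.0])

# Style 1: change only the element type; the fill value is carried over from A.
B = similar(A, Int32, (3, 3))

# Style 2: a (fill_value, dtype) tuple changes both at once.
C = similar(A, (1, Int32), (3, 3))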

willow-ahrens commented 6 months ago

You're going to have a little trouble merging because I changed the constructor tests to be more generic. However, this should help ensure that the constructor tests cover all of the cases of similar. Could you take the tests for similar that you have and put them into the new constructor test format? They could just go in the inner loop as another case.
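
Something along these lines, purely as an illustration of adding similar as another case in a shared loop (the names and structure here are made up, not the actual layout of the constructor test file):

using Test, Finch

@testset "constructors and similar" begin
    for (fmt, data) in [(SparseList(Element(0.0)), zeros(4)),
                        (Dense(Element(0)), zeros(Int, 4))]
        A = Tensor(fmt, data)
        B = similar(A)
        # similar should preserve the element type and shape by default
        @test eltype(B) == eltype(A)
        @test size(B) == size(A)
    end
end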

mtsokol commented 6 months ago

> One question: should similar(Tensor(SparseList(Element(0.0), PlusOneArray(ptr), PlusOneArray(idx)))) preserve the PlusOneArrays? Currently it won't.

Hmm, in my opinion it shouldn't.

> Could you take the tests for similar that you have and put them into the new constructor test format?

Sure! Let me rebase and update the test suite.

mtsokol commented 6 months ago

@willow-ahrens I updated the test suite with similar test cases. A few comments:

willow-ahrens commented 6 months ago

Okay, I made a few small requests, but this should be good to merge after those. This is waiting on fixes to SparseBandLevel.

mtsokol commented 6 months ago

> Okay, I made a few small requests, but this should be good to merge after those. This is waiting on fixes to SparseBandLevel.

Hmm, after rebasing I noticed that two conversion tests for SparseList{Separate} fail. Both of them fail with:

ERROR: propagate_copies reached unrecognized expression level_size(tmp_lvl_2.lvl)...

I guess it's coming from:

$sym = similar_level(
    $(lvl.ex).lvl,
    level_default(typeof($(lvl.ex).lvl)),
    level_eltype(typeof($(lvl.ex).lvl)),
    level_size($(lvl.ex).lvl)...
)

that we implemented for the Separate level. Do you know how it could be fixed? (Should we change the similar_level() call here, modify propagate_copies, or change all similar_level functions from varargs to a tuple for the dims argument?)

willow-ahrens commented 6 months ago

It looks like copy propagation and dead code elimination don't support splatting. That's kinda unfortunate; we could either write the call without a splat or fix those two algorithms to support splatting. I'll respond more in a bit.

willow-ahrens commented 6 months ago

I think we should write the call without the splat; it's better to keep the generated code to a more barebones syntax. If you can get the version that calls e.g. virtual_level_size to work, that version produces code with no splats.

mtsokol commented 6 months ago

@willow-ahrens It worked with $(map(ctx, map(getstop, virtual_level_size(lvl, ctx)))...). I also updated the test/reference*/ files. The PR is ready from my side!
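
For the record, combining that interpolation with the earlier snippet, the quasiquoted call now has roughly the shape below; the sizes are spliced in as individual arguments at code-generation time, so the emitted code contains no splat for propagate_copies to trip over (paraphrased from the two snippets in this thread, not copied from the final diff):

$sym = similar_level(
    $(lvl.ex).lvl,
    level_default(typeof($(lvl.ex).lvl)),
    level_eltype(typeof($(lvl.ex).lvl)),
    $(map(ctx, map(getstop, virtual_level_size(lvl, ctx)))...)
)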

willow-ahrens commented 6 months ago

Could you also bump the version?

mtsokol commented 6 months ago

> Could you also bump the version?

Done!