mathnet / mathnet-numerics

Math.NET Numerics
http://numerics.mathdotnet.com
MIT License
3.47k stars 893 forks source link

support Span<double> version of many functions #1083

Open hdlee opened 1 month ago

hdlee commented 1 month ago

As Span and IEnumerable are not compatible, it is necessary to support Span version to make user code more effiecient.

strMikhailPotapenko commented 1 month ago

Might be easier to start by discussing a single or small number of cases. Is there a function in particular which you felt a performance hit due to lack of Span<double> support?

hdlee commented 1 month ago

for example, calculating a series of values based on a sliding window of a large double array. if no Span, we have to clip a new array to calculate a single value. so we get many unnecessary new array operations.

strMikhailPotapenko commented 1 month ago

Sorry, could you be little more specific with your example? Maybe if you provided a small code example for the case which you are trying to address it would make what you're talking about clearer.

hdlee commented 1 month ago

ok, here you are:

    var xarray = new double[1000000];
    var yarray = new double[1000000];

    var window = 10000;

    var result = new double[1000000];

    for (int i = 0; i < xarray.Length - window; ++i)
    {
        var xx = xarray.Skip(i).Take(window).ToArray();
        var yy = yarray.Skip(i).Take(window).ToArray();
        // here there're many new array operations, if we can support Span<double> version like below:
        // var xx = xarray.AsSpan(i, window);
        // var yy = yarray.AsSpan(i, window);
        // much tidy code and minimized new operations
        result[i] = Fit.LineThroughOrigin(xx, yy);
    }
strMikhailPotapenko commented 1 month ago

That is a very clear example, thank you for providing that.

I created a branch in my fork of this repo where I did some POC testing. Can you take a look and let me know what you think? For the example you chose, adding support for Span was trivial because the Span object has the same operator overloads and some methods in common with Array so I only needed to change the method signature to accept Span objects. Please check out FitTests.FitsToBestLineThroughOriginBenchmarkSpanVsArray() which does a timing comparison using code adapted from what you posted above. On my machine, the console output is

Array Slicing Timing: 226.3739 ms, Span Slicing Timing: 2.7299 ms

which as you predicted is a significant performance boost. Undoubtedly this is due to the lack of allocations using the Span version of the API.

Assuming this is the kind of thing you were suggesting (let me know if it's not), I think this would be a nice thing to have that I could find useful in the future and maybe would have already been useful to me if it existed in the past.

Unfortunately, based on my experience with pull request #1068, I do not believe the maintainer is currently active so those changes likely wouldn't be merged back into this fork.

Obviously there is no obligation but I am curious: if there were an actively maintained fork of this repo, would you be willing to contribute code alongside others to help add Span support to some triaged list of MathNet functions/classes?

-Mikhail

hdlee commented 1 month ago

Thanks a lot for the effort, strMikhailPotapenko. What u have done is exactly what i want. However, there're hundreds of APIs need to support the Span version(alongside array version and IEnumerable version). My idea is to provide a compatible class of IEnumerable to do the similar slice work. This can be the most effective way(just add a new class instead of providing hundreds Span version methods with cost of less efficient to Span version -- as Span is a ref struct). I'd like to contribute code, but i'm really new to Github. Please advise me how to do this.

hdlee commented 1 month ago

there's something interesting ... https://github.com/dotnet/runtime/issues/23950 this issue was opened in 2017 and no further discussion...

hdlee commented 1 month ago

hmmmm... things are not that simple... i found apis don't support IEnumerable version(i.e. Fit.LineThroughOrigin).

then the most rational way to achieve this is to modify current array version code to Span version and adding a wrapper version of array to simply invoke the span version directly. make both api public. it's a rather straight forward change but get a lot of code changes involved.

strMikhailPotapenko commented 1 month ago

First on the topic of contributing.

I recommend that you start with this section of the website as well as following links provided there for how to use Git in general. Please feel free to let me know if you have any questions or issues and I'll do my best to help you work them out. Disclaimer, I am also a little new to GitHub but have some experience with Git previously.

One potential problem is that @cdrnet hasn't been active on GitHub for some time and he is the only one that is able to merge changes back into this fork. I am currently trying to contact him to find out his intentions going forward but I am not sure if I will be successful in that. Of course, he has no obligation either way as this is a free open source project.

Some of my colleagues and I use this library in a professional setting and would like to be become more involved as well. I have been going through some discussion to see if there is value in my company devoting some resources to maintaining the library, including merging changes such as the one we are discussing into a fork that we own. This is of course assuming that cdrnet either can't be reached or indicates that he no longer wishes to be active in this repo.

In short, if you create a pull request for this issue and want to contribute code to implement, I would be happy to collaborate with you either on a separate fork or this one if it can be maintained actively.

On the technical points that you raise.

I think I understand what you mean. So in the example I gave above, we would have a wrapper function like below (ignore the name).

public static double LineThroughOriginArrayWrapper(double[] x, double[] y)
{
    return SimpleRegression.FitThroughOrigin(x.AsSpan(), y.AsSpan());
}

As you say, that would require a lot of code changes. I think if we go that direction we would need have a prioritized list of functions and go one at a time. I imagine this is more likely to be helpful for some functions than others. Also, I think we would need to review the unit test coverage first and make sure that the array based APIs that we want to change implementation for are already covered.

Can you explain in more detail what you proposed for the IEnumerable based APIs? It sounds like you are suggesting some adaptor class that bridges between Span and IEnumerable? How would this avoid copies? It doesn't look like they actually added IEnumerable support to Memory but rather added Span support to ForEach.

I think we also need to consider that although it is ugly, adding APIs with separate implementations is less risky than changing implementations of already existing APIs. Since the Span based APIs will be useful mainly for optimization it might make some sense to consider that.

-Mikhail

hdlee commented 1 month ago

Yesterday i'm trying to mockup a cs-script to do the modifications automatically(see the attachment).

however the real code situation is a bit more complicated. so the auto job runs good partially.

Can you explain in more detail what you proposed for the IEnumerable based APIs? It sounds like you are suggesting some adaptor class that bridges between Span and IEnumerable? How would this avoid copies? It doesn't look like they actually added IEnumerable support to Memory but rather added Span support to ForEach.

yes, this was my initial thought. but my POC test today tells that foreach iteration is much slower than array indexing. So nothing is a suitable replacement for Span version APIs.

As a conclusion, if we think Span version is the right thing to add. we'd better to start on a small group of APIs. the change is straightforward(make the new span version sit before/after the array version and migrate the array version call to the span version)

MathDotNetAddSpanVersionDemoWork.txt

strMikhailPotapenko commented 1 month ago

I see, thanks for the clarification.

We might be able to leverage an LLM like ChatGPT to do some of this work as well.

Do you think there is any risk to thread safety if we change the implementation of the array based APIs to Span based? Maybe there are some classes that are threadsafe now which will not be if we use a Span implementation?

Normally, we would ask the maintainer if this is something they would merge back into the repo but since they are not around, I think the best thing to do is to just create a new fork for span support and start working on it. I think the best way is to separate the addition of Span API and changing implementations of Array API into two phases so it is easier to revert back. It sounds like either way we are adding to the public functions so I would add Span API first since that is safer than changing implementations.

@jkalias since you are fairly active on this repo, what do you think about adding span support to MathNet or any approaches proposed in this conversation?

hdlee commented 1 month ago

in terms to thread safe, well, i think Span changes nothing on this aspect. Let me give u an example here.

.NET add a bunch of Span version apis as well once Span was added into .NET. below is the implementation of current Int32.Parse():

public static int Parse(string s, NumberStyles style, IFormatProvider? provider) { if (s == null) ThrowHelper.ThrowArgumentNullException(ExceptionArgument.s); return int.Parse(s.AsSpan(), style, provider); }

as u can see, it just delegate to span version. and the span version does all the real logic(i guess it inherits the code of previous array version)

we're trying to do the exact same thing.

strMikhailPotapenko commented 1 month ago

It's definitely reassuring that .NET already followed this pattern but the example you provided is a static function which will clearly not be affected.

I was more concerned about public non-static methods such as Matrix.Add()

It's not something that blocks us entirely anyway I think because a lot of methods in the library are static.

hdlee commented 1 month ago

to my understanding, static/non-static doesn't behave much different on threading-safe topics. object passes to them can be running on different thread(i might not fully understand your concern, if so, please clarify. maybe give me an example). But i'm fine to start with public static apis.

strMikhailPotapenko commented 4 weeks ago

Hello hdlee, on second thought, I think you are correct that the span implementation shouldn't change thread safety. I was thinking of cases where there are some classes with stateful properties and one tries to access from multiple threads. However, I don't think most classes in MathNet are thread safe in this way anyway and as discussed it shouldn't be a problem for static methods.

Would you like to create a fork to work on this? I did contact @cdrnet on Mastodon to see if he intends to return to the repo but haven't got a response yet. I am thinking to create a fork managed by my company and my colleagues and I are discussing this currently. If we do that I think we would like to merge these proposed changes to our fork and assist in the implementation.