uwhpsc-2016 / homework3

Homework #3

Uneven intervals Simpson's Rule #7

Open jknox13 opened 8 years ago

jknox13 commented 8 years ago

Instructor's Note: the formula immediately below is incorrect. The correct version, located in the README, is sufficient for receiving points on this assignment.

I am trying to code Simpson's Rule using:

[screenshot: Simpson's rule formula from the homework description]

from the homework description. This works fine for x with equally spaced intervals. However, it does not work for unequally spaced intervals. Everything I've seen says that Simpson's Rule does not apply to unequal subintervals because Simpson's Rule is fitting a quadratic polynomial to approximate the integral. The Trapezoidal method on the other hand easily handles unequal spacing because it fits a line to approximate the integral.

I looked at the Scipy source code and this is how it handles unequal spacing:

else:
    # Account for possibly different spacings.
    #    Simpson's rule changes a bit.
    h = diff(x, axis=axis)
    sl0 = tupleset(all, axis, slice(start, stop, step))
    sl1 = tupleset(all, axis, slice(start+1, stop+1, step))
    h0 = h[sl0]
    h1 = h[sl1]
    hsum = h0 + h1
    hprod = h0 * h1
    h0divh1 = h0 / h1
    # slice0, slice1, slice2 are the strided y-slices defined earlier
    # in scipy's _basic_simps
    result = add.reduce(hsum/6.0*(y[slice0]*(2-1.0/h0divh1) +
                                  y[slice1]*hsum*hsum/hprod +
                                  y[slice2]*(2-h0divh1)), axis)
return result

Are we actually required to come up with an analogue to this? The math looks something like this:

[screenshot: non-uniform Simpson's rule formula]

If so, I'm just wondering why it's not hinted at somewhere in the project description. And, since this is much more computationally expensive than the equal-subinterval case, should we check for subinterval equality in the simps_< > methods?
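For anyone curious, the scipy branch above reduces in 1-D to something like the following sketch. The name simps_nonuniform is mine, not part of the assignment; it applies the same coefficient formula scipy uses, which fits a quadratic through each 3-point triple and is therefore exact for quadratics even on uneven grids.

```python
import numpy as np

def simps_nonuniform(y, x):
    """Composite Simpson's rule for 1-D data with possibly uneven spacing.

    A sketch of the same coefficient formula scipy uses above; assumes
    len(x) is odd (an even number of subintervals) and does no
    trapezoidal cleanup step at the end.
    """
    h = np.diff(x)
    h0, h1 = h[0:-1:2], h[1::2]          # the two spacings in each triple
    hsum, hprod = h0 + h1, h0 * h1
    h0divh1 = h0 / h1
    return np.sum(hsum / 6.0 * (y[0:-2:2] * (2.0 - 1.0 / h0divh1)
                                + y[1:-1:2] * hsum * hsum / hprod
                                + y[2::2] * (2.0 - h0divh1)))

# Exact for a quadratic even though the spacing changes at x = 0:
x = np.append(np.linspace(-1, 0, 4, endpoint=False), np.linspace(0, 1, 7))
print(abs(simps_nonuniform(x**2, x) - 2.0/3.0))  # roundoff-level
```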

cswiercz commented 8 years ago

Very good observation! I admit that I overlooked the case where the dx values between three successive points (one Simpson's interval) are not equal.

On one hand, it would be more interesting to implement the non-uniform formula you provided. On the other hand, some students have already written code using the "semi-correct" Simpson's rule formula. A third approach would be to only allow certain kinds of intervals. For example,

x = numpy.append(linspace(-1,0,7,endpoint=False), linspace(0,1,127))

Each sub-linspace contains an odd number of points with the same dx so in this situation the original formula works fine.

I am currently leaning toward restricting tests to this third-approach and allowing students to use the original formula. Does this sound reasonable to you?

cswiercz commented 8 years ago

Additionally, I am sure that as dx --> 0 the original formula would still lead to a decent approximation. However, one would have to do some simple numerical analysis to obtain the correct bounds on the error.

gmikel commented 8 years ago

Awwwwww man, I've already implemented the more "complicated" version...

...which brings up an interesting point. The biggest concept that I struggle with on these assignments (and I'm not sure if anyone else feels the same way) is how "fool proof" we should make our code. One of the reasons why I implemented the more complex version is because of this -- there's an infinite number of possibilities that you (@cswiercz) and the TAs could assess us on. Even though the assignments do a good job of explaining what we're being assessed on, I would rather approach the problem from a risk management perspective and implement code that, albeit slower, can handle a broader spectrum of problems. (Probably silly given that this is a high-performance computing class -- just trying to maximize my points at the end of the assignment.)

Sorry if this isn't the best place to put this; I just found this issue interesting because I struggled a little with this yesterday.

cswiercz commented 8 years ago

Awwwwww man, I've already implemented the more "complicated" version...

I commend you for this and am looking forward to taking a look at your code! You should do some timing comparisons and share exactly how much slower (or perhaps faster?) your general-purpose code is compared to the "naive" version of Simpson's.

...which brings up an interesting point. The biggest concept that I struggle with on these assignments (and I'm not sure if anyone else feels the same way) is how "fool proof" we should make our code. One of the reasons why I implemented the more complex version is because of this -- there's an infinite number of possibilities that you (@cswiercz) and the TAs could assess us on. Even though the assignments do a good job of explaining what we're being assessed on, I would rather approach the problem from a risk management perspective and implement code that, albeit slower, can handle a broader spectrum of problems. (Probably silly given that this is a high-performance computing class -- just trying to maximize my points at the end of the assignment.)

Unfortunately, due to the size of the class it's difficult for us to spend time evaluating how much effort a student has put into writing their code to make it more "fool proof" and award additional points as a result. This is part of the reason why we use automated tests. (Though, some might say that there is no such thing as "partially correct" code: it works as intended or it doesn't.) For the purposes of this class, and this assignment in particular, I will say, and patch the README with this information, that the worst case we'll use is

x = numpy.append(linspace(-1,0,7,endpoint=False), linspace(0,1,127))

or some variant thereof. Full disclosure, the motivation behind this choice is to have students see what sort of problems crop up when you have to access data from an array within a parallel loop. Note that we did not have to do this in the Lecture 12 example.

When it comes to life principles in programming, though: while developing code it can be a good idea to have a function catch many possible cases. The downside, as you have already pointed out, is that general code can oftentimes be slower than specific code. We've already sort of seen an example of this: one can write a general-purpose dense linear system solver using LU-decomposition. This would still work with triangular systems, but why not take advantage of the added structure and write a separate triangular system solver? The problem then is that you cannot use the triangular system solver on general systems.

But why would you? This is where you want to communicate such things to the users of your software, and that is where the documentation comes in. (See, documentation isn't just busy work!)

Additionally, you can have the function throw errors if the input is not of an expected form. However, you would need to do some work up front every time the user calls the function. With high-performance codes some authors establish the preconditions of a function call via "social contract" in the documentation rather than perform explicit checks in order to squeeze out as much performance as possible. There are pros and cons to this philosophy. An additional challenge is that, even as a developer, you cannot be 100% sure what the user is going to throw at your functions.

I'm waxing philosophical but...you brought it up.

philw16 commented 8 years ago

I guess in this case, and for all the tests in general for this homework, what are we considering to be a decent approximation? I am currently only able to get within 3 or 4 decimal places with an uneven interval (maybe because the interval size is too small?)

gmikel commented 8 years ago

@cswiercz

Thank you for the response; that makes sense. I find the "social contract" bit very interesting, especially in the context of many of the finite element packages that I use for work. Indirectly related is the concept of "Garbage in. Garbage out." I'm sure you've heard of it but the basic notion is this: The user better have a good idea of what they're putting into the analysis package (boundary conditions; loads; properties; and what have you) or else what they get out is garbage.

The developers can't possibly account for the infinite number of possibilities that an analyst could prescribe to a system, either right or wrong, so, through documentation, the developer puts faith in the analyst that the analyst knows what he or she is doing.

cswiercz commented 8 years ago

@philw16 I guess in this case, and for all the tests in general for this homework, what are we considering to be a decent approximation? I am currently only able to get within 3 or 4 decimal places with an uneven interval (maybe because the interval size is too small?)

When using a uniformly spaced grid with an odd number of elements (the "nice" case) I'm able to get an error of less than 1e-14 when compared with scipy.integrate.simps. When the "not nice" case is handled properly, by performing a trapezoidal rule step at the end of the domain, my error was also less than 1e-14.

This holds for both small and large N.

Additionally, for a domain of the form

x = numpy.append(linspace(-1,0,4,endpoint=False), linspace(0,1,7))

I get similar results.

cswiercz commented 8 years ago

@gmikel I find the "social contract" bit very interesting

This dimension of software development becomes even more relevant when you introduce object-oriented programming into the mix. Take a look at Robert Martin's Design Principles and Design Strategies to see how certain structural decisions have later ramifications. (see the "circle-ellipse" problem)

the developer puts faith in the analyst that the analyst knows what he or she is doing

Big mistake. :)

In fairness, though, one of the developer's objectives should be a simple and easy-to-use interface, especially when state is involved. There may be a whole network of functions being called in the background, but if you can make the front-facing portion of your library consist of a few key, easy-to-use functions or objects, so much the better. Sure, you can expose some of the "lower-level" routines for those willing to read the documentation, but it should come with a fair warning. (Another type of social contract.)

The PETSc library does a good job of organizing code in this way. The functions are organized by "beginner / basic usage", "intermediate" and "advanced". See the PETSc Vectors doc page, for example. Basically, the developers are saying "here be dragons, use at your own risk if you don't read the documentation".

nicksmithc102 commented 8 years ago

So, to clarify, should we write our code to accommodate unequally-spaced points, or can we assume only evenly-spaced points will be used?

philw16 commented 8 years ago

Ok, so I answered my own question from earlier. However, now I can only get convergence on the split intervals when the first interval is of even length, which is really throwing me off. I thought odd was supposed to be the nice case, in which each interval would be handled without having two different dx values?

philw16 commented 8 years ago

To clarify more:

When I say the initial interval is of even length, something like this will work:

x = numpy.append(linspace(-1, 0, 6, endpoint = False), 
    linspace(0, 1, 7))
y = sin(exp(x))

However something like this will not:

x = numpy.append(linspace(-1, 0, 7, endpoint = False), 
    linspace(0, 1, 7))
y = sin(exp(x))

Whether an even or odd number is selected for the first interval size is what determines convergence (where convergence is virtually no error, and non-convergence is about 0.0001 to 0.003 of error).

cswiercz commented 8 years ago

@nicksmithc102 Assume x-data of the form:

x = numpy.append(linspace(-1,0,4,endpoint=False), linspace(0,1,7))

That is, data where the Simpson's rule described in the README.md is completely correct. (Each 3-point Simpson's interval fits well in the x-domain and has a uniform dx.)

The odd/even length cases refer to the fact that there may not be three points left at the very end of the entire data array, in which case you're only able to take a "trapezoidal step".
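To make the trapezoidal-step idea concrete, here is a rough sketch for the uniform-dx case. The function name and interface are mine, not the assignment's required simps_< > signature: standard Simpson's rule over as many full 3-point intervals as fit, then one trapezoidal step if a lone point remains.

```python
import numpy as np

def simps_uniform(y, x):
    """Composite Simpson's rule for uniformly spaced 1-D data.

    When the number of points is even, one point is left over at the end
    of the array, so the last subinterval gets a trapezoidal step.
    A sketch, not the assignment's reference implementation.
    """
    n = len(x)
    dx = x[1] - x[0]
    end = n if n % 2 == 1 else n - 1   # last index reachable by Simpson triples
    s = dx / 3.0 * np.sum(y[0:end-2:2] + 4.0*y[1:end-1:2] + y[2:end:2])
    if n % 2 == 0:                     # even number of points: trapezoid at the end
        s += 0.5 * dx * (y[-2] + y[-1])
    return s

x = np.linspace(0.0, np.pi, 101)       # odd number of points: pure Simpson
print(abs(simps_uniform(np.sin(x), x) - 2.0))  # small error, roughly 1e-8 here
```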

cswiercz commented 8 years ago

@philw16 The first interval you created has an odd number of points:

>>> x = numpy.append(linspace(-1, 0, 6, endpoint = False), linspace(0, 1, 7))
>>> len(x)
13

The second interval has an even number. Although, I am just now realizing that there are still some issues with regard to using the same dx even within each of these two "partitions"... You should observe, however, that as N gets large this uneven-dx issue disappears, since the error is just a constant factor away from the expected error.

Testing With Uneven Intervals

I will only test on x-data where the Simpson's intervals line up such that each 3-point Simpson approximation uses the same dx. As for odd v. even, I will only check at the very end of the x-domain if a trapezoidal step needs to be taken.
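If it helps anyone sanity-check candidate x-data against this restriction, a short helper along these lines (my own sketch, not part of the assignment) can flag grids whose 3-point Simpson triples straddle a change in dx:

```python
import numpy as np

def simpson_triples_uniform(x, tol=1e-12):
    """Return True when every 3-point Simpson triple
    (x[2i], x[2i+1], x[2i+2]) has a uniform dx. A leftover final point,
    if any, is fine: that's the trapezoidal step."""
    h = np.diff(x)
    n_pairs = (len(x) - 1) // 2
    return bool(np.all(np.abs(h[0:2*n_pairs:2] - h[1:2*n_pairs:2]) < tol))

# The grid from earlier in the thread: the triples line up with the dx change.
x_good = np.append(np.linspace(-1, 0, 4, endpoint=False), np.linspace(0, 1, 7))
# Here one triple straddles 0: its spacings are 1/7 and 1/6.
x_bad = np.append(np.linspace(-1, 0, 7, endpoint=False), np.linspace(0, 1, 7))
print(simpson_triples_uniform(x_good), simpson_triples_uniform(x_bad))  # True False
```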

philw16 commented 8 years ago

@cswiercz

The length of the overall vector has not been a problem for me. What is confusing me is that when the first section of the interval has an even number of points (the not-nice case), my solution works regardless of whether the overall interval is even or odd. If the first section has an odd number of points (the nice case), it does not work, regardless of the overall length. I would expect it to be the other way around, since that would give the situation where each interval of three has the same dx. My method works in non-split situations regardless of the vector length.

Kristjansson commented 8 years ago

@philw16

I had to draw it out to prove it to myself, but in a case such as

x = numpy.append(linspace(-1, 0, 4, endpoint = False), linspace(0, 1, 7))

which gives

array([-1. , -0.75 , -0.5 , -0.25 , 0. , 0.16666667, 0.33333333, 0.5 , 0.66666667, 0.83333333, 1. ])

we can see that there is an odd number of points with a dx_1 of 1/4. This happens because we have removed the endpoint. In other words, the command linspace(-1, 0, 4, endpoint = False) is equivalent to creating a list of 5 evenly spaced elements from -1 to 0 inclusive, then dropping the last one. Because the point dropped is the same point we start the next interval with, everything works out.
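A quick numerical check confirms this:

```python
import numpy as np

# endpoint=False drops 0 from the first linspace, but the second linspace
# supplies that same point, so the first partition keeps a uniform dx of
# 0.25 across five points (indices 0 through 4), an odd count.
x = np.append(np.linspace(-1, 0, 4, endpoint=False), np.linspace(0, 1, 7))
h = np.diff(x)
print(len(x))   # 11 points total
print(h[:4])    # [0.25 0.25 0.25 0.25]
```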

philw16 commented 8 years ago

I see it now, I should have printed the vector to look at it. That was a huge help @Kristjansson thanks

nbfueruw commented 8 years ago

Okay, I think after reading this thread a few times I'm pretty clear, but one question. I understand why this space works (and for me it does): x = numpy.append(linspace(-1,0,4,endpoint=False), linspace(0,1,7))

@cswiercz says in his original response:

On one hand, it would be more interesting to implement the non-uniform formula you provided. On the other hand, some students have already written code using the "semi-correct" Simpson's rule formula. A third approach would be to only allow certain kinds of intervals. For example, x = numpy.append(linspace(-1,0,7,endpoint=False), linspace(0,1,127)) Each sub-linspace contains an odd number of points with the same dx so in this situation the original formula works fine.

And for me this x list does not work, and I feel it makes sense that it doesn't, since we encounter different dx values. Is it true that our code need not give the correct result for this space? I believe this is the same situation @philw16 has been describing. So my question can be interpreted fairly specifically as: should the space x = numpy.append(linspace(-1,0,7,endpoint=False), linspace(0,1,127)) work for our code?

Thanks!

philw16 commented 8 years ago

From my interpretation of this, it would not. From the construction of the vector (with that shared endpoint as @Kristjansson described), this would create a situation in which you have a Simpson's interval with two different dx values, which we have eliminated. I would, however, want to check this with @cswiercz

quantheory commented 8 years ago

@nbfueruw After confusing myself slightly trying to remember what endpoint=False does, I'm sure that the answer is no. @Kristjansson's case is the only one that needs to be handled.