rafalotufo / summbug

Bug Report Summarizer
1 stars 3 forks source link

Trying to understand different outputs for the same bug #1

Closed marquesarthur closed 3 years ago

marquesarthur commented 3 years ago

Hi,

First, I would like to thank you for having the code publicly available. This is really helpful. I am trying to extend summbug to GitHub. Thus far, I had great success and I got to run the code and produce summaries. I am just trying to understand why the code produces different outputs every time I ran it for the same bug ID.

Consider https://github.com/flutter/flutter/issues/31598 the raw text for this issue is: (1266 characters)

Canvas operations are extremely slow · Issue # 31598 · flutter/flutter · GitHub
Canvas operations are extremely slow
When I try to use a loop that draws multiple hundreds of thousand pixels using Canvas.drawRect, I get render times of multiple seconds.
Why is that and how should I draw pixels instead ?
I elaborated on this problem on StackOverflow: https://stackoverflow.com/q/55844043/6509751
I can answer it myself: Canvas.drawRect is not meant to be called this often.
I solved it using Canvas.drawImage and the BMP file format.
@creativecreatorormaybenot The solution listed in the StackOverflow answer doesn't work for realtime effects.
It needs Futures/callbacks that require at least an additional frame before the Image object can be rendered to the canvas.
Flutter needs a synchronous way to create an Image object, or a way without needing the Image class to render raw pixel data.
@daniilguit ... and no, it's not drawRawPoints, since that just draws the same paint to a number of points defined by a list of Offset.
The idea is to supply the canvas with raw pixel data, like how in the StackOverflow solution to render a BMP, but without relying on the asynchronous Image interface.
@andrewackerman I do not think that the asynchronicity of the solution I proposed is a problem, let me explain.
In my case, I can draw at 60 FPS up to a certain resolution.
I will surely have to do same optimizations going forward and I might post them here if I get reminded.
This means that it might take a frame to render the image, however, it would take one frame anyway, even if it was synchronous.
I simply convert to BMP before I trigger the paint call, which should yield the same end result as triggering the paint call and then creating the image synchronously in the paint call.
If your concern is about performance, then I am with you.
Currently, I am having memory issues in particular, however, I believe ( at least I hope ) that this is the fault of my implementation.
However, synchronicity would not solve the performance problem either as synchronous implementations would overload the device in the same way asynchronous calls would.
Maybe I misunderstood your point.
Otherwise, I could help you find an implementation that allows real-time effects, however, I should probably optimize performance first as my method does not translate well to high resolutions.
Using Canvas.drawRect to draw pixels is extremely slow
Flutter Canvas operations are extremely slow
@creativecreatorormaybenot The nature of asynchronicity in Dart means that the rendering of a frame is deferred to the next frame.
This isn't always a reasonable compromise, for example when you don't want the canvas to paint until a frame is available, or when you are running the generation on every frame possible.
At best, this requires a complicated scheduling system where the element is trying to generate a frame during the painting step of the previous frame, and the future scheduler doesn't always make it clear when that frame will be available.
This system also means that a frame will almost never be available during the first painting pass.
This is a lot of compromises and jumping through hoops that would be entirely avoided by the existence of a synchronous way to pass raw pixel data to the canvas.
To explain my use case, I am trying to make a plugin that renders noise to a canvas using the fast_noise library.
In the simplest case, I generate greyscale pixel data from a 2D/3D Simplex Noise function.
At this point, it would be ideal if I could simply pass this pixel data to the canvas for rendering.
Instead, I'm forced to use decodeImageFromPixels and store the resulting image ahead of time for the next painting pass.
It also forces me to dispose the previous image after it has been rendered.
After all is said and done, the resulting widget sees < 1 FPS.
I attribute some of this to the noise generation function ( it's probably a pipe dream to expect seamless 60 FPS performance from a software-implemented noise generator and particularly one that doesn't utilize SIMD instructions ) but a good chunk of that performance loss comes from the asynchronicity and image object generation/disposal as well.
I haven't tried your BMP solution to see if it's any different, but between its similar asynchronous nature and the fact that decodeImageFromPixels calls _ instantiateImageCodec internally, I would imagine that performance would be largely identical.
@andrewackerman I think that I was simply not aware of decodeImageFromPixels, which is why I went through implementing my own BMP encoder ( which was a pain ), however, I would suspect that the performance is very similar.
I might give it a try in the next few days and report back ( maybe update the SO answer ), however, I would say that the performance should really be exactly the same if I am not doing something horribly wrong in my implementation.
You are totally correct and I would suggest that you open an issue of this, i.e. a feature request for `` a synchronous way to pass raw pixel data to the canvas''.
It would be nice if you could reference this issue or put a link here.
@andrewackerman I am glad to be able to report back.
I tested decodeImageFromPixels and it is definitely more efficient than my implementation, however, that is probably more memory efficient ( I am still not completely sure ).
However, the performance is still not awesome as you rightfully pointed out.
Thus, I would still say that opening an issue is a good idea.
On another note, are you able to run your code on master ?
Basically, I am only able to run it on the beta channel.
@andrewackerman Did you open a ticket about pushing data directly to the canvas ?
So I can upvote it.
I've been planning on doing it, but real life stuff has been hitting me pretty hard the past few weeks.
My thought was that it should have a similar signature to decodImageFromPixels, so something like:
c is the position that the image will be drawn at.
bytes is the pixel data that will be drawn to the canvas, represented as byte groups of 4 bytes per pixel.
The color channels will be extrapolated based on the value of pixelFormat.
width and height are the dimensions of the image represented by the pixel data.
If width * height * 4 is not equal to pixelData.length, an error will be thrown.
rowBytes is the number of bytes consumed by each row of pixels in the data buffer.
If unspecified, it defaults to width multiplied by the number of bytes per pixel in the provided format.
The targetWidth and targetHeight arguments specify the size of the output image, in image pixels.
If they are not equal to the intrinsic dimensions of the image, then the image will be draw.
If exactly one of these two arguments is specified, then the aspect ratio will be maintained while forcing the image to match the other given dimension.
If neither is specified, then the image maintains its real size.
( Debatable as to whether these parameters should be supported as they don't really fall into the whole `` just draw these pixels'' mentality of this method. )
Feature Request for a Synchronous way to Draw Raw Pixel data to Canvas
If any of you think that something about it should be changed, just message me or comment there.
I will close this in favor of # 37180.
Performance Considerations for writing Dart code

A first summary has the following sentences: (240 characters, which is close to the 25% character threshold)

You are totally correct and I would suggest that you open an issue of this, i.e. a feature request for `` a synchronous way to pass raw pixel data to the canvas''.
This system also means that a frame will almost never be available during the first painting pass.
When I try to use a loop that draws multiple hundreds of thousand pixels using Canvas.drawRect, I get render times of multiple seconds.
If width * height * 4 is not equal to pixelData.length, an error will be thrown.
( Debatable as to whether these parameters should be supported as they don't really fall into the whole `` just draw these pixels'' mentality of this method. )
I attribute some of this to the noise generation function ( it's probably a pipe dream to expect seamless 60 FPS performance from a software-implemented noise generator and particularly one that doesn't utilize SIMD instructions ) but a good chunk of that performance loss comes from the asynchronicity and image object generation/disposal as well.
Flutter needs a synchronous way to create an Image object, or a way without needing the Image class to render raw pixel data.
To explain my use case, I am trying to make a plugin that renders noise to a canvas using the fast_noise library.
Currently, I am having memory issues in particular, however, I believe ( at least I hope ) that this is the fault of my implementation.

While a second summary has 270 characters:

If unspecified, it defaults to width multiplied by the number of bytes per pixel in the provided format.
I might give it a try in the next few days and report back ( maybe update the SO answer ), however, I would say that the performance should really be exactly the same if I am not doing something horribly wrong in my implementation.
The idea is to supply the canvas with raw pixel data, like how in the StackOverflow solution to render a BMP, but without relying on the asynchronous Image interface.
If width * height * 4 is not equal to pixelData.length, an error will be thrown.
This is a lot of compromises and jumping through hoops that would be entirely avoided by the existence of a synchronous way to pass raw pixel data to the canvas.
You are totally correct and I would suggest that you open an issue of this, i.e. a feature request for `` a synchronous way to pass raw pixel data to the canvas''.
If exactly one of these two arguments is specified, then the aspect ratio will be maintained while forcing the image to match the other given dimension.
@creativecreatorormaybenot The nature of asynchronicity in Dart means that the rendering of a frame is deferred to the next frame.
I attribute some of this to the noise generation function ( it's probably a pipe dream to expect seamless 60 FPS performance from a software-implemented noise generator and particularly one that doesn't utilize SIMD instructions ) but a good chunk of that performance loss comes from the asynchronicity and image object generation/disposal as well.

So my question is why does the code produce different summaries if it is fed the same sentences?

I understand that it has been 6 years since the code was published and I will continue debugging and trying to understand it on my own. Nonetheless, I wonder if I missed something obvious that could explain the diff between summary 1 and 2

Thanks once again

marquesarthur commented 3 years ago

As a follow-up, my current assumption is the Pagerank algorithm since s defines the prob of hopping to another link. Another issue might arise from while np.sum(np.abs(r-ro)) > maxerr and time.time() - start_time < 60 In a machine with low resources, the algorithm might finish before convergence.

def pageRank(G, s = .85, maxerr = .001):
    """
        Args
    ----------
    G: matrix representing state transitions
       Gij can be a boolean or non negative real number representing the
       transition weight from state i to j.
       In terms of votes or links Gij is the number of votes/links from i to j,
       therefore increasing the pagerank of node j.

    Kwargs
    ----------
    s: probability of following a transition. 1-s probability of teleporting
       to another state. Defaults to 0.85

    maxerr: if the sum of pageranks between iterations is bellow this we will
            have converged. Defaults to 0.001
    """
n = G.shape[0] ## number of nodes, since G is a square matrix

    # transform G into markov matrix M
    M = np.array(G, np.float)
    rsums = np.array(M.sum(1)) ## find sum of rows
    ci, ri = M.nonzero() ## (i,j) that are not zero
    for i in range(M.shape[1]):  ## divede rows by the row sum
        if rsums[i]:
            M[i,:] /= rsums[i] 

    # bool array of sink states
    sink = rsums==0

    start_time = time.time()
    # Compute pagerank r until we converge
    ro, r = np.zeros(n), np.ones(n) ## a vector of zeros and another of ones of size n
    while np.sum(np.abs(r-ro)) > maxerr and time.time() - start_time < 60:
        ro = r.copy()
        # calculate each pagerank at a time
        for i in range(0,n):
            # inlinks of state i
            Ii = np.array(M[:,i]) ## get each row
            # account for sink states
            Si = sink / float(n)
            # account for teleportation to state i
            Ti = np.ones(n) / float(n)

            r[i] = ro.dot( (Ii + Si)*s + Ti*(1-s) )

    if time.time() - start_time >= 120:
        logger.info('PageRank -- finished due to max allowed time')

    # return normalized pagerank
    return r/sum(r)

After increasing the time to 120, summaries are more likely to have similar sentences:

image

Please close the issue if this analysis seems right.

rafalotufo commented 3 years ago

Hi, It's been quite a while since I haven't worked on that code, but yes, the use of time in page rank is the only thing that should produce indeterministic results. I wonder what machine you are running this on or how big the bug report is. It doesn't seem like the size is the problem, so maybe it's the machine?

Cheers and good luck :)

Rafael

On Wed, Oct 14, 2020 at 1:06 PM Arthur Marques notifications@github.com wrote:

As a follow-up, my current assumption is the Pagerank algorithm since s defines the prob of hopping to another link. Another issue might arise from while np.sum(np.abs(r-ro)) > maxerr and time.time() - start_time < 60 In a machine with low resources, the algorithm might finish before convergence.

def pageRank(G, s = .85, maxerr = .001): """ Args ---------- G: matrix representing state transitions Gij can be a boolean or non negative real number representing the transition weight from state i to j. In terms of votes or links Gij is the number of votes/links from i to j, therefore increasing the pagerank of node j. Kwargs ---------- s: probability of following a transition. 1-s probability of teleporting to another state. Defaults to 0.85 maxerr: if the sum of pageranks between iterations is bellow this we will have converged. Defaults to 0.001 """n = G.shape[0] ## number of nodes, since G is a square matrix

# transform G into markov matrix M
M = np.array(G, np.float)
rsums = np.array(M.sum(1)) ## find sum of rows
ci, ri = M.nonzero() ## (i,j) that are not zero
for i in range(M.shape[1]):  ## divede rows by the row sum
    if rsums[i]:
        M[i,:] /= rsums[i]

# bool array of sink states
sink = rsums==0

start_time = time.time()
# Compute pagerank r until we converge
ro, r = np.zeros(n), np.ones(n) ## a vector of zeros and another of ones of size n
while np.sum(np.abs(r-ro)) > maxerr and time.time() - start_time < 60:
    ro = r.copy()
    # calculate each pagerank at a time
    for i in range(0,n):
        # inlinks of state i
        Ii = np.array(M[:,i]) ## get each row
        # account for sink states
        Si = sink / float(n)
        # account for teleportation to state i
        Ti = np.ones(n) / float(n)

        r[i] = ro.dot( (Ii + Si)*s + Ti*(1-s) )

if time.time() - start_time >= 120:
    logger.info('PageRank -- finished due to max allowed time')

# return normalized pagerank
return r/sum(r)

After increasing the time to 120, summaries are more likely to have similar sentences:

[image: image] https://user-images.githubusercontent.com/1497480/96021861-db7d1d80-0e04-11eb-9904-39ed3a6cf7f4.png

Please close the issue if this analysis seems right.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/rafalotufo/summbug/issues/1#issuecomment-708536578, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGJCQUMDGMWKAS2DYJSCQLSKXLC7ANCNFSM4SQ3KB7Q .

marquesarthur commented 3 years ago

Thanks for the fast response. Yeah. It's an old macOs with 8mb + a lengthy bug report + docker eating a lot of my memory. I'll run the code at the uni server and review it.

I've been digging replication packages for a while and I really appreciate all your work/code. It was so easy to Plug'n'Play :-)