matthewb96 / FibreLengthAnalysis

Masters project to analyse the fibre lengths for glass fibres in a composite. This is done using a computer controlled microscope to take the images and image analysis tools to calculate the lengths.
0 stars 0 forks source link

Fibre Endpoints and Length #2

Open matthewb96 opened 6 years ago

matthewb96 commented 6 years ago

Now the corners have been found #1, any corners close enough to be considered part of a fibre end are averaged in order to find a midpoint for each fibre end. This is done by first writing a function that takes two sets of coordinates and finds the length between them using Pythagoras' theorem.

def coordDist(pos1, pos2):
    """
    Takes two arrays of length 2 and will find the length between them.
    """
    xDiff, yDiff = abs(pos1 - pos2)
    hippotenuse = np.sqrt(xDiff**2 + yDiff**2)
    return hippotenuse

Another function then compares then lengths between every corner to see which ones are close enough, whichever are close enough are averaged. An array of all these midpoints is created.

def averageEdges(cornerPos, FIBREWIDTH = 25):
    """
    Finds the corners that are close enough together to be considered part of the short edge 
    and averages them to return one position for each edge.
    """
    averageCoords = np.array([[0,0]]) #Initiate array with temp values
    arrayLength, width = cornerPos.shape #Find length of axis 0
    for i in range(arrayLength):
        for j in range(i, arrayLength): #Generates list of numbers starts at i so to not repeat numbers already compared
            distance = coordDist(cornerPos[i], cornerPos[j])
            if  distance <= FIBREWIDTH and distance != 0:
                average = np.array([(cornerPos[i] + cornerPos[j])/2])
                averageCoords = np.vstack((averageCoords, average))

    averageCoords = np.delete(averageCoords, 0, 0) #Delete temp values
    return averageCoords

This array can then be used to calculate the lengths of each fibre, to test this an image with one fibre like object was used. So some temporary code was used to calculate the length.

#Find fibre length
pos1, pos2 = averageEdges(cornersSubPix)
length = coordDist(pos1, pos2)
print("Fibre Length: " + str(round(length, 0)))

This found an value of 500pixels (when rounded to nearest pixel), it found the correct length for the orignal(vertical) image and offset images of -45deg, 45deg, 90deg, 30deg and 60deg. For each image there were no false corners seen, only four corners were seen for each and they all looked to be in the correct positions. Commit https://github.com/matthewb96/FibreLengthAnalysis/commit/510d3df3af89c230fd5b472ef576cc910786d49e

The next step is to generalise the code to calculate the lengths, so that the length of any number of fibres can be calculated. Once that is done some code to test that when a length is measured between two endpoints it corresponds to a fibre and not to two unconnected endpoints, a possible solution for this is to check there is are solid black pixels between the two positions. Hough line transform might be a way to do this.

One point to note about the above function averageEdges() is that it returns an array of midpoints of corners and it assumes a corner will only be close enough to one other corner. In reality some corners could be close enough to more that one other corner so some additions will need to be made to ensure the correct midpoint is being measured.

matthewb96 commented 6 years ago

Rewritten how the length of the fibre is calculated to be more general. The function shown below has been added.

def findLengths(coords, minLength = 25):
    """
    Accepts a 2D numpy array of coordinates and finds the length between all combinations of coordinates that have a line between them.

    Returns array containing the coordinates of the line and the line length 
    [[x0, y0, x1, y1, length01]
     [x0, y0, x2, y2, length02] 
     ...]
    """
    lineLengths = np.array([[0, 0, 0, 0, 0]]) #For each line in the array [x, y, x1, y1, length]
    arrayLength, width = coords.shape #Find length of axis 0
    for i in range(arrayLength):
        for j in range(i, arrayLength): #Generates list of numbers starts at i so to not repeat numbers already compared
            if not checkLine(coords[i], coords[j]):
                continue #Skip loop if corners not joined by solid black pixels ie not part of the same fibre
            distance = coordDist(coords[i], coords[j])
            if distance == 0:
                continue #Skip this loop if the distance is zero ie same corners are being measured
            arrayRow = np.array([coords[i][0], coords[i][1], coords[j][0], coords[j][1], distance])
            lineLengths = np.vstack((lineLengths, arrayRow))

    lineLengths = np.delete(lineLengths, 0, 0)
    return lineLengths

The function above is given an array and a minimum fibre length (currently not used) and will return an array containing the lengths between all the coordinates. The return array will have the following format where 0 and 1 are the coordinates of the first position, 2 and 3 are the coordinates of the second position and 4 is the length between them. image The function skips any distance values of exactly 0 as these will be from the same position with itself, it also avoids checking the same two positions twice when looping through both arrays. This function calls checkLine() which will check if the two positions are connected by a solid line and therefore are on the same fibre, however it has not yet been written so currently it just returns true. Commit, https://github.com/matthewb96/FibreLengthAnalysis/commit/65663fa0698a7779ac500b40a1a9027b4ee981f0 , tested this function on the single fibre image and it worked correctly, the above image is from this test.

matthewb96 commented 6 years ago

Commit, https://github.com/matthewb96/FibreLengthAnalysis/commit/7c7e4ba5ff3373b762fd3952bc31ef8525c2d7c9 , removes the additional centroid that is found for the average position of all the zeros in the array. This is discussed here https://github.com/matthewb96/FibreLengthAnalysis/issues/1#issuecomment-345560731 . Therefore the only centroids now refer to corners, so no confusion can be had with this extra position.

matthewb96 commented 6 years ago

Now that the code had been generalised to measure the lengths in a whole array of positions tests with multiple fibres could be ran. Commit https://github.com/matthewb96/FibreLengthAnalysis/commit/9e30da78dd07e402c4346687a56b2ffa3530d4cd . In these tests the original fibre image was copied multiple times and rotated to form images containing 4, 8, 16 and 64 fibres. The program was then run on each of these images and for each one only half the corners were found. The array lengths were printed to help with this and this is an example of the array lengths for the single fibre. image

Here is the 16 fibre image, all the images are just 4 copies of the previous image stitched together. 16 test fibres 25 500 fibre For this image 32 corners, 16 endpoints and 120 lengths were found. Lengths can be ignored because they are not verified. But as shown by corners and endpoints only half the fibres are found. Below is the image with the corner positions plotted and when zoomed in you can see that the corners have been found fine for all the vertical and horizontal lines but none of the lines at 45 or 135 degrees. 16 test fibres 25 500 fibre 2 jpg subpix However when you look at the images generated by cornerHarris() it has found all the corners of each fibre but the centroids for all are not found. Therefore there is probably an error in this line.

retval, labels, stats, centroids = cv2.connectedComponentsWithStats(cornersThres)

Image produced by cornerHarris() showing all the corners. 16 test fibres 25 500 fibre 2 jpg harris

The next step is to figure out the error in connectedComponentsWithStats(), which could possibly be caused by the excess white pixels produced by cornerHarris() on the angled fibres. Once this can be solved, so that all corners can be found, checkLines() can be written to check if two positions are part of a single fibre so that only actual fibre lengths are calculated.

matthewb96 commented 6 years ago

To figure out why the corners were not showing up for the diagonal fibres tests of only 45 degree fibres were ran (commit https://github.com/matthewb96/FibreLengthAnalysis/commit/67ada77bad52a306b6a9ad64adedd65ce4f0a125).

All the below tests were run with 25*500 fibres at 45degrees, the number of fibres and the image size was varied.

These tests suggest that connectedComponentsWithStats() struggles with large images containing multiple diagonal lines.

The issue causing this needs to be found so the next step is debugging the large image (8000*8000) containing 2 diagonal fibres.

matthewb96 commented 6 years ago

The large image (8000 8000) with 2 fibres at 45 degrees (each being 25500 pixels) was debugged (commit https://github.com/matthewb96/FibreLengthAnalysis/commit/62df8eabe2fb3f32b5f39eac2b7e751de7dd3f04). Using the Spyder variable explorer the corners and cornersThres arrays could be viewed. corners is created by cornerHarris() and cornersThres is created with threshold(corners, 0.9*corners.max(), 255, 0). When looking at the two arrays the corner positions were found in corners but they were not found in cornersThres both shown below. corners array cornersArray cornersThres array cornersThresArray Original Image at the same position originalImage

This shows that the issue is with threshold() not connectedComponentsWithStats() as previously assumed. The threshold value was reduced from 0.9*corners.max() to 0.8*corners.max() where the corner positions were found, shown below. cornersThres array cornersThresArray2

The issue was fixed for the 2 fibres in the large image so next it was tested with the 64 fibre image (80008000) that did not work previously (https://github.com/matthewb96/FibreLengthAnalysis/issues/2#issuecomment-346974542). This had the same issue as before only finding the horizontal and vertical fibre's corners, in order to make debugging easier the cornersThres array was saved to an image. The threshold value was reduced to `0.75corners.max()which was still not low enough, but0.7*corners.max()was low enough to find all corners and endpoints, and find the correct lengths, albeit still finding lengths between endpoints that were not connected ascheckLines()` is not yet implemented. Thresholded image 64 test fibres (25,500 fibre) [10].jpg HarrisThres Corner Sub Pixel image 64 test fibres (25,500 fibre) [10].jpg subpix

This gets the correct corners for all the fibres in the image and should not matter how many or what orientation they are in, although if the image size increases more then the threshold value may need to be reduced more. Will have to check to see if the threshold value being used now will be okay for larger numbers of fibres and larger image sizes. The next step is to write checkLines() so that only the endpoints corresponding to a fibre have the lengths measured, and also to test the program on real fibre images.

matthewb96 commented 6 years ago

The corners are correctly being found and used to find midpoints of each edge. These length between each of these edges is calculated but now checkLines() is written in order to only measure the length between two points that are connected by a fibre. In commit https://github.com/matthewb96/FibreLengthAnalysis/commit/e778c8446e5499ac4c977d4a6762b5cd2aef890f the function is written to calculate the gradient between two points and loop through each pixel connecting them checking if the pixel is black (ie part of a fibre), with checkBlack() which has not been written yet. The first part of the function calculates the equation of the line and checks it is correct.

     gradient = (pos1[1]-pos2[1])/(pos1[0]-pos2[0])
     yIntercept = pos1[1] - (gradient*pos1[0])
     if yIntercept != pos2[1] - (gradient*pos2[0]):
         print("Cannot find the equation for the line between " + str(pos1) + " and " + str(pos2))
         return False

Then the second part is two loops that will loop through the x positions and y positions to check every pixel between the two points.

    for i in range(min(pos1[0], pos2[0]), max(pos1[0], pos2[0]) + 1):
         yVal = (gradient * i) + yIntercept
         pos = np.array([i, yVal])
         if not checkBlack(pos):
             return False
     #Next loop through the y positions calculating corresponding x
     for i in range(min(pos1[1], pos2[1]), max(pos1[1], pos2[1]) + 1):
         xVal = (i - yIntercept)/gradient
         pos = np.array([xVal, i])
         if not checkBlack(pos):
             return False
     #If both loops make it all the way through then the postions must be connected by a fibre so return True
      return True

In commit a2fc5a3ce376abbcd47fc3e6581c770f24fc2333 checkBlack() is written to check if a given set of coordinates corresponds to a pixel that is nearly black, and thus part of a fibre.

    pixel = imageGray[int(pos[1])][int(pos[0])]
    if pixel <= (0.1*np.amax(imageGray)):
        return True 
matthewb96 commented 6 years ago

When testing the checkLine() function, with the 4 fibres image at different angles, it only finds the lengths for two of the fibres (the horizontal and vertical ones). The function needs to be rewritten with some additional checks to fix some of the issues. Commit a2fc5a3ce376abbcd47fc3e6581c770f24fc2333 The first is to round the positions to the nearest whole number to make comparing them better. Also some boolean variables are added to allow for the for loops to be ignored if they are not needed, when the lines are straight so the x or y value is constant.

#Create two booleans to allow for a for either for loop to be ignored
horizontal = False
vertical = False
#Round the coords to integers
pos1 = np.rint(pos1)
pos2 = np.rint(pos2)

Next an if statement is added to check if some of the code can be ignored, by checking if the lines are perfectly straight or if both positions are the same.

    if pos1[0] == pos2[0] and pos1[1] == pos2[1] : #Same corner
          return False
    elif pos1[0] == pos2[0]: #Completely vertical line 
         xVal = pos1[0]
         vertical = True
     elif pos1[1] == pos2[1]: #Completely horizontal line
         gradient = 0
         yIntercept = pos1[1]
         horizontal = True
     else: #The line has a gradient
         gradient = (pos1[1]-pos2[1])/(pos1[0]-pos2[0])
         yIntercept = pos1[1] - (gradient*pos1[0])
         if yIntercept != pos2[1] - (gradient*pos2[0]):
             print("Cannot find the equation for the line between " + str(pos1) + " and " + str(pos2))
             return False

If the line is vertical then all x values will be the same so there is no need to loop through x positions and the same can be said for y if the line is horizontal, this should make the program slightly more efficient. However this did not fix the issue of fibres at an angle not working so testing it with a simple image of two fibres at an angle of 45degrees was done, commit ab5816f1188a0b9d62345890f141a5e522f2674e .

The first issue was the blurring at the edge of the diagonal lines this lead to a pixel value of 118 on the first position so the checkBlack() function would not consider it part of a line and would return False so the line would be ignored. This was solved by increasing the values at which it was considered a fibre, from 10% to 50% of the maximum value in the image 0.5*np.amax(imageGray). This found the length of one of the fibres.

The program was printing that it couldn't find the equation for one of the set of coodinates and when checking this it was due to a rounding error causing the if statement that checks if the equation of the line is correct to return False. This was solved by rounding the values to the nearest integer before comparing them.

if np.rint(yIntercept) != np.rint(pos2[1] - (gradient*pos2[0])):
       print("Cannot find the equation for the line between " + str(pos1) + " and " + str(pos2))
       return False

This solved the issue so the lengths for both fibres was found.

Testing was then done with the 4 fibres images and the 16 fibres image, commit ae1a8c2da36a27254222399fc6cf88e230aabd46 . For both images it found the correct number of lengths corresponding to all the fibres in the image, although it did take a significant amount of time for the 16 fibres ~6minutes. Which was 24 times longer than the 4 fibres meaning that 64 fibres will probably take a minimum of 2hrs, so the code needs to be made more efficient.

Commit ae0dabff9773f425306ac9c5007357aeaa7a74dd added some simple terminal print outs that showed the progress of the program as well as some timings and in commit 3d2e5a95f7e6230524627a1d78c755952262e193 the code was made a lot more efficient before testing on the 64 fibres image. This was done by adding a break statement at the end of the for loop so that once an edge had connected to another edge and a length was found between them the loop would move onto the next edge. Instead of how it previously would continue checking that edge with every other edge. This reduces the lengths that needed to be checked to ~4500 out of 8100 possible lengths for the 64 fibres image. However this still took ~1hr 45mins to complete so more efficient methods will have to be looked at.

One simple change could be to check the pixel every 10 pixels instead of every single pixels, although this could allow for false positives when finding fibres. This will have to be tested. The next step is to do some tests with real data to find any errors caused by noise or by contaminants on the sample that are not fibres.