Organising Code and Increasing Efficiency

matthewb96 / FibreLengthAnalysis

Masters project to analyse the fibre lengths for glass fibres in a composite. This is done using a computer controlled microscope to take the images and image analysis tools to calculate the lengths.


Organising Code and Increasing Efficiency #3

Open matthewb96 opened 6 years ago

matthewb96 commented 6 years ago

As of commit 3d2e5a95f7e6230524627a1d78c755952262e193, all the code is written in one file. It would be good to split it up into functions in separate modules for the image input, the corner finding and the length finding parts. In addition, some new algorithms should be used to increase the efficiency of the checkLine() function.

Ideas to be implemented:

Split the code into separate modules and make sure all functions contain doc strings.
Add a boolean debug variable so some debugging code can be contained in if statements so it is not run when not debugging.
Add in line drawing for correctly found fibres, colour them red using SciKit Image draw.
Increase the efficiency of checkBlack() by checking the midpoint between the 2 positions instead of every single pixel; the example image shows the idea. The number shows the order in which the positions are checked, with 0 being first.
Increase the efficiency of checkLine() by creating a boolean array to show when edges are already part of a fibre, so that an edge isn't checked twice. This assumes every edge is only connected to one other edge.
Add a function saving the fibre coordinates, lengths and angles to a text file.
Create a module for analyzing the data and producing graphs and histograms of the data.
Write a function that will create a random array of fibres, with known lengths and positions that can be used for testing. Also could be used to create a graph showing the scaling of the algorithms.
Redirecting standard output of python in order to store a log file of the terminal during running

matthewb96 commented 6 years ago

The code has been split up into 5 separate modules: a main module, an inputs module, a corners module, a lengths module and a graphing module (commit 37cb951bae401b99aba58ba676bc3c08ffa6df50).

The main module contains any constants about the image being analysed such as minimum fibre length, as well as running all other functions and controlling the program.
The inputs module contains a function for opening an image and converting it to grayscale, openImage(), as well as a function to generate a random image (array), which has not yet been written.
The corners module contains all the functions for Harris corner detection and then finding the midpoints of the corners as edges of the fibres.
The lengths module contains the functions that find and check the lengths of the fibres.
The graphing module currently contains nothing but will be used for all the functions to analyse the fibre lengths data and produce any graphs.

Also this commit (37cb951bae401b99aba58ba676bc3c08ffa6df50) added in a debugging variable and wrapped various bits of debugging code in if statements so it can be turned on and off as needed.

Commits 12841d972a0a28c8b9e2b234ee98e5bd97c7c5f4 and df1934f11f966f1da3a09a241a85c9befe8712c0 have organised all the code so that it works as before in the new modules. In addition, the doc strings were updated to make them more informative and some small quality of life edits were made to the way the output files are saved: a date and time are now included in the filenames to show when they were produced.
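As a rough illustration of the timestamped filenames, here is a minimal sketch. The helper name timestamped_name() is made up for this example and is not part of the project; the format string is only inferred from filenames such as "[2018-02-10_15-12-53]" that appear later in this issue.

```python
from datetime import datetime

def timestamped_name(base, when=None):
    """Append "[YYYY-MM-DD_HH-MM-SS]" to a base filename (hypothetical helper)."""
    if when is None:
        when = datetime.now()
    return base + when.strftime("[%Y-%m-%d_%H-%M-%S]")

# A fixed time is used here so the result is reproducible
example = timestamped_name("4 test fibres", datetime(2018, 2, 10, 15, 12, 53))
print(example)
```

Using a sortable year-month-day order means the output files also list in the order they were produced.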

matthewb96 commented 6 years ago

The standard output of python was redirected to a log file in commit 61e830a088c39dd62fb2b275cb772a9e07333828, but this meant it did not show up in the terminal. So in commit 30e1ceb9d323402d3b87fc4adb3ee891b2667982 a new piece of code was written to allow standard output to be shown in both the terminal and the log file. Code found here. An example of the 16 Fibre Log File.

#Redirecting standard output of the python terminal to a log file
#This class duplicates stdout into the log file so it is still seen in the terminal
#This piece of code was found online at stackoverflow.com by Jacob Gabrielson
import sys

class Logger(object):
    def __init__(self):
        self.terminal = sys.stdout
        #saveLocation is defined earlier in main.py
        self.log = open(saveLocation + "(LOG).txt", "w")

    def write(self, message):
        #Send every message to both the terminal and the log file
        self.terminal.write(message)
        self.log.write(message)

original = sys.stdout #Keep a reference so stdout can be restored later
sys.stdout = Logger()
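
For anyone wanting to try the idea without touching a real log file, here is a self-contained sketch of the same "tee" approach. The Tee class and the use of io.StringIO in place of the "(LOG).txt" file are assumptions made for this example, not the code from the commit.

```python
import io
import sys

class Tee:
    """Duplicate everything written to it across several streams."""
    def __init__(self, *streams):
        self.streams = streams

    def write(self, message):
        for stream in self.streams:
            stream.write(message)
        return len(message)

    def flush(self):
        # print() and the interpreter may call flush(), so pass it on
        for stream in self.streams:
            stream.flush()

log = io.StringIO()  # stands in for the log file in this sketch
original = sys.stdout
sys.stdout = Tee(original, log)
try:
    print("Found 4 fibres")
finally:
    sys.stdout = original  # always restore the real stdout

captured = log.getvalue()
```

Restoring sys.stdout in a finally block means the terminal still works if the analysis raises an exception part way through.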

Also in this commit (30e1ceb9d323402d3b87fc4adb3ee891b2667982) the inefficient code was tested for 4, 16 and 64 fibres so that the log files could be compared with the more efficient code.

In commit 21b449b7d6803c1d7ae715bb1d610819b733e5f9 the checkLine() function was rewritten to find the midpoints and check them instead of checking every single pixel. The original checkLine() function has been renamed checkLineOld() and kept for now, although it is unused. Here is the new function.

def checkLine(pos1, pos2, fibreWidth, imageArray):
    """
    This function is a rewrite of checkLineOld() and uses a more efficient algorithm. This algorithm works by finding the midpoint of the two fibre ends and checking it is black,
    then finding the next midpoint and checking that. It does this until the midpoint is less than the width of a fibre away from the fibre end.

    arg[0] pos1 - numpy array containing the first set of coordinates.
    arg[1] pos2 - numpy array containing the second set of coordinates.
    arg[2] fibreWidth - int value for the width of a fibre
    arg[3] imageArray - numpy array containing the image data that should be checked.

    Returns boolean value True if the line is part of a fibre and False if not.
    """
    mid1 = midpoint(pos1, pos2)
    mid2 = midpoint(pos1, pos2)
    while int(coordDist(mid1, pos1)) > fibreWidth and int(coordDist(mid2, pos2)) > fibreWidth:
        #Check if the midpoints are part of a fibre
        if checkBlack(mid1, imageArray) or checkBlack(mid2, imageArray):
            #Find the new midpoints
            mid1 = midpoint(pos1, mid1)
            mid2 = midpoint(pos2, mid2)
        else:
            return False

    #If the loop has succeeded then pos1 and pos2 are part of the same fibre
    return True

Another function, midpoint(), was added. It is very simple and just finds the midpoint between two sets of coordinates.
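
The helpers themselves are not shown in this issue, so the following is only a sketch of what midpoint() and coordDist() might look like, using plain tuples instead of numpy arrays. The sample_points() function is hypothetical: it repeats the loop from checkLine() without the checkBlack() test, just to list which points would be visited and in what order.

```python
import math

def midpoint(pos1, pos2):
    """Point halfway between two (x, y) coordinates."""
    return ((pos1[0] + pos2[0]) / 2, (pos1[1] + pos2[1]) / 2)

def coord_dist(pos1, pos2):
    """Straight-line distance between two (x, y) coordinates."""
    return math.hypot(pos2[0] - pos1[0], pos2[1] - pos1[1])

def sample_points(pos1, pos2, fibre_width):
    """Pairs of points the checkLine() loop would visit, in order."""
    visited = []
    mid1 = mid2 = midpoint(pos1, pos2)
    while (int(coord_dist(mid1, pos1)) > fibre_width
           and int(coord_dist(mid2, pos2)) > fibre_width):
        visited.append((mid1, mid2))
        # Step each midpoint halfway towards its own fibre end
        mid1 = midpoint(pos1, mid1)
        mid2 = midpoint(pos2, mid2)
    return visited

points = sample_points((0, 0), (64, 0), 5)
for mid1, mid2 in points:
    print(mid1, mid2)
```

For a 64 pixel line and a fibre width of 5 this visits only three pairs of points, which is where the speed-up comes from. The points move towards the two ends each time, so the region between the quarter points is only sampled at the centre.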

The program was then tested with the 4, 16 and 64 fibre images to see the improvement. This table shows the results.

Measurement	4 Fibres Image	16 Fibres Image	64 Fibres Image
Old Function Time Taken (s)	13	250	6700
New Function Time Taken (s)	0.6	12	650
Old Function Lengths checked	18	278	4296
New Function Lengths checked	18	278	4296

This new algorithm reduces the time taken to check each line dramatically, because it doesn't need to check as many pixels. The next step is to make the program more efficient by reducing the number of lengths that need to be checked.

Also in commit 21b449b7d6803c1d7ae715bb1d610819b733e5f9 a while loop was added to main.py to check if the filename given exists, before attempting to analyse it, and the numpy array containing the fibre lengths and coordinates is saved to a text file. Example of the 16 fibres array.

matthewb96 commented 6 years ago

In commit 835838588e8b7d1be422328ee26ac92ba01147c1 the number of lines that would need to be checked was reduced by adding a simple check that wouldn't allow a coordinate to be checked again once it was already found to be part of a fibre.

This was done by creating a boolean array with the same length as the coordinates array and setting every value to False. Once a fibre was found, both coordinates corresponding to that fibre had their values in the array set to True. At the start of both for loops there is an if statement that checks whether the corresponding value in the boolean array is True, and if so that iteration is skipped.
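
The skipping logic described above can be sketched as follows. This is not the code from findLengths(): the function name pair_ends(), the is_fibre callback standing in for checkLine(), and the inner loop starting at i + 1 are all assumptions made to keep the example small.

```python
def pair_ends(coords, is_fibre):
    """Pair up fibre ends, skipping any end already assigned to a fibre.

    is_fibre(a, b) stands in for the checkLine() test.
    Returns the list of index pairs and the number of checks made.
    """
    used = [False] * len(coords)  # one flag per coordinate
    pairs = []
    checks = 0
    for i in range(len(coords)):
        if used[i]:
            continue  # already part of a fibre
        for j in range(i + 1, len(coords)):
            if used[j]:
                continue  # already part of a fibre
            checks += 1
            if is_fibre(coords[i], coords[j]):
                used[i] = used[j] = True
                pairs.append((i, j))
                break
    return pairs, checks

# Two horizontal fibres; two ends belong together if they share a y value
ends = [(0, 0), (0, 5), (10, 0), (10, 5)]
pairs, checks = pair_ends(ends, lambda a, b: a[1] == b[1])
print(pairs, checks)
```

Without the flags, every one of the 6 possible pairs in this example would be checked; with them only 3 checks are made. As noted in the original idea, this relies on every edge being connected to only one other edge.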

By doing this the number of lines being checked was reduced significantly, and therefore the time taken for the program to run was also reduced.

Measurement	4 Fibres Image	16 Fibres Image	64 Fibres Image
Previous Time Taken (s)	0.6	12	650
New Time Taken (s)	0.5	4	62
Previous Lengths checked	18	278	4296
New Lengths checked	7	46	328

The combined effect of both these methods of increasing the efficiency of the algorithms is that the program runs much faster.

Completed ideas, from above issue:

Splitting the code into modules
Adding boolean debug variable
Increased efficiency of checkLine() by checking the midpoints
Increased efficiency of findLengths() by using a boolean array to make sure a coordinate is not checked again once it is already part of a fibre
Saved the fibre coordinates and lengths to text file
Redirected standard output of python to a log file

Tasks that still need completing:

Add in line drawing for correctly found fibres, colour them in using SciKit Image draw
Find the fibre angles and save it to the file
Write a function that will generate random image arrays with known lengths and positions for testing
Add data analysis and create graphs in the graphing module

matthewb96 commented 6 years ago

Found the angle for each fibre and added it to the fibreLengths array so that it is also saved to the text file, commit 931e521791b60b9e78f4031488e33ea1aa6a7645. 4 test fibres data including angle (final column). The data shows the correct angle for each of the fibres, except that it shows the same angle (45deg) for the bottom two fibres when they are actually 45deg and -45deg.
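
The angle calculation itself is not shown in this issue, so the following is only a guess at the kind of change that would separate 45deg from -45deg. It assumes the angle is taken from the two end coordinates; the function names are made up for this sketch. Keeping the sign of the differences, rather than using their absolute values, is what distinguishes the two cases.

```python
import math

def unsigned_angle(x0, y0, x1, y1):
    """Angle from absolute differences: cannot tell 45 from -45."""
    return math.degrees(math.atan2(abs(y1 - y0), abs(x1 - x0)))

def signed_angle(x0, y0, x1, y1):
    """Signed angle in degrees, folded into the range (-90, 90]."""
    angle = math.degrees(math.atan2(y1 - y0, x1 - x0))
    # A fibre has no direction, so fold opposite directions together
    if angle > 90:
        angle -= 180
    elif angle <= -90:
        angle += 180
    return angle

rising = (0, 0, 10, 10)
falling = (0, 10, 10, 0)
print(round(unsigned_angle(*rising)), round(unsigned_angle(*falling)))
print(round(signed_angle(*rising)), round(signed_angle(*falling)))
```

Whether "rising" appears as a positive or negative angle on screen depends on the image coordinate convention, since image rows usually count downwards.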

In commit f960af3b31cb727d731a14e676c80480afbb8a8c the generateImage() function was written. It generates an image array of a set size containing a set number of fibres, with a minimum length and a constant width. All the fibres have random positions and random orientations but will not leave the boundaries of the image array. They do not consider any other fibre positions when generating, so they can cross. This image shows an example of the randomly generated image. These are the corner positions on that image.

This shows that when fibres cross, more corners are found at the crossing position, so something will have to be added to findCorners() in order to account for this and ignore these corners. In order to check which fibres have been found, some code to add colour to the found fibres was added, but it is currently not working properly.

image = cv2.cvtColor(imageGray, cv2.COLOR_GRAY2BGR)
for i in range(fibreLengths.shape[0]):
    print("Drawing " + str(i) + " out of " + str(fibreLengths.shape[0]))
    lineCoords = int(np.rint(fibreLengths[i])) 
    print("Line coords: " + str(lineCoords))
    rr, cc = draw.line(lineCoords[0], lineCoords[2], lineCoords[1], lineCoords[3])
    print("rr " + str(rr) + " cc" + str(cc))
    image[rr, cc] = np.array([255, 0, 0])

cv2.imwrite(saveLocation + "Drawn Lines.jpg", image)

This log file shows that the code works up to the end of print("Drawing " + str(i) + " out of " + str(fibreLengths.shape[0])), but after this the program stops working without producing an error.

To do:

Finish off generateImage() function by adding in the output of the fibre positions, lengths and angles so they can be checked against the values found, this would allow this function to be used to test the program.
Fix the algorithm to add colour to the fibres that have been found in order to see more clearly where the function is struggling
Possibly fix the algorithm finding the fibre angles so that 45deg and -45deg can be distinguished
Add data analysis and graphing functions into the graphing module

matthewb96 commented 6 years ago

Commit 7214858e3a39910942036c0e85dbf072eba2972d organised main.py so that the bulk of the code is contained in while loops, allowing multiple images to be run one after another. The input was also edited so that "Random 4" can be typed to get the program to generate and analyse 4 random images. Here is the code.

#Get input file
while True:
    imageSource = input("Please input filename to be analysed, input \"Random #\" to generate and analyse # random images (case sensitive): ")
    if imageSource.find("Random") != -1:
        RANDOM = True
        try:
            rand, numRand = imageSource.lower().split(" ")
            numAnalyse = int(numRand) #Number of images to be analysed
        except ValueError: #Raised if the input does not split into 2 parts or the number is not an int
            print("The format you have given is incorrect please try again. \n If you would like random images type \"Random #\" (case sensitive)")
            continue
        print("Generating and analysing " + str(numRand) + " random images.")
        imageSource = "Generated Random Image."
        break
    elif not os.path.isfile(IMAGEFOLDER + imageSource):
        print("Could not find \"" + imageSource + "\" in " + IMAGEFOLDER + "\nPlease try again.")
        continue
    else:
        RANDOM = False
        numAnalyse = 1 #Number of images to be analysed
        print("Analysing " + IMAGEFOLDER + imageSource)
        break

The above code gets the input from the user, checks that it is valid and works out what is wanted. The loop allows it to keep asking the user for input if anything is incorrect. It also allows randomly generated images to be created and analysed. The next bit of code loops through the image generation and analysis so that multiple images can be analysed. Currently this only works with multiple random images; only one input image can be analysed at a time.

#Loop to allow multiple images to be analysed without extra input
numDone = 1
originalSaveLoc = saveLocation #Keep the unedited saveLocation 
while numDone <= numAnalyse:
    #Create the grayscale numpy array of the image or generate an image array
    print("\n\n***********************************************************************************\nStarting image " + 
          str(numDone) + " out of " + str(numAnalyse))
    if RANDOM:
        imageGray = inputs.generateImage(FIBRE_WIDTH, MIN_LENGTH, 10, 1000)
        saveLocation = originalSaveLoc + " (Random Image " + str(numDone) + ") "
        print("Generated random image " + str(numDone) + " out of " + str(numAnalyse))
    else:
        imageGray = inputs.openImage(IMAGEFOLDER + imageSource, DEBUGGING, saveLocation)
        print("Opened image " + str(numDone) + " out of " + str(numAnalyse))

    #Find the corners and then the edges on the image
    cornersCoords = corners.findCorners(imageGray, saveLocation, DEBUGGING)
    edgeCoords = corners.averageEdges(cornersCoords, FIBRE_WIDTH)

    #Finding the fibre lengths
    fibreLengths = lengths.findLengths(edgeCoords, MIN_LENGTH, FIBRE_WIDTH, imageGray)
    np.savetxt(saveLocation + "Fibre_Lengths.txt", fibreLengths, header = "Fibre lengths: [x0, y0, x1, y1, length01, angle01]")

    #Draw found fibres

    #Update number done
    print("Analysed image " + str(numDone) + " out of " + str(numAnalyse) + 
          "\n***********************************************************************************")
    numDone += 1

In addition, inputs.generateImage() was edited so that the maximum fibre length is independent of the array size, and the generated image array is converted to 8 bit so that it can be used by OpenCV later in the program. A drawFound() function was also created to draw lines on the fibres that have been found.

matthewb96 commented 6 years ago

Commit 478e9e60c924523b68adbf8cd80a79ab1767a73f fixes the drawFound() function in lengths. The error in this function was caused by converting the numpy array to int using Python's built-in function: lineCoords = int(np.rint(fibreLengths[i])). This error was not visible when running the full program, so some test code was added to lengths.py so that this module could be run on its own and the errors could be seen. Test code:

import inputs

imageGray = inputs.openImage("..\\FibreImages\\4 test fibres (25,500 fibre).jpg", False, "testing draw")
fibreLengths = np.loadtxt("..\\ProcessedData\\4 test fibres (25,500 fibre)[2018-02-10_15-12-53]Fibre_Lengths.txt", skiprows = 1)
drawFound(fibreLengths, imageGray, "Drawn fibres test.jpg")

Once this error was found the function was changed to use numpy to convert the array to int.

    #Draw on the found fibres
    image = cv2.cvtColor(imageArray, cv2.COLOR_GRAY2BGR)
    for i in range(fibreLengths.shape[0]):
        print("Drawing " + str(i) + " out of " + str(fibreLengths.shape[0]))
        lineCoords = np.rint(fibreLengths[i])
        lineCoords = lineCoords.astype(int)
        print("Line coords: " + str((lineCoords[1], lineCoords[0], lineCoords[3], lineCoords[2])))
        rr, cc = draw.line(lineCoords[1], lineCoords[0], lineCoords[3], lineCoords[2])
        image[rr, cc] = np.array([0, 0, 255])

    cv2.imwrite(filename + "Drawn Lines.jpg", image)
    print("Drawn found fibres on the image: " + filename + "Drawn Lines.jpg")
    return
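
The difference between the two conversions can be reproduced on its own with a small sketch. The row values below are made up for the example; only the two conversion calls come from the code in this issue.

```python
import numpy as np

# Made-up row in the [x0, y0, x1, y1, length, angle] layout
row = np.array([12.4, 30.6, 480.2, 31.5, 468.0, 0.1])

try:
    int(np.rint(row))  # the built-in int() only accepts a single value
    failed = False
except TypeError:
    failed = True

# numpy converts every element of the array at once
line_coords = np.rint(row).astype(int)
print(failed, line_coords)
```

Note that np.rint() rounds halves to the nearest even value, so 31.5 becomes 32 here.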

This code was then tested using the above test code and correctly worked, here is an example image. This shows red lines drawn wherever the program has determined is a fibre and will make it much easier to determine whether the program has successfully found fibres.

In commit 60cb742d2f783f35aa12f7d3f1b3f15418523f7c the excess test code from lengths.py was removed and drawFound() was called in main.py so the whole program could be tested.

Now left to do:

Finish off generateImage() function by adding in the output of the fibre positions, lengths and angles so they can be checked against the values found, this would allow this function to be used to test the program.
Add data analysis and graphing functions into the graphing module

The angles do not need to be fixed so 45deg and -45deg can be distinguished as the fibres being analysed have been dispersed randomly during the sample making process so the angles are arbitrary.

matthewb96 commented 6 years ago

Commit 73fc1ba1139e1334539d3685e528a1efda8c939a tested generating and analysing some random images and struggled to find the corners on some images, but not all. I thought the issue could be due to the jagged edges on the fibres because the draw.polygon() function did not have anti-aliasing.

When trying to fix this issue in d57f2415a583f55091e1d7031372440b1075f040 I tried using line_aa() to draw the edges of the fibres, but that did not work, and there was no anti-aliased polygon function in skimage.draw. While doing this, the lengths array created in inputs.trig() was found to be a 2D array, so it was changed to a 1D array because it was causing the corners to be 2D arrays too. All the corners were also rounded to the nearest int, as the lengths that are added to find each corner are floats.

The program was then tested with 10 random lengths, then again with 20. For all of these tests the correct number of corners and edges were found, and for all but one the correct lengths were found. For "Generated Random Image[2018-02-10_17-29-23] (Random Image 17)" one fibre length was not found, even though all the corners had been found for that fibre. In the commit description I suggested that the cause was the endpoints not lying exactly on the fibre, perhaps one pixel away, so that they were white. However, this cannot be correct, because the lengths.checkLine() function does not check that the two endpoints are black; it only checks the midpoints. This suggests my first assumption is incorrect, but the real cause is difficult to see.

In order to make debugging easier showing the midpoints positions on the subpix image would be good.

matthewb96 commented 6 years ago

In commit 583261430863beae67fc7dbb17e5f8989da8984f, saving of the image with the centroids and subpixel corners on it was removed from corners.findCorners(); instead, that function now returns the image array with the corners added. The edge positions are then added to the image outside of that function before it is saved. The saved image therefore shows the centroids, subpixel corners and edge positions, which makes debugging easier.

Commit 8468c739ba6853a00b46a9f9031cbeaf0171f451 edited inputs.generateImage() to return a numpy array containing all the fibre position data, which could be used to check if the program is working correctly.

                #Find midpoint for fibre data
                pos1 = midpoint(corner1, corner3)
                pos2 = midpoint(corner2, corner4)
                #Add generated fibre position data
                arrayRow = np.array([pos1[0], pos1[1], pos2[0], pos2[1], length, angle])
                fibrePositions = np.vstack((fibrePositions, arrayRow))
                print("Generated " + str(fibresGen) + " out of " + str(numFibres) + " fibres.")

    fibrePositions = np.delete(fibrePositions, 0, 0)          
    return imageArray, fibrePositions

Also the function checkRandom() was created in main.py that would check the fibre lengths found against the known positions and return the number of fibres that were correct, incorrect and only one pixel away. This function also printed out the values that were correct and incorrect so that the log file could be checked to find out where the program struggled. checkRandom() function
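
The linked checkRandom() code is not reproduced in this issue, so here is only a minimal sketch of the counting it describes. It assumes the found and known fibres are in the same order and compares lengths only; the name check_random() and the list inputs are choices made for this example. It uses abs() for the "one away" test, as in the fix described later in this issue.

```python
def check_random(found_lengths, known_lengths):
    """Count fibres that are correct, one away and incorrect."""
    correct = one_away = incorrect = 0
    for found, known in zip(found_lengths, known_lengths):
        if found == known:
            correct += 1
        elif abs(found - known) <= 1:
            one_away += 1
        else:
            incorrect += 1
    return correct, one_away, incorrect

# Made-up lengths: one exact, one off by a pixel, one wrong
result = check_random([100, 251, 300], [100, 250, 340])
print(result)
```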

Commit 4c0745b80ec5ee609e22da5f1cd730a6fe126407 split the processed output into two separate folders for data and images, and also added a gitignore file so that no images are uploaded to GitHub.

matthewb96 commented 6 years ago

In commit 3ae419aabb88bb9ee7ba3a7b0e7049a0c94eaae2 the program was tested on 1000 randomly generated 10 fibre images to see how successfully it found fibres. Log File for this data. The log file shows that a total of 6998 fibres were found correctly, 2594 were only one away and 329 were found incorrectly. This is quite good data, but the generated images contain 10 fibres of between 100 and 1000 pixels in a total image size of 10000*10000 pixels, which makes it very unlikely that fibres cross.
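
To put those counts in context, here is the arithmetic as a short sketch. The expected total of 10000 fibres comes from 1000 images of 10 fibres each. The three categories do not add up to that total; the remainder is presumably fibres that were not found at all, which would be consistent with the later change to checkRandom(), but that is an assumption.

```python
expected = 1000 * 10  # 1000 images, 10 fibres each
counts = {"correct": 6998, "one away": 2594, "incorrect": 329}

# Share of the expected fibres in each category
shares = {label: f"{n / expected:.2%}" for label, n in counts.items()}
for label, share in shares.items():
    print(label, share)

# Fibres that appear in none of the three categories
unaccounted = expected - sum(counts.values())
print("unaccounted", unaccounted)
```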

The find corners function needs to be edited to try and remove finding extra corners where fibres cross
Testing on random images with crossing fibres needs to be done
Data analysis and graph creation need to be added to graphing module

matthewb96 commented 6 years ago

Commit b2f9540852b9f16cf3771b3569eb6d19a9ef0179 added a text file containing any incorrect fibre messages from the log file to make errors easier to see, also changed save location. In commit 2a975b191d42057b275c5ba8967e5340b425ea93 the Logger class was edited to fix the output of standard error (errors were not showing on the terminal).

class Logger(object):
    def __init__(self, standard):
        self.type = standard
        self.terminal = sys.stdout
        if self.type == "Out":
            self.log = open(PROCESSEDDATA + saveName + "(LOG).txt", "w")

    def write(self, message):
        self.terminal.write(message)
        if self.type == "Out":
            self.log.write(message)

    def flush(self):
        self.terminal.flush()
        if self.type == "Out":
            self.log.flush()

originalOut = sys.stdout
originalErr = sys.stderr
sys.stdout = Logger("Out")
sys.stderr = Logger("Error")

Some tests were run on different array sizes (arrays are all square) between 1000 and 10000, with 100 random images for each size, to determine how errors increased with smaller arrays (smaller array size = more crossing fibres). The first set of tests was run in commit 98733522ecad0846a586ef87be6e73a69663eada, but during this test it was noticed that the absolute value was not used when checking whether fibres are one away, so any fibre whose found length was shorter than the known length would be counted as one away. This was fixed after all the data was collected, so these tests need to be redone.

             if fibreLengths[i, 4] == knownPositions[i, 4]:
                 print("CORRECT: " + str(fibreLengths[i]))
                 correct += 1
-            elif fibreLengths[i, 4] - knownPositions[i, 4] <= 1:
+            elif abs(fibreLengths[i, 4] - knownPositions[i, 4]) <= 1:
                 print("One away: "+ str(knownPositions[i]) + " Found Data: " + str(fibreLengths[i]))
                 oneAway += 1
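
The effect of the missing abs() is easy to see with a single made-up pair of lengths. The values are only for illustration.

```python
found, known = 180.0, 250.0  # found length is 70 pixels too short

# Old test: a negative difference is always <= 1, so this passes
signed_test = found - known <= 1

# Fixed test: the size of the difference is what matters
absolute_test = abs(found - known) <= 1

print(signed_test, absolute_test)
```

So before the fix a fibre found far too short was still counted as "one away", which inflates that category in the first set of graphs.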

A new python file was used to create the graphs for this data, manualGraphs.py. This file is not part of the main program and is only used manually for producing graphs (commit 7b655dad548e4d075927f6676ce7c26e457790a2). Also in this commit checkRandom() was edited to add 1 to the incorrect value each time a fibre wasn't found, to make sure incorrect value was correct.

In commit f38ca3cd0b022d548d2ad611182456089ba17697, the tests from above were redone so the data could be seen with the errors explained above removed, a graph of this data was then plotted. Here are the two graphs for before and after the fixes.

graph showing the data for 100 random images before fixing the one away check

When looking at the incorrect fibres, it was noticed that if fewer than 10 fibres were found, some could be checked against the wrong known fibre because of the missing ones. To fix this, a for loop was added (commit 80d7d246c264fa45e88c635568793cc3451e1434). This addition checks how many fibres are missing and loops through that number of fibres in the sequence until the correct one is found. This gives a more accurate view of the correct and incorrect fibres. The test from above was then re-run and another graph was plotted (commit db0169178ef37821df287bac1f5caa3ecac20dc9).
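
The loop added in that commit is not shown here, so the following is only a sketch of one way the idea could work. It assumes the found and known fibres are in the same order, matches on length within one pixel, and uses a made-up function name, match_with_gaps().

```python
def match_with_gaps(found, known, tolerance=1):
    """Match found lengths to known lengths when some fibres are missing.

    Returns (found_index, known_index) pairs and the number missing.
    """
    missing = max(len(known) - len(found), 0)
    matches = []
    offset = 0  # known fibres skipped so far
    for i, length in enumerate(found):
        # Try the expected known fibre, then as many later ones
        # as there are missing fibres still unaccounted for
        for skip in range(missing - offset + 1):
            k = i + offset + skip
            if k >= len(known):
                break
            if abs(length - known[k]) <= tolerance:
                matches.append((i, k))
                offset += skip
                break
    return matches, missing

# The second known fibre (200) was not found
matches, missing = match_with_gaps([100, 300, 400], [100, 200, 300, 400])
print(matches, missing)
```

A found fibre that matches none of the candidates is simply left out of the matches, so it would be counted as incorrect.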

All three graphs show an approximately exponential decrease in incorrect fibres with increasing array size, showing that a smaller array size, and therefore more crossing fibres, causes more fibres to be incorrectly found or not found at all. Therefore some work needs to be done to find crossing fibres correctly more consistently.