joelgrus / data-science-from-scratch

code for Data Science From Scratch book
MIT License

K Nearest Neighbors code missing #80

Closed: smithjackson35 closed this issue 4 years ago

smithjackson35 commented 5 years ago

https://github.com/joelgrus/data-science-from-scratch/blob/master/scratch/nearest_neighbors.py

Nothing has been posted here for the code.

smithjackson35 commented 5 years ago

ch12_error.txt

I get a "list index out of range" error at the point where the code stops.

smart-patrol commented 4 years ago

I am getting the same error.

In [3]: %run nearest_neighbors.py                                                          
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
~/Desktop/DSFromScratch/Chapter12b.py in <module>
    148                      for min_dist, avg_dist in zip(min_distances, avg_distances)]
    149 
--> 150 if __name__ == "__main__": main()

~/Desktop/DSFromScratch/Chapter12b.py in main()
     73     with open('iris.dat') as f:
     74         reader = csv.reader(f)
---> 75         iris_data = [parse_iris_row(row) for row in reader]
     76 
     77     # We'll also group just the points by species/label so we can plot them.

~/Desktop/DSFromScratch/Chapter12b.py in <listcomp>(.0)
     73     with open('iris.dat') as f:
     74         reader = csv.reader(f)
---> 75         iris_data = [parse_iris_row(row) for row in reader]
     76 
     77     # We'll also group just the points by species/label so we can plot them.

~/Desktop/DSFromScratch/Chapter12b.py in parse_iris_row(row)
     67         measurements = [float(value) for value in row[:-1]]
     68         # class is e.g. "Iris-virginica"; we just want "virginica"
---> 69         label = row[-1].split("-")[-1]
     70 
     71         return LabeledPoint(measurements, label)

IndexError: list index out of range

@joelgrus It looks like both the .py file and the printed book example fail to run because of this.

It also looks like the Iris data set file name changed from iris.data to iris.dat.

joelgrus commented 4 years ago

Ugh, that "list index out of range" error seems to be caused by the extra blank line at the end of the iris.data file; if you delete that blank line, it seems to work. Sorry.
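A minimal sketch of why a trailing blank line breaks parsing, assuming the file is read with `csv.reader` as in the traceback above: `csv.reader` returns an empty list for an empty line, so `row[-1]` has nothing to index. The two sample rows are illustrative only, and `io.StringIO` stands in for the real file.

```python
import csv
import io

# Illustrative file contents: two rows in the iris.data format followed by
# a trailing blank line (the situation described above).
sample = "5.1,3.5,1.4,0.2,Iris-setosa\n7.0,3.2,4.7,1.4,Iris-versicolor\n\n"

rows = list(csv.reader(io.StringIO(sample)))
print(rows[-1])  # the blank line comes back as an empty list

error_message = None
try:
    # Same expression parse_iris_row uses to extract the label
    label = rows[-1][-1].split("-")[-1]
except IndexError as e:
    error_message = str(e)

print("IndexError:", error_message)
```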

smart-patrol commented 4 years ago

The data changed, which the book does warn about repeatedly. I knew my idol's code couldn't be wrong!

@smithjackson35 There's probably a better solution, but this fixed it programmatically in Python:

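    # Rewrite iris.dat without blank lines (e.g. the trailing one),
    # so every row that reaches parse_iris_row has a label column.
    with open('iris.dat', 'r') as f:
        lines = f.readlines()
    with open('iris.dat', 'w') as f:
        for line in lines:
            if line.strip():  # keep only lines with non-whitespace content
                f.write(line)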

Can you close the issue?
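An alternative sketch that leaves the data file untouched is to skip empty rows while reading. This is an assumption-laden illustration, not the book's code: `parse_iris_row` follows the lines shown in the traceback, `LabeledPoint` here is a stand-in `NamedTuple` whose field names are assumed, and `load_iris` is a hypothetical helper name.

```python
import csv
import io
from typing import List, NamedTuple


class LabeledPoint(NamedTuple):
    # Stand-in for the book's LabeledPoint (field names are an assumption)
    point: List[float]
    label: str


def parse_iris_row(row: List[str]) -> LabeledPoint:
    measurements = [float(value) for value in row[:-1]]
    # class is e.g. "Iris-virginica"; we just want "virginica"
    label = row[-1].split("-")[-1]
    return LabeledPoint(measurements, label)


def load_iris(f) -> List[LabeledPoint]:
    # Hypothetical helper: skip empty rows (such as a trailing blank line)
    # instead of rewriting the file on disk.
    reader = csv.reader(f)
    return [parse_iris_row(row) for row in reader if row]


# Illustrative data; with the real file you would write:
#     with open('iris.dat') as f:
#         iris_data = load_iris(f)
sample = "5.1,3.5,1.4,0.2,Iris-setosa\n7.0,3.2,4.7,1.4,Iris-versicolor\n\n"
iris_data = load_iris(io.StringIO(sample))
print(iris_data)
```

The `if row` check only drops truly empty rows; a line containing just spaces would still be parsed into a bogus point and would need an extra check.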

smithjackson35 commented 4 years ago

Yep, thanks guys!