Open rsehgal opened 5 years ago
Actually the comparisons are not valid because I was referring to the earlier K-Means. We may need to apply K-Means and its visualization again.
@Ani1211999 and @apoorvabh98: may I ask both of you to prepare a paragraph mentioning the pros and cons of the different clustering and outlier detection algorithms?
3D Visualization of the K-Means Algorithm
Summary statistics of the clusters (one table per K-Means cluster):

Cluster 0 (Labels = 0, count = 18297):

| | X | Y | Z | Scat_Angle | doca |
|------|--------|--------|--------|-----------|---------|
| mean | 143.904 | -142.428 | -15.059 | 0.00208 | 1.891 |
| std | 61.713 | 61.612 | 58.541 | 0.05719 | 3.310 |
| min | 35.010 | -263.215 | -179.994 | -0.37970 | 1.03e-05 |
| 25% | 88.382 | -195.665 | -57.858 | -0.01659 | 0.1891 |
| 50% | 141.086 | -138.800 | -16.917 | 0.00057 | 0.7104 |
| 75% | 197.529 | -86.469 | 25.162 | 0.01820 | 2.115 |
| max | 271.131 | -35.870 | 179.773 | 0.49061 | 59.597 |

Cluster 1 (Labels = 1, count = 18923):

| | X | Y | Z | Scat_Angle | doca |
|------|--------|--------|--------|-----------|---------|
| mean | -143.752 | -142.726 | -18.543 | 0.00050 | 0.5346 |
| std | 60.767 | 60.835 | 56.943 | 0.02247 | 1.0495 |
| min | -249.996 | -250.000 | -179.527 | -0.25990 | 5.65e-06 |
| 25% | -196.306 | -195.068 | -59.310 | -0.00477 | 0.0532 |
| 50% | -140.204 | -139.786 | -20.285 | 0.00031 | 0.1903 |
| 75% | -89.270 | -87.399 | 19.893 | 0.00495 | 0.5555 |
| max | -30.381 | -22.540 | 134.415 | 0.30984 | 24.968 |

Cluster 2 (Labels = 2, count = 19059):

| | X | Y | Z | Scat_Angle | doca |
|------|--------|--------|--------|-----------|---------|
| mean | 144.910 | 143.631 | -16.964 | 0.00078 | 1.1105 |
| std | 61.405 | 61.557 | 57.697 | 0.03961 | 2.0036 |
| min | 35.797 | 14.848 | -179.624 | -0.36182 | 6.04e-07 |
| 25% | 89.847 | 87.748 | -58.614 | -0.01014 | 0.1098 |
| 50% | 142.040 | 140.879 | -19.332 | 0.00047 | 0.4088 |
| 75% | 198.514 | 196.425 | 22.668 | 0.01076 | 1.2108 |
| max | 249.994 | 249.999 | 134.968 | 0.35176 | 34.809 |

Cluster 3 (Labels = 3, count = 18558):

| | X | Y | Z | Scat_Angle | doca |
|------|--------|--------|--------|-----------|---------|
| mean | -144.526 | 143.020 | -15.701 | 0.00081 | 1.8583 |
| std | 61.837 | 61.421 | 57.815 | 0.05627 | 3.2969 |
| min | -249.994 | 16.995 | -179.932 | -0.43228 | 1.54e-05 |
| 25% | -198.665 | 87.430 | -57.215 | -0.01768 | 0.1940 |
| 50% | -142.138 | 139.880 | -17.903 | 0.00048 | 0.7224 |
| 75% | -88.177 | 195.857 | 24.725 | 0.01800 | 2.0662 |
| max | -34.699 | 249.999 | 134.958 | 0.39268 | 55.488 |
K-Means visualization over the new dataset at a precision of 4 decimal places
Means of the different clusters in K-Means: 0.000810, 0.002708, 0.000785, 0.000504
Nice evaluation of the classification algorithms
During clustering, we first applied the K-Means algorithm on the filtered data. After visualizing the data, we observed that four clear clusters were present, so we applied K-Means with n_clusters=4 to identify them. The algorithm returned satisfactory results. The simulation was set up so that the blocks were placed with centers at ±150 on the X-Y axes; K-Means returned centers in the range ±143 to ±144. Further filtering was not possible due to the presence of outliers, and since cluster means are affected even by a small number of outliers, we did not achieve any further improvement in the clusters.
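A minimal sketch of this step, using synthetic stand-in data rather than the actual filtered data (four blocks centred at ±150 on the X-Y axes, as in the simulation):

```python
# Sketch of the K-Means step: four synthetic blocks centred at +-150
# on the X-Y axes stand in for the filtered data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
centers = np.array([[150.0, -150.0], [-150.0, -150.0],
                    [150.0, 150.0], [-150.0, 150.0]])
X = np.vstack([c + rng.normal(scale=60.0, size=(500, 2)) for c in centers])

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_.round(1))  # four recovered centres near +-150
```

With well-separated blocks the recovered centres land close to ±150; outliers mixed into the data pull these means away, which is the limitation described above.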
So we shifted to K-Medians, which operates like K-Means but uses the median instead of the mean as the centroid. Since outliers cannot pull the median as far as they pull the mean, K-Medians returned better results than K-Means, as we expected. Because our dataset is relatively small, the overhead of the sorting needed to compute medians does not noticeably affect the running time. Centers were observed in the range ±145 to ±147 on the X-Y axes. In the second simulation, K-Medians was able to cluster similar material blocks at a precision of 3 decimal places, whereas K-Means required a precision of 4. Here are some of the results of K-Medians and K-Means:

K-MEDIANS (count = 3527)

| | X | Y |
|------|--------|--------|
| mean | -145.676 | -146.514 |
| std | 61.956 | 59.747 |
| min | -298.022 | -249.947 |
| 25% | -199.228 | -198.034 |
| 50% | -143.333 | -145.162 |
| 75% | -89.890 | -93.099 |
| max | -21.276 | -29.654 |
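scikit-learn has no built-in K-Medians, so here is a minimal numpy sketch of the update rule just described (Manhattan-distance assignment, coordinate-wise median update); the helper name `kmedians` and its `init` parameter are illustrative, not the thread's actual code:

```python
# Minimal K-Medians sketch (scikit-learn has no built-in K-Medians).
# Assign points by Manhattan distance, then move each centre to the
# coordinate-wise median of its cluster, which outliers barely shift.
import numpy as np

def kmedians(X, k, init=None, n_iter=100, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = (X[rng.choice(len(X), k, replace=False)]
               if init is None else np.asarray(init, dtype=float))
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Manhattan distance from every point to every centre
        d = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = d.argmin(axis=1)
        # median update; keep the old centre if a cluster goes empty
        new = np.array([np.median(X[labels == j], axis=0)
                        if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels
```

Running this on data with a handful of extreme outliers leaves the recovered centres essentially at the true block positions, which is the robustness argument made above.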
K-MEANS (count = 18923)

| | X | Y |
|------|--------|--------|
| mean | -143.752 | -142.726 |
| min | -249.996 | -250.000 |
| 25% | -196.306 | -195.068 |
| 50% | -140.204 | -139.786 |
| 75% | -89.270 | -87.399 |
| max | -30.381 | -22.540 |
It can clearly be seen that K-Medians returns better results than K-Means.
KNN gave an accuracy of 60 percent, sir.
I am not understanding the basics of a ROC curve.
Sir, one more request: after making a document out of this, will you send me the document?
Random Forests gave an accuracy of 65%; after feature selection, improvement was observed up to 67%.
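As a hedged illustration of such a feature-selection step (synthetic data, not the thread's X, Y, Z, Scat_Angle, doca features), one common approach is to rank features by Random Forest importance with scikit-learn's `SelectFromModel` and refit on the reduced set:

```python
# Rank features by Random Forest importance (SelectFromModel keeps
# those above the mean importance) and refit on the reduced set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=4, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

base = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)
base_acc = base.score(Xte, yte)

sel = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0)).fit(Xtr, ytr)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(sel.transform(Xtr), ytr)
sel_acc = rf.score(sel.transform(Xte), yte)
print(base_acc, sel_acc)
```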
Sir, I did whatever I could make out of that code, but the result differed in the multiclass case. Here is the ROC for class 2; I am not sure how it is being made and with what model. The accuracy has also been shown.
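For reference, the usual way a per-class ROC curve is produced for a multiclass model is one-vs-rest: binarise "class 2 vs the rest" and feed that class's predicted probability to `roc_curve`. A sketch on synthetic data (the classifier choice here is just illustrative):

```python
# One-vs-rest ROC for class 2 of a multiclass problem: binarise the
# labels and use that class's predicted probability as the score.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_classes=4,
                           n_informative=6, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(Xtr, ytr)

proba_c2 = clf.predict_proba(Xte)[:, 2]           # P(class 2) per sample
fpr, tpr, _ = roc_curve((yte == 2).astype(int), proba_c2)
auc_c2 = auc(fpr, tpr)
print("class 2 vs rest AUC:", auc_c2)
```

Plotting `fpr` against `tpr` gives the class-2 ROC curve; repeating per class gives one curve per class.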
And whatever you are trying to implement is not a binary search; it is basically a heap (more precisely, a min-heap). It will take O(log n) time on average.
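The min-heap point can be illustrated with Python's stdlib `heapq`, which keeps the smallest element at index 0 with O(log n) push and pop:

```python
# A min-heap, not a binary search: heapq keeps the smallest element
# at index 0, with O(log n) push and pop.
import heapq

h = []
for v in [7, 2, 9, 1, 5]:
    heapq.heappush(h, v)          # O(log n) per insertion

print(h[0])                                        # O(1) peek -> 1
print([heapq.heappop(h) for _ in range(len(h))])   # -> [1, 2, 5, 7, 9]
```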
Decision tree gave an accuracy of 59 percent; after feature selection, improvement was observed up to 62 percent.
Neural networks returned an accuracy of 58%. As you expected, random forests returned the best output among all the respective algorithms.
Sklearn contains packages for bagging and boosting, which train weak but homogeneous learners; but to combine the predictions of different models into a final model we need stacking, which is available in the library vecstack.
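vecstack exposes this through its `stacking()` helper; the same idea is also available in scikit-learn (0.22+) as `StackingClassifier`. A sketch on synthetic data, with the neural-network and random-forest base learners mentioned in the thread (the specific hyperparameters are illustrative):

```python
# Stacking: out-of-fold predictions of heterogeneous base learners
# feed a meta-model that produces the final prediction.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("nn", MLPClassifier(max_iter=1000, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # meta-model on base predictions
)
acc = stack.fit(Xtr, ytr).score(Xte, yte)
print("stacked accuracy:", acc)
```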
The maximum accuracy observed was 69%. The classifiers used were neural networks and random forests.
Here the target is to process the point cloud generated after the simulation. The first obvious step would be to visualize the points in 2D and 3D, then write some outlier detection algorithm to remove the outliers, and finally try to find the clusters.
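One simple way to start the outlier-removal step is a robust z-score filter on each coordinate, using the median and MAD rather than mean and standard deviation (the helper name `mad_filter` and the 3.5 threshold are illustrative choices, not from the thread):

```python
# Robust per-axis outlier filter for a point cloud: drop any point
# whose median/MAD-based z-score exceeds the threshold on some axis.
import numpy as np

def mad_filter(points, thresh=3.5):
    points = np.asarray(points, dtype=float)
    med = np.median(points, axis=0)
    mad = np.median(np.abs(points - med), axis=0) + 1e-12
    z = 0.6745 * np.abs(points - med) / mad   # robust z-score per axis
    return points[(z < thresh).all(axis=1)]
```

The surviving points can then be handed to the clustering step.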