rsehgal / TomoML

Application of ML in Muon Tomography

Analyze the PoCA point cloud and try to find outliers and clusters in it #1

Open rsehgal opened 5 years ago

rsehgal commented 5 years ago

Here the target is to process the point cloud generated after simulation. The first obvious step would be to visualize the points in 2D and 3D, then write an outlier detection algorithm to remove the outliers, and finally try to find clusters.

Ani1211999 commented 5 years ago

2D visualization of the PoCA Raw_data dataset and the PoCA Filtered_dataset (images: Filtered_2D, Raw_2D)

3D visualization of the PoCA Raw_data dataset and the PoCA Filtered_dataset (images: Filtered_3D, Raw_3D)

rsehgal commented 5 years ago

Thanks Aniket, keep up the good work.

Cheers,

rsehgal commented 5 years ago

Also try to get the clusters out of the filtered data, and try to find the dimensions of each cluster in 3D.

Ani1211999 commented 5 years ago

https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/ Nice article

Ani1211999 commented 5 years ago

https://www.datacamp.com/community/tutorials/dbscan-macroscopic-investigation-python DBSCAN IN DETAIL

Ani1211999 commented 5 years ago

An issue has arisen while performing DBSCAN. What should I do about negative values, i.e. negative coordinates?
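
On the question of negative coordinates: a distance-based method such as DBSCAN with a Euclidean metric only looks at distances between points, so negative values should not need special handling. A minimal sketch of that idea, using made-up points and a hypothetical `shift` value (neither comes from the PoCA files):

```python
import math

def pairwise_distances(points):
    """Return the Euclidean distance between every pair of points."""
    return [
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]

# Hypothetical PoCA-like points with negative coordinates (not real data).
points = [(-142.0, 102.0, 12.0), (-140.0, -141.0, -5.0), (141.0, -160.0, -91.0)]

# Shift every coordinate by the same amount so that all values become positive.
shift = 500.0
shifted = [tuple(c + shift for c in p) for p in points]

original = pairwise_distances(points)
moved = pairwise_distances(shifted)

# The distances agree up to floating-point rounding.
distances_match = all(math.isclose(a, b) for a, b in zip(original, moved))
print(distances_match)
```

Since the pairwise distances are unchanged by the shift, a clustering that depends only on those distances would see the same neighbourhoods either way.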

Ani1211999 commented 5 years ago

http://nebula.wsimg.com/b573f6daeaad3f38ead59a00eae13ffd?AccessKeyId=BDCFD2864AC62650B045&disposition=0&alloworigin=1

Image clustering

Ani1211999 commented 5 years ago

https://scikit-learn.org/stable/modules/outlier_detection.html

Scikit Outlier Detection Methods

Ani1211999 commented 5 years ago

https://blog.floydhub.com/introduction-to-anomaly-detection-in-python/

Really useful article

rsehgal commented 5 years ago

Anomaly detection is a very good concept. We should certainly spend some time on it.

Ani1211999 commented 5 years ago

Yes, I think I added this article earlier.

rsehgal commented 5 years ago

Have a look at ijca_survey_paper.pdf. Please compare various proximity-based techniques such as k-Nearest Neighbours, k-Means, DBSCAN and IsolationForest.

Ani1211999 commented 5 years ago

With respect to outliers or clusters?

Ani1211999 commented 5 years ago

Got it. Outliers.

rsehgal commented 5 years ago

Another good paper that may be helpful to us: Bandieramonte_2015_J._Phys.__Conf._Ser._608_012046.pdf

rsehgal commented 5 years ago

Please find the Distance of Closest Approach (DoCA) histogram attached. Try to reproduce it using Python. (image: docaHist)
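
A minimal sketch of how such a histogram could be binned in plain Python. The DoCA values here are random placeholders drawn from an exponential distribution (an assumption for illustration only); the real values would come from the DoCA column of the PoCA file. The `histogram` helper, the bin count and the range are likewise hypothetical choices:

```python
import random

def histogram(values, n_bins, lo, hi):
    """Count how many values fall into each of n_bins equal-width bins on [lo, hi)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            # Guard against floating-point rounding at the upper edge.
            index = min(int((v - lo) / width), n_bins - 1)
            counts[index] += 1
    return counts

# Placeholder DoCA values (NOT simulation output), seeded for repeatability.
rng = random.Random(0)
doca_values = [rng.expovariate(1.0) for _ in range(1000)]

counts = histogram(doca_values, n_bins=10, lo=0.0, hi=5.0)

# Crude text rendering: one '#' per 10 entries in the bin.
for i, count in enumerate(counts):
    left = i * 0.5
    print(f"{left:4.1f} - {left + 0.5:4.1f} | {'#' * (count // 10)}")
```

The same values could be passed to `matplotlib.pyplot.hist` to draw the plot instead of printing it.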

rsehgal commented 5 years ago

One can also explore Gaussian mixture models (GMM).

Ani1211999 commented 5 years ago

Yes sir, I read an article about it yesterday: https://towardsdatascience.com/wondering-how-to-build-an-anomaly-detection-model-87d28e50309

Ani1211999 commented 5 years ago

Sir, these are the visualizations of x-y, y-z, x-z and x-DoCA. Can you suggest some more visualizations to help select the right attributes? (image: gen_vis_filtered_doca)

Ani1211999 commented 5 years ago

KMeans output: a clear differentiation between the clusters is observed. (image: kmeans_clusters_supervised)

rsehgal commented 5 years ago

Very nice Aniket, this is what I wanted!

Can we have similar results in 3D?

Ani1211999 commented 5 years ago

Did you see it?

rsehgal commented 5 years ago

Yes, I saw.

Ani1211999 commented 5 years ago

(image: kmeans_3d)

rsehgal commented 5 years ago

Hi Aniket, can we also get the mean and standard deviation of the scattering angle of the points in each individual cluster? In this case we should get 4 means and 4 standard deviations for the 4 clusters.
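
A minimal sketch of that per-cluster summary, assuming each point already carries a cluster label from the clustering step. The labels and angles are made-up placeholders, and `per_cluster_stats` is a hypothetical helper name:

```python
from collections import defaultdict
from statistics import mean, pstdev

def per_cluster_stats(labels, angles):
    """Return {cluster label: (mean, population std dev)} of the scattering angle."""
    groups = defaultdict(list)
    for label, angle in zip(labels, angles):
        groups[label].append(angle)
    return {
        label: (mean(values), pstdev(values))
        for label, values in sorted(groups.items())
    }

# Placeholder data (not from the simulation): 4 clusters, 2 points each.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
angles = [0.001, 0.003, 0.002, 0.002, 0.000, 0.004, 0.005, 0.007]

stats = per_cluster_stats(labels, angles)
for label, (mu, sigma) in stats.items():
    print(f"cluster {label}: mean={mu:.4f}, std={sigma:.4f}")
```

`pstdev` is the population standard deviation; `statistics.stdev` would give the sample version if that is preferred.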

Ani1211999 commented 5 years ago

DoCA vs log(DoCA) graph (image: doca_vs_log(doca)_cluster_4)

Ani1211999 commented 5 years ago

Size (range) of cluster vs log(DoCA). Have a look. (image: doca_vs_log(doca)_range_vs_log(doca))

Ani1211999 commented 5 years ago

Added sorted centroid (distances) vs the size (range) of cluster. (image: doca_vs_log(doca)_range_vs_log(doca)_cent_vs_log(doca))

Ani1211999 commented 5 years ago

The outlier removal is not good after using the scattering angle. Have a look, sir. (image: outlier_std)

rsehgal commented 5 years ago

Did you plot only those points that fall within 2 sigma in the histogram of the scattering angle? Can we also have the scattering angle histogram for this cluster?
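
A minimal sketch of a 2 sigma selection on the scattering angle, assuming the cut is taken around the cluster mean. The angles are placeholders with one deliberately large value, and `within_n_sigma` is a hypothetical helper name:

```python
from statistics import mean, pstdev

def within_n_sigma(angles, n_sigma=2.0):
    """Return a True/False mask marking angles within n_sigma of the mean."""
    mu = mean(angles)
    sigma = pstdev(angles)
    return [abs(a - mu) <= n_sigma * sigma for a in angles]

# Placeholder scattering angles (not simulation output); 0.05 is the outlier.
angles = [0.001, 0.002, 0.0015, 0.0012, 0.0018,
          0.0011, 0.0019, 0.0014, 0.0016, 0.05]

mask = within_n_sigma(angles)
kept = [a for a, keep in zip(angles, mask) if keep]
print(len(kept), "of", len(angles), "points kept")
```

Note that the mean and sigma are themselves pulled by the outlier, so a cut around the median may behave differently.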

rsehgal commented 5 years ago

One good thing that you can clearly see from your fourth plot: if you don't consider the outliers, the cluster spans from 50 to 250 along the X axis, which implies that the side length of the scatterer block along X is 250 - 50 = 200, which is exactly the value I am using in the simulations. A similar result can be seen along the Y axis.
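
The side-length estimate (250 - 50 = 200) amounts to taking the extent of the cluster along each axis once the outliers are removed. A minimal sketch with placeholder points; the Z values and the `bounding_box_sides` helper are assumptions for illustration:

```python
def bounding_box_sides(points):
    """Return the (max - min) extent of the points along each axis."""
    return tuple(max(axis) - min(axis) for axis in zip(*points))

# Placeholder inlier points (x, y, z) for one cluster, outliers already removed.
points = [
    (50.0, 60.0, -10.0),
    (250.0, 110.0, 0.0),
    (130.0, 260.0, 10.0),
    (200.0, 90.0, -5.0),
]

sides = bounding_box_sides(points)
print(sides)  # (200.0, 200.0, 20.0)
```

A single surviving outlier would stretch this estimate, which is why the outlier removal step matters here.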


Good, keep up the good work.

Ani1211999 commented 5 years ago

You can check the code.

Ani1211999 commented 5 years ago

That's not an improved result. At the time of clustering itself, it was showing such behaviour. You can observe the cluster graph above and see.

rsehgal commented 5 years ago

OK

Ani1211999 commented 5 years ago

Sir, can you send the actual package name or the command?

Ani1211999 commented 5 years ago

I want the Pillow package.

Ani1211999 commented 5 years ago

In my opinion, KMeans won't give us proper results, because it includes the outliers when computing each centroid, which shifts the centroid away from its ideal position; it is highly sensitive to outliers. Once the cluster centers are shifted, removing the outliers from a selected cluster becomes difficult. So we now move to k-medoids / k-medians, since medians and medoids are less sensitive to outliers.
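
A minimal one-dimensional sketch of that sensitivity, comparing how far a single outlier moves the mean and the median of one coordinate. The coordinate values are placeholders, not PoCA data:

```python
from statistics import mean, median

# Placeholder x coordinates of one compact cluster (not real data).
cluster_x = [140.0, 141.0, 139.0, 142.0, 138.0]

# The same cluster with one far-away outlier added.
with_outlier = cluster_x + [400.0]

mean_shift = mean(with_outlier) - mean(cluster_x)
median_shift = median(with_outlier) - median(cluster_x)

print(f"mean moved by   {mean_shift:.2f}")
print(f"median moved by {median_shift:.2f}")
```

Here the mean moves by about 43 units while the median moves by 0.5, which is the behaviour that motivates a median-based center.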

Ani1211999 commented 5 years ago

KMedians output based on the mean scattering angle (image: kmedians)

Ani1211999 commented 5 years ago

KMedians and KMeans output in three dimensions. (images: kmeans_2d, kmedians_3D)

rsehgal commented 5 years ago

Hi Aniket, please assign the colors using only the first two decimal places of the mean scattering angle.

I am generating data using different materials. Once it is ready, we will run your clustering on this data as well and see whether we get a different color for each material.

Ani1211999 commented 5 years ago

The mean scattering angle is itself of order 10^-3. Outputs for precision 2 and precision 3 are attached. (images: cluster_precison_2, cluster_precision_3)
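
A minimal sketch of why the precision matters for values of this size: rounding to two decimal places collapses them all to the same number, so they cannot be told apart by color. The four angles are placeholders of order 10^-3, not the measured means:

```python
# Placeholder mean scattering angles of order 1e-3 (not measured values).
mean_angles = [0.0008, 0.0013, 0.0004, 0.0016]

two_dp = [round(a, 2) for a in mean_angles]
three_dp = [round(a, 3) for a in mean_angles]

print("2 decimals:", two_dp, "->", len(set(two_dp)), "distinct value(s)")
print("3 decimals:", three_dp, "->", len(set(three_dp)), "distinct value(s)")
```

With these placeholders, two decimals leave one distinct value and three decimals leave three, so even precision 3 can merge clusters whose means are close.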

rsehgal commented 5 years ago

Hi Aniket, please find the files attached in CSV and space-separated formats:

filteredDiffMaterial.txt
rawDiffMaterial.txt
CSVfilteredDiffMaterial.txt
CSVrawDiffMaterial.txt

GitHub did not allow uploading files with a .csv extension, so the files whose names start with CSV are actually CSV files; you can just rename them to .csv.

Ani1211999 commented 5 years ago

Yes sir, here are the final plots:

1) With a precision of 2 decimal places (image: kmedians_diff_material)

2) Without precision (image: kmedians_without_precision)

rsehgal commented 5 years ago

Can you please write the median of each cluster along with its mean scattering angle?

Ani1211999 commented 5 years ago

Final medians (X, Y, Z, Scat_Angle, DoCA, Mean_Scattering_Angle), one cluster per line:

-142.0755, 102.2153, 12.135200000000001, -0.040366360000000004, 1.9149356499999999, 0.0008188780047930777
142.0355, 158.93189999999998, -94.5895, -0.021385835, 0.44684695, 0.0012874675448983908
-140.204, -140.766, -5.22398, 0.000646646, 0.130713, 0.0004999243537895381
140.84750000000003, -159.72415, -90.7157, -0.0004726000000000001, 0.24103750000000002, 0.0016299095004796982

Ani1211999 commented 5 years ago

You can compare the cluster results of K-Means with those of K-Medians by looking at the files in the dataset folder.

rsehgal commented 5 years ago

Any significant difference between KMeans and KMedians?

rsehgal commented 5 years ago

Results are not bad. Only the first row is not as expected, but this may be due to too few events, so maybe we should repeat this test with at least double the number of events.

Median X, median Y, mean scattering angle (material):

-142.0755, 102.2153, 0.0008188780047930777 (Pb)
142.0355, 158.93189999999998, 0.0012874675448983908 (Fe)
-140.204, -140.766, 0.0004999243537895381 (Al)
140.847500, -159.72415, 0.0016299095004796982 (Pb)

In the other three rows, Al has the least scattering, then Fe, and the maximum is for Pb.
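
The ordering can be checked by sorting the four clusters by their mean scattering angle. A minimal sketch using the values quoted in this comment; the tuple layout and variable names are just an illustrative choice:

```python
# (material, median X, median Y, mean scattering angle) as quoted above.
clusters = [
    ("Pb", -142.0755, 102.2153, 0.0008188780047930777),
    ("Fe", 142.0355, 158.93189999999998, 0.0012874675448983908),
    ("Al", -140.204, -140.766, 0.0004999243537895381),
    ("Pb", 140.8475, -159.72415, 0.0016299095004796982),
]

# Sort by mean scattering angle, smallest first.
ranked = sorted(clusters, key=lambda c: c[3])
order = [material for material, _, _, _ in ranked]

for material, x, y, angle in ranked:
    print(f"{material}: {angle:.6f} at ({x}, {y})")
```

The sorted order is Al, Pb, Fe, Pb: the first Pb row lands between Al and Fe instead of above Fe, which is the unexpected row.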

=========

Good job.

Ani1211999 commented 5 years ago

For one cluster, the x and y values are -145, -146 for KMedians and -145, -142 for KMeans, whereas for another cluster they are -145, 134 for KMedians and -145, 142 for KMeans.

rsehgal commented 5 years ago

Let's discuss on Monday.