section-engineering-education / engineering-education

"Section's Engineering Education (EngEd) Program is dedicated to offering a unique quality community experience for computer science university students."

Pattern Analysis with Kernel PCA in Machine Learning #5353

Closed Daniel695 closed 2 years ago

Daniel695 commented 2 years ago

NOTE: All first-time contributors should know that Topic approval and PR submission does NOT guarantee your Topic/Article will be published. Our team of Peer Reviewers and Content Moderators will review all PRs that come in to make sure they adhere to the standard of quality we expect from the EngEd community.

We expect all community members to go through the provided examples, resources, and previously published material before submitting content. As a rule of thumb, please only submit articles (pull requests) that are complete, formatted correctly, and include a fully polished article (ready to be published and error-free).

All revisions and edits should be completed on your own forked repo (so as not to take up room in the queue). Any PR that is submitted incomplete (meaning not ready to be published as is) will be closed. A PR being closed does NOT mean the article cannot be published (or fixed); it simply means that the edits, revisions, and fixes will happen outside of the queue.

Topic Suggestion

Enter topic suggestion here...

Pre-submission advice

By following all our pre-submission advice and reviewing our Resources folder, you will maximise your chances of your topic being approved.

We ask that you please be patient as our team works through approving and publishing all articles/tutorials in a timely manner.

Allow 1-3 days for a topic to be reviewed and/or approved, and allow 3-7 days for an article to be reviewed and/or published (may vary depending on the volume and/or backlog of articles).

Be sure to visit our Resources Page for tools, resources, and example articles that will help you propose and write a successful article.

Please ensure that you have only one open issue + linked pull request at a time. This will ensure that we complete the article in a timely manner from inception to publishing.

We tend to stray away from, and generally do not publish, reviews/comparisons of commercial product offerings.

Writing sample(s):

Include any links or writing samples to help our team better gauge your writing quality.

Proposal Submission

Pattern Analysis with Kernel PCA in Machine Learning

Proposed article introduction

In dimensionality reduction, the goal is to reduce the number of features by projecting the data from a higher-dimensional space to a lower-dimensional space, so that we retain only the features that explain most of the variance in the data. The reason for this is to ensure that a classification model later built on these data does not suffer from the curse of dimensionality. The PCA algorithm has commonly been used for such feature reduction tasks. However, this algorithm turns out to have serious limitations.
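
As a quick, concrete illustration of keeping only the components that explain most of the variance, here is a minimal sketch. It assumes scikit-learn's PCA and the built-in digits dataset, neither of which is part of this proposal:

```python
# Minimal sketch (illustrative only): linear PCA keeping just enough
# components to explain roughly 95% of the variance.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)          # 1797 samples, 64 features each
pca = PCA(n_components=0.95)                 # keep components covering ~95% of variance
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)        # (1797, 64) -> (1797, ~29)
print(pca.explained_variance_ratio_.sum())   # ~0.95
```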

Since most real-world data is not linearly separable, the PCA algorithm falls short because it only captures linear structure in the data. This poses a challenge to which Kernel PCA provides the solution.

Kernel PCA combines kernel methods with linear PCA to find the principal components of the data.

Kernel methods involve mapping the data from the original feature space to a new feature space in which the data becomes linearly separable. Of course, this new feature space usually has a higher dimension than the original one. Since the data is linearly separable in this higher-dimensional space, we can apply ordinary PCA there. The challenge is that dimensionality reduction aims to end up with a feature space of lower dimension than the original; instead, we are mapping the data to a higher-dimensional space, increasing its dimensionality and making the model more complex than before. Since this is not the goal of dimensionality reduction, we adopt the kernel trick, a technique that lets us avoid all computations in the higher-dimensional space. It enables us to work in the original feature space without ever explicitly mapping the data to the higher-dimensional one.
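
To make the kernel trick concrete, here is a minimal hand-verifiable sketch (my own illustrative example, not taken from the proposal) using a degree-2 polynomial kernel on 2-D points: the kernel value computed in the original space equals the dot product in the explicitly mapped 3-D feature space.

```python
# Minimal sketch (illustrative only): the kernel trick for k(x, y) = (x . y)^2.
import numpy as np

def phi(v):
    # Explicit feature map for the degree-2 polynomial kernel on 2-D inputs:
    # phi(v) = (v1^2, v2^2, sqrt(2) * v1 * v2) lives in a 3-D space.
    return np.array([v[0] ** 2, v[1] ** 2, np.sqrt(2) * v[0] * v[1]])

def poly_kernel(x, y):
    # Same quantity computed directly in the original 2-D space (no mapping).
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

print(np.dot(phi(x), phi(y)))  # 121.0 (map explicitly, then take the dot product)
print(poly_kernel(x, y))       # 121.0 (kernel trick: same value, no mapping)
```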

To get started in this article, we shall introduce dimensionality reduction and, in particular, discuss the concept of the curse of dimensionality. We shall also revisit linear PCA and discuss its limitations, which pave the way for Kernel PCA. Then, on Kernel PCA, we shall talk about what kind of data it is suited to, explain what kernel methods are, and show how they work with a computational example. We shall also discuss the kernel trick and illustrate by hand how it works. Finally, we shall outline the kernel functions most commonly used in Kernel PCA before implementing the algorithm with Scikit-Learn in Python.
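
As a preview of the implementation section, the sketch below contrasts linear PCA with Kernel PCA on data that is not linearly separable. It assumes scikit-learn's KernelPCA, an RBF kernel, and the make_circles toy dataset; the article's final dataset and parameters may differ.

```python
# Minimal sketch (illustrative only): linear PCA vs. Kernel PCA on two
# concentric circles, which are not linearly separable in the original space.
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

X_pca = PCA(n_components=2).fit_transform(X)                                  # classes still overlap
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)   # classes pull apart

print(X_pca[:3])
print(X_kpca[:3])
```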

Key takeaways

This article is expected to enhance the learner's clarity and understanding of:

Article quality

This article will explore the theory behind Kernel PCA in greater depth. It will ensure that the learner is offered clear content that is easy to follow and understand. In the implementation section, all the code snippets will be appropriately explained, so the learner will not have a hard time understanding what the code is doing.

References

N/A

Conclusion

Finally, remove the Pre-submission advice section and all our blockquoted notes as you fill in the form before you submit. We look forward to reviewing your topic suggestion.

Templates to use as guides

LinusMuema commented 2 years ago

Hi @Daniel695

Thank you for submitting your topic. After some careful consideration, we found that this topic may be a bit oversaturated across other blog sites and official documentation, as previously mentioned on your topic form by our content moderator.

We typically refrain from publishing content that is covered widely on the net or on other blogs, as we're more interested in original, practitioner-focused content that takes a deeper dive into programming-centric concepts.

We believe the best way for students to build a great portfolio (for potential employers) is by building what does not exist and what can provide the most value.

You are more than welcome to pursue another, more in-depth topic.