section-engineering-education / engineering-education

"Section's Engineering Education (EngEd) Program is dedicated to offering a unique quality community experience for computer science university students."
Apache License 2.0

[Machine Learning] Dimension Reduction with Principal Component Analysis (PCA) in python #4213

Closed nelsonnrl closed 3 years ago

nelsonnrl commented 3 years ago

Topic Suggestion

Pre-submission advice

By following our pre-submission advice and reviewing our Resources folder, you will maximize your chances of your topic being approved.

We ask that you please be patient as our team works through approving and publishing all articles/tutorials in a timely manner.

Allow 1-3 days for a topic to be reviewed and/or approved - allow 3-7 days for an article to be reviewed and/or published.

Be sure to visit our Resources Page for tools, resources, and example articles that will help you propose and write a successful article.

Please ensure that you have only one open issue + linked pull request at a time. This will ensure that we complete the article in a timely manner from inception to publishing.

We tend not to publish reviews or comparisons of commercial product offerings.

Proposal Submission

Dimensionality Reduction with Principal Component Analysis in Python

Your title should be descriptive of the article/tutorial. Be Specific.

Use keyword research to improve your article's SEO.

Proposed article introduction

In machine learning, it is common to come across datasets with hundreds or even thousands of features. Training models on such datasets is a challenge in terms of computational cost. Also, models built on a high-dimensional data space are prone to the curse of dimensionality. To mitigate this problem, we use a technique known as Dimensionality Reduction.

Dimensionality Reduction involves transforming the features of a dataset from a high-dimensional space to a low-dimensional space. Some of the techniques used in dimensionality reduction include PCA, Linear Discriminant Analysis (LDA), Kernel PCA, and Canonical Correlation Analysis (CCA). Of these techniques, PCA is among the most widely used for dimension reduction.

In this article, we shall talk about the PCA algorithm. Then, we shall learn how this algorithm maps data from high dimensions, say d, to low dimensions, say k, where k < d. Finally, we shall develop a PCA algorithm in Python.
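To make the mapping from d dimensions to k dimensions concrete, here is a minimal sketch of one common way to compute PCA with NumPy: center the data, take the eigendecomposition of the covariance matrix, and project onto the k eigenvectors with the largest eigenvalues. The function name `pca_project`, the synthetic dataset, and the choice of k = 2 are assumptions made for illustration only; they are not taken from the proposed article.

```python
import numpy as np


def pca_project(X, k):
    """Project the rows of X (n_samples x d) onto the top-k principal components.

    Hypothetical helper for illustration; assumes d >= 2 and k <= d.
    """
    X = np.asarray(X, dtype=float)
    # Center each feature so the covariance is computed around the mean.
    X_centered = X - X.mean(axis=0)
    # d x d covariance matrix of the features (columns are variables).
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Indices of the k largest eigenvalues, largest first.
    order = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, order]            # d x k projection matrix
    explained = eigvals[order] / eigvals.sum()  # share of variance per component
    return X_centered @ components, components, explained


# Synthetic example data (an assumption): 200 samples, d = 5 features,
# with one feature made nearly redundant so that fewer dimensions suffice.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 3] = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)

Z, W, ratio = pca_project(X, k=2)
print(Z.shape)  # (200, 2): each sample is now described by k = 2 values
```

Here `Z` holds the reduced data, `W` holds the k principal directions as columns, and `ratio` gives the fraction of total variance each retained component explains, which is one way to judge how small k can be.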

For example, if the article is based on machine learning, use the following: [Machine learning] Introduction to Machine Learning. If the article is based on developing an Android application, use the following: [Android] Developing Apps using Android.

ONLY viable topics to pick from:

Key takeaways

  1. Learners will understand how to develop the PCA algorithm.
  2. Learners will understand how to approach the problem of dimension reduction using PCA.
  3. Learners will understand how to implement the PCA algorithm in Python using high-dimensional datasets.

Article quality

How is your article unique? Tell us what makes your approach different from similar articles that have been published on the same topic? Is yours more in-depth? Does it cover additional topics? Do you provide handy tips or anecdotal advice?

References

N/A

Conclusion

Finally, remove the Pre-Submission advice section and all our blockquoted notes as you fill in the form before you submit. We look forward to reviewing your topic suggestion.

Templates to use as guides

ahmadmardeni1 commented 3 years ago

Good afternoon and thank you for submitting your topic suggestion. Your topic form has been entered into our queue and should be reviewed (for approval) as soon as a content moderator is finished reviewing the ones in the queue before it.

ahmadmardeni1 commented 3 years ago

Does your article overlap with the following articles? @nelsonnrl

nelsonnrl commented 3 years ago

@ahmadmardeni1 I have gone through the two articles above. In both, PCA is applied to unstructured data, i.e., image datasets. In my article, we will apply PCA to structured data, i.e., a tabular dataset of the kind stored in a database. In the article on Face Recognition using Principal Component Analysis (PCA), the implementation is in MATLAB. The second article does use Python, but the difference remains because we are dealing with a different type of dataset. The only overlap is that the article on Image Compression using Principal Component Analysis (PCA) covers the intuition part I intended to dive into. Since this resource is already in place, I think learners can be referred to it, and my article can focus only on how to implement PCA on a structured dataset in Python. So, there is a clear difference between my article and the two articles above.

ahmadmardeni1 commented 3 years ago

Sounds like a helpful topic - let's please be sure it adds value beyond what is in any official docs and/or what is covered in other blog sites. (the articles should go beyond a basic explanation - and it is always best to reference any EngEd article and build upon it). @nelsonnrl

Please be attentive to grammar/readability and make sure that you put your article through a thorough editing review prior to submitting it for final approval. (There are some great free tools that we reference in EngEd resources.) ANY ARTICLE SUBMITTED WITH GLARING ERRORS WILL BE IMMEDIATELY CLOSED.

Please be sure to double-check that it does not overlap with any existing EngEd articles, articles on other blog sites, or any incoming EngEd topic suggestions (if you haven't already). To avoid any potential article closure, please reference any relevant EngEd articles in yours. - Approved