Open jingqilin opened 5 years ago
I am not quite sure how to do it. I think PCA would be a good choice. However, I am not sure how to really use it.
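Since PCA was mentioned as an option, here is a minimal sketch of how it is typically used with scikit-learn (assuming scikit-learn is available; the data here is randomly generated just for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up feature matrix: 100 samples, 10 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))

# Project the data down to the 3 strongest principal components
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                       # reduced matrix: (100, 3)
print(pca.explained_variance_ratio_.sum())   # fraction of variance retained
```

You would then train your model on `X_reduced` instead of the full matrix. Note that PCA components are linear combinations of the original columns, so they are harder to interpret than simply keeping a subset of columns.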
Hi,
I think you can use Python's built-in filter function to filter elements. It calls the given function on each element of the sequence, discards the elements for which the function returns False, and keeps only the elements that meet the criteria.
Thanks, Xinyu Zhang
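A small example of the built-in filter described above (the list and predicate are made up for illustration):

```python
# Keep only the even numbers; filter() drops elements where the
# predicate returns False.
numbers = [1, 2, 3, 4, 5, 6]
evens = list(filter(lambda n: n % 2 == 0, numbers))
print(evens)  # [2, 4, 6]
```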
Hi, for the columns, you can simply drop the ones you don't need. For the rows, you can decide which ones you want to use and filter the rest out. Hope it helps! ^_^
I agree with Iris0114. To drop the columns, you can use the code below: df = df.drop(columns=['column1', 'column2'])
You can also apply conditional checks to filter the data as per your requirement.
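Putting both suggestions together, here is a short pandas sketch (the DataFrame and column names are hypothetical):

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "age": [25, 40, 31],
    "income": [50000, 82000, 61000],
    "notes": ["a", "b", "c"],
})

# Drop an unneeded column
df = df.drop(columns=["notes"])

# Conditional check (boolean mask) to filter rows
adults_over_30 = df[df["age"] > 30]
print(adults_over_30)
```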
Hi,
In a situation where you really don't know which columns you want to use, I recommend plotting a correlation matrix for all variables after treating them for missing or null values. You can do that with the code below. Once you have the correlation coefficients, choose the parameters that have the highest correlation with your target variable.
This is not a foolproof method like Ordinary Least Squares or PCA, but it does provide basic insight into the data when the dataset is huge and has lots of parameters.
Code for the correlation plot (assuming df is your DataFrame):

```python
import matplotlib.pyplot as plt
import seaborn as sns

matrix = df.corr()
f, ax = plt.subplots(figsize=(9, 6))
sns.heatmap(matrix, vmax=.8, square=True, cmap="BuPu")
```
Hope this helps you!
Best, Kalyani
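The "pick the parameters most correlated with the target" step above can also be done directly in pandas, without the plot. A sketch on synthetic data (the column names, the target construction, and the cutoff of two features are all assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic dataset: "target" depends strongly on f1, weakly on f3
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=["f1", "f2", "f3", "f4"])
df["target"] = 2 * df["f1"] + 0.5 * df["f3"] + rng.normal(scale=0.1, size=200)

# Absolute correlation of each feature with the target, strongest first
corr_with_target = (
    df.corr()["target"].drop("target").abs().sort_values(ascending=False)
)
top_features = corr_with_target.head(2).index.tolist()
print(top_features)
```

On this synthetic data, f1 should come out on top. Keep in mind this only captures linear relationships with the target; a feature can still matter through interactions or nonlinear effects.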
I think you should drop that column.
Hi, if you have a dataset with a lot of factors and not all of them are important, how can you filter out the unimportant variables (dimensionality reduction) to improve your prediction accuracy? Regards, Jingqi Lin