hredestig / pcaMethods

Perform PCA on data with missing values in R
GNU General Public License v2.0

Unexpected results with robustSvd #6

Closed: neilcaithness closed this issue 5 years ago

neilcaithness commented 5 years ago

First, thanks for the package.

I want to use your robustSvd in an attempt to reduce model distortion by extreme outliers. Here are two outputs from the same dataset, the first using base::svd and the second using pcaMethods::robustSvd.

I get a similarly unexpected outcome with the iris dataset (iris[,-5]), and especially marked distortions if I include one-hot encoded variables for species. In all cases I centre and standardize.
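For readers who want to try the iris comparison described above, here is a minimal sketch. It assumes `scale()` is an acceptable stand-in for "centre and standardize" and that the first two components are the ones of interest; the variable names are illustrative and this is not necessarily the code used for the plots in this issue. The robust part only runs if pcaMethods is installed.

```r
# Standardize the four numeric iris columns (centre to 0, scale to unit sd).
x <- scale(as.matrix(iris[, -5]))

# Classical SVD from base R; scores on the first two components are U * D.
s_base <- svd(x)
scores_base <- s_base$u[, 1:2] %*% diag(s_base$d[1:2])

# Robust SVD from pcaMethods, guarded so the sketch still runs without it.
# robustSvd() returns a list with d, u and v, like base::svd().
if (requireNamespace("pcaMethods", quietly = TRUE)) {
  s_rob <- pcaMethods::robustSvd(x)
  scores_rob <- s_rob$u[, 1:2] %*% diag(s_rob$d[1:2])

  # Side-by-side score plots, coloured by species.
  op <- par(mfrow = c(1, 2))
  plot(scores_base, col = iris$Species, main = "base::svd",
       xlab = "Component 1", ylab = "Component 2")
  plot(scores_rob, col = iris$Species, main = "pcaMethods::robustSvd",
       xlab = "Component 1", ylab = "Component 2")
  par(op)
}
```

Appending one-hot species columns to `x` before decomposing would give the second case mentioned above.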

Any comment would be greatly appreciated.

Best regards, Neil

[Two attached images: "svd" and "robust"]

neilcaithness commented 5 years ago

Not so unexpected, I guess. The half-obscured vector off to the left is Hour, and this dataset spans 13 hours (night-time, with fewer data points on the right). I'll either remove it or provide an unordered one-hot encoding.
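One way the unordered one-hot encoding of Hour could look is sketched below. The data frame, its column names (`Hour`, `Temp`) and the values are all hypothetical, made up only to show the mechanics; the real dataset is not shown in this issue.

```r
# Hypothetical data: 13 night-time hours plus one numeric measurement.
set.seed(1)
df <- data.frame(
  Hour = sample(c(18:23, 0:6), 200, replace = TRUE),
  Temp = rnorm(200)
)

# Treat Hour as an unordered factor and expand it to one 0/1 column per level.
# The "- 1" drops the intercept so every level gets its own indicator column.
hour_onehot <- model.matrix(~ factor(Hour) - 1, data = df)

# Combine with the numeric variable, then centre and standardize as before.
x <- scale(cbind(Temp = df$Temp, hour_onehot))
```

Each row of `hour_onehot` has exactly one 1, so Hour no longer enters the decomposition as a single ordered numeric variable.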

I was hoping for more resolution on the smaller vectors, but perhaps this is exactly right for what I've given it.

I'd still very much appreciate any comments so I'll leave this here for a few days before closing the issue.

hredestig commented 5 years ago

Sorry for the late reply, and thanks for the issue. It doesn't necessarily look off to me. For diagnosis, I guess you could check R2 per observation with 2 PCs, with and without a strong outlier included, to see which algorithm/preprocessing etc. works best.
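A rough sketch of that diagnostic follows. It assumes "R2 per observation" means one minus the ratio of each row's squared reconstruction error to its squared norm; pcaMethods may define it differently. The helper `r2_per_obs` and the artificial outlier row are hypothetical, added only for illustration.

```r
# Per-observation R2 for a rank-k reconstruction from a list with u, d, v
# (the shape returned by both base::svd and pcaMethods::robustSvd).
r2_per_obs <- function(x, decomp, k = 2) {
  fit <- decomp$u[, 1:k, drop = FALSE] %*%
    diag(decomp$d[1:k], k, k) %*%
    t(decomp$v[, 1:k, drop = FALSE])
  1 - rowSums((x - fit)^2) / rowSums(x^2)
}

x_clean <- as.matrix(iris[, -5])
x_out   <- rbind(x_clean, c(50, 50, 50, 50))  # artificial strong outlier

# Centre and standardize each version separately, as in the original post.
xs_clean <- scale(x_clean)
xs_out   <- scale(x_out)

r2_clean <- r2_per_obs(xs_clean, svd(xs_clean))
r2_out   <- r2_per_obs(xs_out, svd(xs_out))

# Compare how well the original 150 rows are explained in each case.
print(summary(r2_clean))
print(summary(r2_out[1:150]))

# Same check with the robust decomposition, if pcaMethods is available.
if (requireNamespace("pcaMethods", quietly = TRUE)) {
  r2_rob <- r2_per_obs(xs_out, pcaMethods::robustSvd(xs_out))
  print(summary(r2_rob[1:150]))
}
```

If the regular rows keep a high R2 with the outlier present under one method but not the other, that points to the method that copes better with this kind of contamination.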