data augmentation and improvements with supervised learning methods

Mechah commented 3 years ago

First of all, thanks a lot for developing this useful q2-plugin! I augmented my feature table 10 times at a depth of 1000 and used it for metadata predictions with the q2-sample-classifier plugin. I was pretty surprised by the result: while I only got prediction accuracies of around 50-60% for the original data set, data augmentation with q2-data-augment improved this to prediction accuracies of up to 100%. However, I’m still a bit sceptical about this improvement… Since I do not have any prior knowledge of data augmentation, I would be interested in any comments or links to web resources or discussions on how to judge such an improvement for supervised learning methods. Looking forward to the discussion!

xy-repo commented 3 years ago

Hi Mechah,

Thanks for your feedback. Data augmentation is a widely used technique in machine learning that augments the TRAINING dataset by adding 'new' samples that carry the same labels as the originals. The q2-sample-classifier plugin seems to use the whole feature table for both training and testing, so with an augmented table, augmented copies of the same original sample can end up on both sides and inflate the accuracy. In practice, you should not augment the test set, so it is not appropriate to do the classification with data augmentation using q2-sample-classifier.
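To make the "augment only the training set" point concrete, here is a minimal sketch in plain Python. It is not the q2-data-augment or q2-sample-classifier API: the function names `rarefy` and `split_then_augment`, the toy table, and the idea of augmenting by random subsampling to a fixed depth are all assumptions made for illustration. The only point is the order of operations: split the original samples first, then augment the training part.

```python
import random

def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a count vector.
    (Hypothetical stand-in for an augmentation step.)"""
    pool = [i for i, c in enumerate(counts) for _ in range(c)]
    out = [0] * len(counts)
    for i in rng.sample(pool, depth):
        out[i] += 1
    return out

def split_then_augment(table, labels, test_fraction=0.25,
                       depth=1000, n_copies=10, seed=0):
    """Split ORIGINAL samples first, then augment only the training part."""
    rng = random.Random(seed)
    ids = sorted(table)
    rng.shuffle(ids)
    n_test = max(1, int(len(ids) * test_fraction))
    test_ids, train_ids = ids[:n_test], ids[n_test:]

    train = []
    for sid in train_ids:
        if sum(table[sid]) < depth:
            continue  # too few reads to subsample at this depth
        for _ in range(n_copies):
            train.append((sid, rarefy(table[sid], depth, rng), labels[sid]))

    # The test set keeps the original, un-augmented samples.
    test = [(sid, table[sid], labels[sid]) for sid in test_ids]
    return train, test

# Toy data (made up): 8 samples, 3 features, 1500 reads each.
table = {f"s{i}": [600, 500, 400] for i in range(8)}
labels = {f"s{i}": "A" if i % 2 == 0 else "B" for i in range(8)}

train, test = split_then_augment(table, labels)
train_sources = {sid for sid, _, _ in train}
test_sources = {sid for sid, _, _ in test}
print(len(train), len(test), train_sources & test_sources)
```

Because every augmented row remembers which original sample it came from, it is easy to check that no original sample contributes to both sets. If the table is augmented first and split afterwards, that check typically fails, which is one way a 100% accuracy can arise.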

I have noticed that the plugin is not compatible with q2-sample-classifier and am trying to modify it. I will release the paper and an update to this method as soon as possible.

Best, Yao

Mechah commented 3 years ago

Dear Yao, thank you so much for your feedback. It would be great to have better compatibility between q2-data-augment and q2-sample-classifier. Looking forward to new updates from your side.
