xy-repo / q2-data-augment

BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

q2-data-augment: QIIME2 plugin for data augmentation using rarefaction (Rarefy for Augment)

Data augmentation is a very useful and widely used method in data science (see: https://en.wikipedia.org/wiki/Data_augmentation). Especially, it can increase the sample size of the training set for machine learning models.

Rarefaction can be used as an effective and reliable method for data augmentation, given the following reasons:

This method, named Rarefy for Augment, is very simple. Run random rarefaction N times. Each time, rename the samples and corresponding metadata and concatenate them with the previous two files. Finally, the sample size can be enlarged N times.

Installing

conda activate qiime2-2020.11
pip install git+https://github.com/yxia0125/q2-data-augment.git

Type "qiime data-augment" to test if the installation is successful.

Uninstalling

pip uninstall q2-data-augment

Using

qiime data-augment augment --i-table raw_table.qza 
                           --m-raw-metadata-file raw_metadata.tsv 
                           --p-sampling-depth 2000 
                           --p-augment-times 10
                           --p-output-path-metadata augmented_meta.tsv  
                           --o-augmented-table augmented_table.qza 

"raw_table.qza" and "raw_metadata.tsv" are the input raw feature table and metadata; --p-sampling-depth --> the rarefaction depth; --p-augment-times set to 10 means repeating rarefaction 10 times (i.e., enlarge sample 10 times); "augmented_table.qza" is the augmented feature table, its sample size is 10 times larger than "raw_table.qza", and new rarefied samples end with "_X" (X represents the i_th rarefaction); "augmented_meta.tsv" is the augmented metadata that has matching sample names in "augmented_table.qza".

Note: Only need to augment the training set.

Citing

If you are interested to use this method, please include the following citation:

Yao Xia, q2-data-augment: QIIME2 plugin for data augmentation using rarefaction (Rarefy for Augment), (2021), GitHub repository, https://github.com/yxia0125/q2-repeat-rarefy.