transcript / samsa2

SAMSA pipeline, version 2.0. An open-source metatranscriptomics pipeline for analyzing microbiome data, built around DIAMOND and customizable reference databases.
GNU General Public License v3.0

More flexible filename handling #82

Closed. lisakmalins closed this 5 months ago.

lisakmalins commented 5 months ago

This is a fix for https://github.com/transcript/samsa2/issues/81, in which run_DESeq_stats.R and Subsystems_DESeq_stats.R fail whenever filenames do not follow the pattern experimental_#_... or control_#_....

Rather than splitting filenames on underscores, the new code derives each sample name by stripping the experimental_ or control_ prefix from the beginning and .cleaned from the end, then uses those names for matching.
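A minimal sketch of the prefix/suffix-stripping idea, written in Python for illustration rather than in the R used by the actual scripts. The prefixes and the .cleaned suffix come from the description above; the helper name `sample_name_from_file` and the example filenames are hypothetical.

```python
# Hypothetical helper: derive a sample name by stripping a condition prefix
# and a trailing ".cleaned", instead of splitting on underscores.
PREFIXES = ("experimental_", "control_")
SUFFIX = ".cleaned"


def sample_name_from_file(filename: str) -> str:
    """Strip one leading condition prefix and a trailing '.cleaned' suffix."""
    name = filename
    for prefix in PREFIXES:
        if name.startswith(prefix):
            name = name[len(prefix):]
            break  # at most one condition prefix applies
    if name.endswith(SUFFIX):
        name = name[: -len(SUFFIX)]
    return name


# Underscores inside the sample name no longer break the parsing.
for f in ["experimental_1_S5.cleaned", "control_patient_A_day3.cleaned"]:
    print(f, "->", sample_name_from_file(f))
```

Because only the known prefix and suffix are removed, any underscores left in the middle stay part of the sample name, so names such as patient_A_day3 survive intact.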

I also fixed R converting the counts to factors by setting the sample names as row names before transposing. (Previously, if the sample names were not numeric, transposing the data frame put the names into the same columns as the counts, so each column had mixed types and R converted every column to a factor. Moving the sample names into the row names before transposing lets them become column names directly, without affecting the data types.)
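The type-mixing problem can be illustrated with a small Python analogy. This is not the PR's R code. In R, `t()` on a data frame produces a matrix, which holds a single type, so a non-numeric name column drags the counts along with it. The sample names and counts below are made up. The sketch contrasts transposing the names together with the counts against pulling the names out as labels first.

```python
# Made-up data: one row per sample, name first, then per-feature counts.
rows = [
    ["sampleA", 10, 0, 7],
    ["sampleB", 3, 12, 5],
]


def column_types(table):
    """Return the set of Python types found in each column of a row-major table."""
    return [set(type(v) for v in col) for col in zip(*table)]


# Naive: transpose names together with counts; the names end up in the
# same columns as the numbers, so every column mixes str and int.
naive = [list(col) for col in zip(*rows)]

# Fix: take the names out as labels (like R row names), then transpose
# only the counts. The names become column headers; the body stays numeric.
header = [r[0] for r in rows]
counts = [r[1:] for r in rows]
body = [list(col) for col in zip(*counts)]

print(header)
print(body)
```

Keeping labels separate from data is what lets the transposed counts stay numeric, which is the same reason setting row names before `t()` avoids the factor conversion.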

Thank you and let me know if you have any questions!