"FullRunName" appears hardcoded into "plot_sample_mean"

I'll start by saying that I appreciate that the proBatch package exists: it solves a lot of the issues I'm facing as I get started with doing batch normalisation properly in a long proteomics experiment.

I am following the example code given in the vignette, using the example data and my own data for comparison.

My own data originates from SWATH data processed with DIA-NN, which obviously generates its own column names. According to the documentation and the vignette, that shouldn't be a problem, with the clear caveat that the naming needs to be consistent between the three files.

PROBLEM: The plot_sample_mean function doesn't specify the sample_id_column value as a parameter, yet it seems that the required column name is FullRunName. When the column is named something else, the following error occurs:

    > plot_sample_mean(sppa_log_matrix, sppa_annotation)
    Error in check_sample_consistency(sample_annotation, sample_id_col, df_ave, :
      Sample ID column FullRunName is not defined in sample annotation, sample annotation cannot be used for correction/plotting

However, when the check_sample_consistency function is used on the same data (with matching RunID column names, where the name is something other than FullRunName), the two dataframes are merged without issue, as expected.
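One way to tell a truly hardcoded name from a default argument value is to inspect the function's formal arguments. The following is a minimal sketch in base R: `has_arg` and `toy_plot` are hypothetical names introduced here for illustration, and the last step only runs if proBatch is installed, printing whatever the installed version declares.

```r
# Hypothetical helper: does a function declare an argument with this name?
has_arg <- function(fun, arg) arg %in% names(formals(fun))

# Toy stand-in for a plotting function whose ID column has a default value.
# This is NOT the proBatch implementation, only an illustration.
toy_plot <- function(data_matrix, sample_annotation,
                     sample_id_col = "FullRunName") {
  invisible(NULL)
}

has_arg(toy_plot, "sample_id_col")   # is the argument declared?
formals(toy_plot)$sample_id_col      # its default value, if any

# Apply the same check to the real function, if the package is available.
if (requireNamespace("proBatch", quietly = TRUE)) {
  print(has_arg(proBatch::plot_sample_mean, "sample_id_col"))
}
```

If the argument turns out to be declared, passing the column name explicitly in the call may be enough; if not, the name is fixed inside the function.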

As far as I can tell, the column name for the RunID/MS filename/some unique ID is set in three places:

1. As a column heading in the annotation table.
2. Manually when converting to long format, assuming the precursor intensities are imported as a wide-format measurement table, using standard R manipulation:

       library(data.table)  # provides setDT() and melt()

       # read in intensity data
       measurement_wide <- read.table(file = 'measurement_table.tsv', sep = '\t', header = TRUE)
       # convert to long format; the run ID column is named here
       measurement_long <- melt(setDT(measurement_wide),
                                id.vars = c("Protein.Group", "Precursor.Id"),
                                variable.name = "MS_file")

3. By defining the sample_id_column variable (this one might be optional).
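To confirm that the ID column really is named consistently in all of these places, a small check can be run before calling any plotting function. This is only a sketch on toy data in base R: the tables, run names, and the `MS_file` column name are made-up examples, and the wide-to-long step is a base-R stand-in for the `melt()` call rather than the same code.

```r
# Name used for the run ID everywhere (example value only).
sample_id_col <- "MS_file"

# Toy annotation table: one row per MS run.
annotation <- data.frame(
  MS_file  = c("run_01", "run_02", "run_03"),
  MS_batch = c("A", "A", "B"),
  stringsAsFactors = FALSE
)

# Toy wide measurement table: one column per MS run.
measurement_wide <- data.frame(
  Protein.Group = c("P1", "P2"),
  Precursor.Id  = c("pep1", "pep2"),
  run_01 = c(10.1, 11.2),
  run_02 = c(10.4, 11.0),
  run_03 = c(9.8, 11.5),
  stringsAsFactors = FALSE
)

# Stack the run columns into long format, one row per precursor and run.
id_vars  <- c("Protein.Group", "Precursor.Id")
run_cols <- setdiff(names(measurement_wide), id_vars)
measurement_long <- data.frame(
  Protein.Group = rep(measurement_wide$Protein.Group, times = length(run_cols)),
  Precursor.Id  = rep(measurement_wide$Precursor.Id,  times = length(run_cols)),
  run_id        = rep(run_cols, each = nrow(measurement_wide)),
  value         = unlist(measurement_wide[run_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
# Name the run ID column with the shared name.
names(measurement_long)[names(measurement_long) == "run_id"] <- sample_id_col

# Check 1: the ID column exists under the same name in both tables.
id_col_present <- c(
  annotation  = sample_id_col %in% names(annotation),
  measurement = sample_id_col %in% names(measurement_long)
)

# Check 2: both tables refer to the same set of runs.
ids_match <- setequal(annotation[[sample_id_col]],
                      measurement_long[[sample_id_col]])

print(id_col_present)
print(ids_match)
```

If both checks pass and a function still complains about a missing ID column, the function is most likely looking for a different name than the one used in the data.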

Renaming the RunID column to 'FullRunName' resolves the issue. The function plot_sample_mean gives the same error whether or not colouring by batch is used.
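For anyone who wants to apply the rename in R rather than editing the source files, here is a minimal sketch in base R. The helper `rename_id_col` and the toy annotation table are hypothetical; the same call would be needed on every table that carries the ID column, so that the names stay consistent.

```r
# Hypothetical helper: rename one column of a data frame, failing loudly
# if the column to rename is not present.
rename_id_col <- function(df, from, to = "FullRunName") {
  if (!from %in% names(df)) {
    stop("Column '", from, "' not found in data frame")
  }
  names(df)[names(df) == from] <- to
  df
}

# Toy annotation table using a DIA-NN-style ID column name (example only).
annotation_toy <- data.frame(
  MS_file  = c("run_01", "run_02"),
  MS_batch = c("A", "B"),
  stringsAsFactors = FALSE
)

annotation_renamed <- rename_id_col(annotation_toy, from = "MS_file")
names(annotation_renamed)
```

The values in the column are left untouched; only the heading changes.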

It appears that the issue can be avoided by using the expected column name in the source data files, but this requirement is not specified in the vignette. This is either a documentation or an implementation issue.

I hope the description here will help others, as the proBatch package really does look useful.
