lehtiolab / ddamsproteomics

A Nextflow MS DDA proteomics pipeline
MIT License

Setnames/groupnames cannot be too similar, otherwise the "add sample group to header" regex fails #31

Closed glormph closed 5 months ago

glormph commented 5 months ago

Should've known better than to use a regex. This:

sed -E "s/${arr[0]}_([a-z0-9]*plex)_${arr[1]}/${arr[4]}_${arr[3]}_${arr[2]}_\\1_${arr[1]}/"

Will bite itself in the ass when one setname (${arr[0]}) is 1 and another is 11, for example: the pattern built for set 1 also matches inside the header fields of set 11.

Could be solved by adding a ^ to mark start of string/line, but it would maybe be better to put this in msstitch?
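A minimal sketch of the overlap and of the ^ fix. The set name, channel, and group values are hypothetical, the replacement is shortened compared to the real command, and the sketch assumes one column name per line so that ^ marks the start of each name (in a tab-separated header line the anchor would need to account for field boundaries instead):

```shell
#!/usr/bin/env bash
# Hypothetical values standing in for the arr[] fields of the real command.
setname="1"
channel="126"
group="ctrl"

# Two column names, one per line; the second belongs to set "11", not set "1".
headers=$(printf '%s\n' "1_tmt10plex_126" "11_tmt10plex_126")

# Unanchored: the pattern for set "1" also matches inside "11_tmt10plex_126".
unanchored=$(printf '%s\n' "$headers" \
  | sed -E "s/${setname}_([a-z0-9]*plex)_${channel}/${group}_${setname}_\\1_${channel}/")

# Anchored with ^: only a name that starts with the set name is rewritten.
anchored=$(printf '%s\n' "$headers" \
  | sed -E "s/^${setname}_([a-z0-9]*plex)_${channel}/${group}_${setname}_\\1_${channel}/")

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
```

With these inputs the unanchored version turns the set 11 name into 1ctrl_1_tmt10plex_126, while the anchored version leaves 11_tmt10plex_126 untouched and only rewrites the set 1 name.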

Found by Yaroslav

Second occurrence, in qc_protein.R: if (length(grep(logfcname, names(feats)))) { This also returns partial matches, so if two comparisons are ABC-QWER and QWER-BC, it will also return true for BC-QWER (which matches inside ABC-QWER).

In both cases we need a full match, not a partial one.
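As a rough shell analogue of the qc_protein.R problem (not the R code itself), the difference between a partial and a full match can be sketched with grep. The names are hypothetical and reduced to the bare comparison strings from the example above:

```shell
#!/usr/bin/env bash
# Comparisons that actually exist, one per line (hypothetical column names).
names=$(printf '%s\n' "ABC-QWER" "QWER-BC")

# Candidate comparison that does NOT exist, but is a substring of "ABC-QWER".
candidate="BC-QWER"

# Partial match: -F treats the candidate as a fixed string, -c counts matching lines.
partial=$(printf '%s\n' "$names" | grep -c -F -- "$candidate" || true)

# Full match: -x additionally requires the whole line to equal the candidate.
# grep exits non-zero when nothing matches, hence the "|| true".
exact=$(printf '%s\n' "$names" | grep -c -F -x -- "$candidate" || true)

echo "partial matches: $partial"
echo "exact matches:   $exact"
```

The partial search reports one hit for a comparison that does not exist, while the whole-line search reports none. In R, an exact membership test such as logfcname %in% names(feats) expresses the same idea, assuming the column names are exactly equal to logfcname.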

glormph commented 5 months ago

To clarify:

In qc_protein, all combinations of sample groups are tried when creating plots, but only some will exist (e.g. A-B but not B-A, since they are the same comparison), and here it will mistakenly find both.

In the sed with arr, the wrong sample group gets assigned, since the overlap makes the pattern match multiple sets. The resulting header is wrong, and DEqMS crashes in the process.