nasa / GeneLab_Data_Processing

[Microarray] Factor group column renaming issue when one factor is substring of another #100

Open cyouh95 opened 1 month ago

Description

When factor group column names are renamed, one group name may be a substring of another. If the shorter name appears first in group_name_mapping, it is substituted inside the longer name, so the longer name is never replaced as a whole:

https://github.com/nasa/GeneLab_Data_Processing/blob/90d6bb5d6a20d817fa17ac5cb0763d4f8f75966b/Microarray/Affymetrix/Workflow_Documentation/NF_MAAffymetrix/workflow_code/bin/Affymetrix.qmd#L830-L836

This caused the assertion below to fail:

https://github.com/nasa/GeneLab_Data_Processing/blob/90d6bb5d6a20d817fa17ac5cb0763d4f8f75966b/Microarray/Affymetrix/Workflow_Documentation/NF_MAAffymetrix/workflow_code/bin/Affymetrix.qmd#L1010

Solution

Assuming the substring factor always has the shorter safe_name, a workaround is to sort group_name_mapping by descending safe_name length so that the longer name is substituted before its substring (here in Affymetrix.qmd):

unique_group_name_mapping <- unique(group_name_mapping) %>% arrange(-nchar(safe_name))

The same can be done here in Agile1CMP.qmd.
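
To illustrate the ordering problem, here is a minimal base R sketch. It is not the workflow's actual code: the group names, the column names, the group_name column, and the rename_columns helper are all hypothetical, and it assumes the renaming applies one literal substitution per mapping row, in row order.

```r
# Hypothetical mapping from syntactically safe names back to original group names.
group_name_mapping <- data.frame(
  safe_name  = c("Space.Flight", "Space.Flight...Irradiated"),
  group_name = c("Space Flight", "Space Flight & Irradiated"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply each safe_name -> group_name substitution in row order.
# fixed = TRUE treats the pattern as a literal string, not a regular expression.
rename_columns <- function(columns, mapping) {
  for (i in seq_len(nrow(mapping))) {
    columns <- gsub(mapping$safe_name[i], mapping$group_name[i], columns, fixed = TRUE)
  }
  columns
}

# Illustrative column names only.
columns <- c("Group.Mean_(Space.Flight)", "Group.Mean_(Space.Flight...Irradiated)")

# Original row order: the shorter name is substituted inside the longer one,
# so the longer safe_name no longer matches on the next iteration.
unsorted_result <- rename_columns(columns, group_name_mapping)

# Longest safe_name first: the longer name is replaced as a whole.
sorted_mapping <- group_name_mapping[order(-nchar(group_name_mapping$safe_name)), ]
sorted_result <- rename_columns(columns, sorted_mapping)

print(unsorted_result)
print(sorted_result)
```

With the unsorted mapping, the second column ends up as "Group.Mean_(Space Flight...Irradiated)", a half-renamed name that matches neither the safe name nor the original. With the sorted mapping it becomes "Group.Mean_(Space Flight & Irradiated)".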