Bug Description
When the chosen reference level is a value in a metadata column, but none of the sample IDs associated with that value are present in the feature table, no error is raised. Instead, ANCOM-BC silently falls back to alphabetical order when choosing the intercept (reference level) for that column. We should raise an error in this case, because the results no longer reflect the inputs the user provided.
In this example, Column1::group1 was chosen as the reference level, but sample S001 (which is in group1) is not in the feature table and is therefore excluded from the analysis. With no group1 samples left, the reference level falls back to alphabetical order for Column1, so group2 is selected as the intercept (i.e., the reference level) instead of group1. This produces the following differential table:
id        (Intercept)  Column1group3
feature1  0.004        0.0005
feature2  0.352        0.00478
This output is confusing because users expect the (Intercept) column to correspond to group1, with two additional columns for group2 and group3 from Column1. From these results alone, it is unclear which group is used as the intercept (i.e., the reference level) and why one of the columns seems to have disappeared.
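The fallback can be reproduced with a small, self-contained model. This is a simplified sketch under stated assumptions, not the q2-composition code path: it treats Column1 like an R factor whose levels are sorted alphabetically, moves the requested reference to the front (as R's `relevel()` would), and then drops levels that have no samples left in the feature table (as R's `droplevels()` would). The toy metadata and the `effective_reference` helper are hypothetical.

```python
# Hypothetical toy data (NOT the original example files): sample ID -> Column1 value.
metadata = {
    "S001": "group1",
    "S002": "group2",
    "S003": "group2",
    "S004": "group3",
    "S005": "group3",
}
# Sample IDs present in the hypothetical feature table; S001 is missing.
feature_table_ids = {"S002", "S003", "S004", "S005"}


def effective_reference(metadata, feature_ids, requested_ref):
    """Simplified model of factor handling: relevel, then drop unused levels.

    Returns the level that ends up first (the intercept) and the remaining
    non-reference levels. This is an assumption about the mechanism, not
    the plugin's actual implementation.
    """
    # Levels are sorted alphabetically, like R's factor() default.
    levels = sorted(set(metadata.values()))
    # relevel(): move the requested reference to the front.
    if requested_ref in levels:
        levels.remove(requested_ref)
        levels.insert(0, requested_ref)
    # Only samples that are in the feature table reach the model.
    observed = {metadata[sid] for sid in feature_ids if sid in metadata}
    # droplevels(): unused levels vanish, including the requested reference.
    levels = [lvl for lvl in levels if lvl in observed]
    return levels[0], levels[1:]


intercept, others = effective_reference(metadata, feature_table_ids, "group1")
print(intercept, others)  # group2 ['group3']
```

Under this model, the requested reference is applied correctly and then silently discarded once its only sample is filtered out, which matches the differential table above: group2 becomes the intercept and only Column1group3 remains.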
We should raise an error if none of the IDs associated with the chosen reference level are present in the feature table (even if they are present in the metadata). cc: @cherman2, who discovered this bug while we were running ANCOM-BC on one of her datasets.
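A minimal sketch of the proposed check, assuming the reference is given in the `column::level` form used above and that metadata is available as a plain mapping of sample ID to column values. The function name `check_reference_level` and the data layout are hypothetical, not part of the q2-composition API. It raises only when none of the level's IDs remain, which is the case that triggers the silent fallback; a stricter variant could also raise when only some of them are missing.

```python
def check_reference_level(metadata, feature_ids, reference):
    """Raise ValueError if the reference level has no samples in the feature table.

    metadata: mapping of sample ID -> {column name: value} (hypothetical layout)
    feature_ids: iterable of sample IDs present in the feature table
    reference: "column::level" string, e.g. "Column1::group1"
    Returns the sorted sample IDs of the reference level that remain.
    """
    column, sep, level = reference.partition("::")
    if not sep:
        raise ValueError(f"Reference {reference!r} must have the form 'column::level'.")
    # All samples in the metadata that belong to the requested reference level.
    level_ids = {sid for sid, row in metadata.items() if row.get(column) == level}
    if not level_ids:
        raise ValueError(f"No samples in the metadata have {column} == {level!r}.")
    # Samples of the reference level that will actually reach the model.
    present = level_ids & set(feature_ids)
    if not present:
        raise ValueError(
            f"Reference level {level!r} in column {column!r} has no samples in "
            f"the feature table; these IDs are only in the metadata: "
            f"{', '.join(sorted(level_ids))}."
        )
    return sorted(present)


# Usage with hypothetical data mirroring the scenario in this issue.
metadata = {
    "S001": {"Column1": "group1"},
    "S002": {"Column1": "group2"},
    "S003": {"Column1": "group3"},
}
try:
    check_reference_level(metadata, {"S002", "S003"}, "Column1::group1")
except ValueError as err:
    print(err)
```

Failing early like this surfaces the problem before model fitting, instead of leaving the user to infer it from a missing column in the output.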