Wrong group labels in PCA caused by mismatch of sample order in the table and design

tbaccata / amica

amica: an interactive and user-friendly web-based platform for the analysis of proteomics data

GNU General Public License v3.0


Issue #33

Open hollenstein opened 1 week ago

hollenstein commented 1 week ago

Hi Sebastian,

we've noticed an issue with the groups in the PCA plot being wrongly labeled (and possibly in other plots, but we didn't check all of them). The issue occurs when the order of samples in the design does not match the order of the samples in the table. I know that this violates the requirements of the design file, as the info says:

"The sample names in the samples column need to match the column names of the input file in the order of the input file."
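
To illustrate how such a mismatch can lead to wrong labels, here is a minimal sketch in plain Python. It is not amica's code (amica is written in R); the sample names and groups are made up, and it only assumes that groups get paired with table columns by position rather than by name.

```python
# Hypothetical example data: column order in the table vs. row order in the design.
table_columns = ["ctrl_1", "treat_1", "ctrl_2", "treat_2"]
design = [
    ("ctrl_1", "control"),
    ("ctrl_2", "control"),
    ("treat_1", "treated"),
    ("treat_2", "treated"),
]

# Positional pairing (assumed behaviour): the i-th column gets the group
# of the i-th design row, regardless of the sample name.
by_position = {col: group for col, (_, group) in zip(table_columns, design)}

# Name-based pairing: each column looks up its own sample name in the design.
group_of = dict(design)
by_name = {col: group_of[col] for col in table_columns}

# Columns whose positional label differs from the label the design intends.
mislabeled = [col for col in table_columns if by_position[col] != by_name[col]]
print(mislabeled)
```

With this input, the two columns that sit at different positions in the table and the design end up with swapped group labels.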

I still wanted to ask whether you would be willing to allow the sample order in the design to differ from the order in the table.

Best, David

tbaccata commented 1 week ago

Hi David,

Thanks for letting me know! This will affect quite a lot of the downstream analysis and outputs. It's written in the requirements, but a sanity check/alternative mapping would probably be a good idea.
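
A sanity check along these lines could be sketched as follows. This is a rough illustration in plain Python, not amica's actual R code; the function name `check_design` and its status strings are hypothetical.

```python
def check_design(sample_columns, design_samples):
    """Compare the table's sample columns with the design's sample names.

    Returns one of "duplicate", "name_mismatch", "order_mismatch", "ok".
    Both arguments are sequences of sample names (hypothetical interface).
    """
    design_samples = list(design_samples)
    sample_columns = list(sample_columns)
    # The same sample listed twice in the design is ambiguous.
    if len(set(design_samples)) != len(design_samples):
        return "duplicate"
    # Names present in only one of the two inputs cannot be mapped.
    if set(sample_columns) != set(design_samples):
        return "name_mismatch"
    # Same names, different order: the case reported in this issue.
    if sample_columns != design_samples:
        return "order_mismatch"
    return "ok"


print(check_design(["a", "b", "c"], ["a", "c", "b"]))
```

An "order_mismatch" result could either trigger a warning to the user or be resolved automatically by mapping samples by name.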

Best, Sebastian

xeniorn commented 1 week ago

Dear Sebastian,

looking at the code, it seems to me that this requirement could be dropped without changing any of the analysis and output code: the columns could be put in the correct order at input time, when constructing the ProteomicsData object in "readInAmicaSumm" and equivalent functions. Those functions already iterate through all columns and already take both the design and the data as input.
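
As a rough, language-agnostic sketch of the idea (plain Python, not the R code of "readInAmicaSumm", whose internals are not shown here): the table is modelled as a dict of column name to values, and the hypothetical helper `reorder_sample_columns` puts the sample columns into design order while leaving the other columns in front.

```python
def reorder_sample_columns(table, design_samples):
    """Return a copy of `table` with sample columns in design order.

    `table` maps column names to lists of values; columns not named in
    the design (e.g. annotation columns) keep their relative order and
    come first. Hypothetical helper for illustration only.
    """
    missing = [s for s in design_samples if s not in table]
    if missing:
        raise KeyError(f"samples missing from table: {missing}")
    sample_set = set(design_samples)
    # Non-sample columns, in their original order.
    reordered = {name: vals for name, vals in table.items() if name not in sample_set}
    # Sample columns, in the order given by the design.
    for name in design_samples:
        reordered[name] = table[name]
    return reordered


table = {"protein": ["P1", "P2"], "treat_1": [1.0, 2.0], "ctrl_1": [3.0, 4.0]}
ordered = reorder_sample_columns(table, ["ctrl_1", "treat_1"])
print(list(ordered))
```

Doing this once at read-in would mean that all downstream code can keep relying on position.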

Does this make sense? Or are the input files used elsewhere directly?

Best,

tbaccata commented 1 week ago

Hey Juraj,

Yes, exactly - the input files aren't used anywhere else after reading in.

Best, Sebastian