slfan2013 / Shiny-SERRF


Training data not being subsetted #3

Closed by amnahsiddiqa 2 years ago

amnahsiddiqa commented 2 years ago

Hi Silli,

First of all, thanks for developing this great tool for batch correction. It's very helpful. However, I have just started using it and ran into an error, shown in the screenshot attached below.

My understanding is that the training data may not be subsetted correctly at line 506 in app.R:

train_data_x = apply(e_current_batch[sel_var, train.index_current_batch=='qc'],1,scale)

I see two ways this could happen: a) sel_var is empty (is this even possible, given that it depends on correlation?), or b) there are not enough QC samples. However, my data has enough QCs (at least 10) in each batch. Any help in debugging this error for my data is appreciated.

[Screenshot: Screen Shot 2022-10-10 at 7 39 54 AM]

(On a separate note: this tool runs fine on my smaller dataset; my bigger dataset seems to give me more trouble.) Thanks, Amnah
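The two possibilities raised in this comment, plus zero-variance features among the QCs, can be checked directly before the scaling step. The following is only a minimal sketch: the object names `e_current_batch`, `sel_var`, and `train.index_current_batch` come from the line quoted from app.R, but the toy values, the assumed layout (features in rows, samples in columns), and the helper variables `is_qc`, `n_selected`, `n_qc`, `qc_block`, and `zero_var` are assumptions made for illustration.

```r
# Toy stand-ins for the objects named in the issue (all values are made up).
set.seed(1)
e_current_batch <- matrix(rnorm(5 * 8), nrow = 5,
                          dimnames = list(paste0("f", 1:5), NULL))
train.index_current_batch <- rep(c("qc", "sample"), times = c(3, 5))
sel_var <- c("f2", "f4")  # hypothetical selection of correlated features

is_qc <- train.index_current_batch == "qc"

# Possibility (a): no features were selected.
n_selected <- length(sel_var)
# Possibility (b): too few QC samples in this batch.
n_qc <- sum(is_qc)

# drop = FALSE keeps a matrix even if only one row or column is selected.
qc_block <- e_current_batch[sel_var, is_qc, drop = FALSE]
# Features with zero variance across the QC samples would scale to NaN.
zero_var <- apply(qc_block, 1, function(x) sd(x) == 0)

cat("selected features:", n_selected, "\n")
cat("QC samples:", n_qc, "\n")
cat("zero-variance features among QCs:", sum(zero_var), "\n")
```

Running checks like these per batch on the real data should show quickly whether the selection, the QC count, or constant features are the source of the problem.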

amnahsiddiqa commented 2 years ago

Hi Silli,

I have figured it out. I imputed this data myself before running SERRF, and in one of the batches, 9 of the 10 features most correlated with j in the QC samples' train_data_x matrix had zero variance (so scaling turned all of their values into NaN, of course). As a result, only a single feature was left after subsetting with

train_data_x = train_data_x[,!train_NA_index]

which turned the matrix into a vector and hence produced this error on reaching

good_column = apply(train_data_x,2,function(x){sum(is.na(x))==0})

I hope this helps you handle this case in your code as well, perhaps by issuing a warning message to users. Since I have only just started working with batch correction, I will take care of the imputation and filtering steps in my data (the QCs specifically) once I understand the SERRF algorithm better.

Thanks much :) Amnah
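The mechanism described in the second comment can be reproduced in a few lines of R. This is a sketch under stated assumptions, not the code from app.R: the toy matrix `qc` and the names `dropped`, `kept`, and `res` are hypothetical, and the way `train_NA_index` is computed here is a guess at its intent. The `drop = FALSE` guard is one possible way to avoid the failure, not necessarily what the maintainer adopted.

```r
# Toy QC block: 3 features (rows) x 4 QC samples (columns); values are made up.
qc <- rbind(f1 = c(1, 2, 3, 4),
            f2 = c(5, 5, 5, 5),   # zero variance
            f3 = c(7, 7, 7, 7))   # zero variance

# Same shape of call as in the issue: samples end up in rows, features in columns.
train_data_x <- apply(qc, 1, scale)

# Constant features scale to NaN (0 / 0), which is.na() also flags.
# Assumption: train_NA_index marks columns containing any missing value.
train_NA_index <- apply(train_data_x, 2, function(x) any(is.na(x)))

# Default subsetting drops the single remaining column to a plain vector.
dropped <- train_data_x[, !train_NA_index]

# apply() needs a dim attribute, so this call fails on the vector;
# the error message is captured instead of stopping the script.
res <- tryCatch(
  apply(dropped, 2, function(x) sum(is.na(x)) == 0),
  error = function(e) conditionMessage(e)
)

# drop = FALSE keeps a one-column matrix, so the same apply() call works.
kept <- train_data_x[, !train_NA_index, drop = FALSE]
good_column <- apply(kept, 2, function(x) sum(is.na(x)) == 0)
```

A check such as `if (sum(!train_NA_index) < 2) warning(...)` placed before the subsetting step would be one way to give users the kind of warning suggested above.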