salzman-lab / SICILIAN

GNU General Public License v2.0

Too many prefixes in input argument "names" for ss2 data #15

Closed: zyh4482 closed this issue 2 years ago

zyh4482 commented 2 years ago

For Smart-seq2, there are many files with different prefixes. Although I could write a script to build such a list and pass it to the input argument "names", that is inconvenient and the list can become very large. How do you usually build the "names" list for SS2 data? Thanks.

zyh4482 commented 2 years ago

I've successfully finished the test. I set the argument "names" with the following script, which automatically collects the FASTQ prefixes of every sample in the same folder:

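import os

data_path = "path/to/fastq_dir"  # placeholder: folder holding every sample's FASTQ files

names = []
for fname in os.listdir(data_path):
    stem = os.path.splitext(fname)[0]
    # Cut at the last "_R" (e.g. "_R1"/"_R2") to get the sample prefix;
    # keep the whole stem if "_R" is absent instead of chopping its last character.
    idx = stem.rfind('_R')
    names.append(stem[:idx] if idx != -1 else stem)
names = sorted(set(names))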
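A slightly more defensive variant of the prefix-collection script above is sketched below. It assumes (hypothetically) that files are named `<prefix>_R1.fastq.gz` / `<prefix>_R2.fastq.gz`, optionally with an Illumina-style tail such as `_R1_001.fastq.gz`; adjust `READ_PATTERN` to your own naming scheme. Non-FASTQ files in the folder are skipped, and the helper also reports which read files were seen for each prefix so that samples missing a mate can be spotted. The function names are illustrative, not part of SICILIAN.

```python
import os
import re
from collections import defaultdict

# Hypothetical naming scheme: <prefix>_R1[_NNN].fastq[.gz] or .fq[.gz]
READ_PATTERN = re.compile(
    r"^(?P<prefix>.+)_R(?P<read>[12])(?:_\d+)?\.(?:fastq|fq)(?:\.gz)?$"
)


def collect_sample_prefixes(filenames):
    """Return (sorted prefixes, {prefix: set of read numbers seen})."""
    reads = defaultdict(set)
    for fname in filenames:
        match = READ_PATTERN.match(fname)
        if match is None:
            continue  # not a FASTQ read file (e.g. logs, md5 sums)
        reads[match.group("prefix")].add(match.group("read"))
    return sorted(reads), dict(reads)


def names_from_dir(data_path):
    """Apply collect_sample_prefixes to every file in data_path."""
    return collect_sample_prefixes(os.listdir(data_path))


files = [
    "cellA_R1.fastq.gz", "cellA_R2.fastq.gz",
    "cellB_R1_001.fastq.gz", "cellB_R2_001.fastq.gz",
    "cellC_R1.fq", "README.txt",
]
names, reads = collect_sample_prefixes(files)
incomplete = [p for p in names if reads[p] != {"1", "2"}]
print(names)       # ['cellA', 'cellB', 'cellC']
print(incomplete)  # ['cellC'] (only R1 present)
```

The resulting `names` list can then be passed as the "names" argument in place of a hand-written one.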

May I ask an additional question? I noticed that the GLM step reported many warning messages, for example:

Warning messages:
1: In `[.data.table`(class_input, , `:=`(c("cur_weight", "train_class",  :
  Column 'junc_cdf1_glmnet_twostep' does not exist to remove
2: In `[.data.table`(class_input, , `:=`(c("cur_weight", "train_class",  :
  Column 'refName_readStrandR1' does not exist to remove
3: In `[.data.table`(class_input, , `:=`(c("cur_weight", "train_class",  :
  Column 'refName_readStrandR2' does not exist to remove
4: In `[.data.table`(class_input, , `:=`(c("cur_weight", "train_class",  :
  Column 'gene_strandR1A_new' does not exist to remove
5: In `[.data.table`(class_input, , `:=`(c("cur_weight", "train_class",  :
  Column 'gene_strandR1B_new' does not exist to remove
Warning messages:
1: In `[.data.table`(class_input, , `:=`(c("junc_cdf_glm", "junc_cdf_glm_corrected",  :
  Column 'junc_cdf_glm' does not exist to remove
2: In `[.data.table`(class_input, , `:=`(c("junc_cdf_glm", "junc_cdf_glm_corrected",  :
  Column 'junc_cdf_glm_corrected' does not exist to remove
3: In `[.data.table`(class_input, , `:=`(c("junc_cdf_glm", "junc_cdf_glm_corrected",  :
  Column 'junc_cdf_glmnet' does not exist to remove
4: In `[.data.table`(class_input, , `:=`(c("junc_cdf_glm", "junc_cdf_glm_corrected",  :
  Column 'junc_cdf_glmnet_constrained' does not exist to remove
5: In `[.data.table`(class_input, , `:=`(c("junc_cdf_glm", "junc_cdf_glm_corrected",  :
  Column 'junc_cdf_glmnet_corrected' does not exist to remove
6: In `[.data.table`(class_input, , `:=`(c("junc_cdf_glm", "junc_cdf_glm_corrected",  :
  Column 'junc_cdf_glmnet_corrected_constrained' does not exist to remove
Warning messages:
1: In `[.data.table`(class_input, , `:=`(c("p_predicted_glm", "p_predicted_corrected",  :
  Column 'p_predicted_glm' does not exist to remove
2: In `[.data.table`(class_input, , `:=`(c("p_predicted_glm", "p_predicted_corrected",  :
  Column 'p_predicted_corrected' does not exist to remove
3: In `[.data.table`(class_input, , `:=`(c("p_predicted_glm", "p_predicted_corrected",  :
  Column 'p_predicted_glmnet' does not exist to remove
4: In `[.data.table`(class_input, , `:=`(c("p_predicted_glm", "p_predicted_corrected",  :
  Column 'p_predicted_glmnet_constrained' does not exist to remove
5: In `[.data.table`(class_input, , `:=`(c("p_predicted_glm", "p_predicted_corrected",  :
  Column 'p_predicted_glmnet_corrected' does not exist to remove
6: In `[.data.table`(class_input, , `:=`(c("p_predicted_glm", "p_predicted_corrected",  :
  Column 'p_predicted_glmnet_corrected_constrained' does not exist to remove
Warning messages:
1: In `[.data.table`(GLM_output, , `:=`(frac_mutimapping, NULL)) :
  Column 'frac_mutimapping' does not exist to remove
2: In `[.data.table`(GLM_output, , `:=`(train, NULL)) :
  Column 'train' does not exist to remove

Do these messages affect the results?

Thank you.

roozbehdn commented 2 years ago

These warning messages are normal. Each time the script runs, it removes these columns if they are already present in the input file; when a column is absent, data.table reports that it "does not exist to remove". These warnings are nothing to worry about, and the output files should be generated correctly.
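The mechanism behind these warnings can be illustrated with a small Python analogy. This is not SICILIAN's R code: `drop_columns` and the column names `junction_id` and `read_count` are hypothetical, and a dict of lists stands in for the data.table. Removing a column that is present deletes it silently, while removing an absent one only emits a warning and leaves the data untouched, which is why the messages above are harmless.

```python
import warnings


def drop_columns(table, cols):
    """Delete cols from a dict-of-columns table.

    Absent columns trigger a warning rather than an error, loosely mirroring
    data.table's `dt[, (cols) := NULL]` behaviour seen in the log above.
    """
    for col in cols:
        if col in table:
            del table[col]
        else:
            warnings.warn(f"Column '{col}' does not exist to remove")
    return table


# Fresh input: the GLM output columns have not been created yet.
fresh_input = {"junction_id": ["j1", "j2"], "read_count": [5, 3]}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = drop_columns(fresh_input, ["junc_cdf_glm", "p_predicted_glm"])

print(len(caught))     # 2 (one warning per missing column)
print(sorted(result))  # ['junction_id', 'read_count'] (data untouched)

# Input that already carries a stale column: it is removed without a warning.
stale_input = {"junction_id": ["j1"], "junc_cdf_glm": [0.9]}
print(drop_columns(stale_input, ["junc_cdf_glm"]))  # {'junction_id': ['j1']}
```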