refresh-bio / SPLASH

GNU General Public License v3.0

Column names missing in anchors file? #36

Open joegeorgeson opened 1 month ago

joegeorgeson commented 1 month ago

Hi SPLASH,

(Updated from my initial post.)

I've started running your pipeline and am now stuck on SPLASH_extendor_classification.R, which I'm troubleshooting line by line.

Using the parameter which_anchors_file = "after_correction", I can reach the line below, where most_freq_target_3 is not found.

> RNA_editing = anchors[,list(anchor,target,anchor_count,target_count,most_freq_target_3,most_freq_target_4,cnt_most_freq_target_3,cnt_most_freq_target_4)]
Error in eval(jsub, SDenv, parent.frame()) : 
  object 'most_freq_target_3' not found

Here are the colnames of the anchors object. Before this step there were no errors or warnings.

> colnames(anchors)
 [1] "anchor"                              "pval_opt"                           
 [3] "effect_size_bin"                     "pval_base"                          
 [5] "anchor_count"                        "num_extendor_per_anchor"            
 [7] "number_nonzero_samples"              "target_entropy"                     
 [9] "avg_no_homopolymer_targets"          "avg_hamming_distance_max_target"    
[11] "avg_hamming_distance_all_pairs"      "avg_edit_distance_max_target"       
[13] "avg_edit_distance_all_pairs"         "anchor_2mer_seq_entropy"            
[15] "anchor_3mer_seq_entropy"             "target"                             
[17] "target_count"                        "most_freq_target_1_2mer_seq_entropy"
[19] "most_freq_target_1_3mer_seq_entropy" "most_freq_target_2_2mer_seq_entropy"
[21] "most_freq_target_2_3mer_seq_entropy" "pval_opt_corrected"                 
[23] "extendor"                            "extendor_order"                     
[25] "anchor_index"                        "ham_dist"                           
[27] "lev_dist"                            "lev_operations"                     
[29] "lcs_operations"                      "run_length_D"                       
[31] "run_length_I"   

Can you advise?

Thanks, Joe

roozbehdn commented 1 week ago

Hi @joegeorgeson, sorry for the delay in responding to this. Could you please paste the first few lines of the SPLASH output file result.after_correction.scores.tsv that you used to run the script? The RNA editing block needs 4 targets for each anchor, i.e., you should see the columns most_freq_target_3 and most_freq_target_4 in that file.
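
In the meantime, a quick way to confirm which of the columns used by the RNA editing selection are absent is to compare them against colnames(anchors) before running that line. The following is only a minimal sketch in base R: missing_columns is a hypothetical helper and not part of SPLASH, and the toy table merely stands in for the real anchors object (the required column names are copied from the failing call in the original post).

```r
# Columns selected by the RNA editing line that failed in the original post.
required_cols <- c("anchor", "target", "anchor_count", "target_count",
                   "most_freq_target_3", "most_freq_target_4",
                   "cnt_most_freq_target_3", "cnt_most_freq_target_4")

# Hypothetical helper (not part of SPLASH): return the required columns
# that do not appear among a table's column names.
missing_columns <- function(tbl, required) {
  setdiff(required, colnames(tbl))
}

# Toy table standing in for `anchors`; it lacks the third and fourth
# target columns, like the table shown in the original post.
toy <- data.frame(anchor = "ACGT", target = "TTGA",
                  anchor_count = 10L, target_count = 4L)

missing <- missing_columns(toy, required_cols)
if (length(missing) > 0) {
  # Report the gap instead of letting the column selection fail.
  message("RNA editing block cannot run; missing columns: ",
          paste(missing, collapse = ", "))
}
```

The same helper should work on the real table, e.g. missing_columns(anchors, required_cols), since colnames() also applies to data.table objects. To grab the header lines requested above, readLines("result.after_correction.scores.tsv", n = 3) prints the first three lines of the file, assuming it sits in the working directory.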