lauringlab / variant_pipeline

Work on the variant_pipeline and initial R analysis used in calling variants from NGS data
Apache License 2.0

deepSNV.r generates output.csv files with different column order - why? #14

Open grendon opened 5 years ago

grendon commented 5 years ago

I ran deepSNV.r three times, changing only one parameter each time. The column order differs across the three output.csv files.

```
Rscript --vanilla --slave deepSNV_R wsn33_wt_plasmid.fa \
  HA-22_S7_L001.sorted.dedup.filtered.bam \
  Plasmid-control_S49_L001.sorted.dedup.filtered.bam \
  bonferroni \
  0.1 \
  fisher \
  one.sided \
  1.1 \
  output.csv \
  output.fa \
  /.../src/R_lib

Rscript --vanilla --slave deepSNV_R wsn33_wt_plasmid.fa \
  HA-22_S7_L001.sorted.dedup.filtered.bam \
  Plasmid-control_S49_L001.sorted.dedup.filtered.bam \
  bonferroni \
  0.1 \
  fisher \
  one.sided \
  0.1 \
  output3.csv \
  output3.fa \
  /.../src/R_lib

Rscript --vanilla --slave deepSNV_R wsn33_wt_plasmid.fa \
  HA-22_S7_L001.sorted.dedup.filtered.bam \
  Plasmid-control_S49_L001.sorted.dedup.filtered.bam \
  bonferroni \
  0.1 \
  fisher \
  one.sided \
  0.5 \
  output4.csv \
  output4.fa \
  /.../src/R_lib
```

The corresponding output files are these:

output.csv.txt output3.csv.txt output4.csv.txt

alauring commented 5 years ago

I'm not sure why they are in a different order, but they all contain the same columns. I can't tell what you did differently, but changing the parameters will change how the data run through the pipeline. I suspect that we just didn't standardize how the data were arranged with the different setups. I'm sorry, but I can't really troubleshoot this for you.
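The observation that the files share the same columns and differ only in order can be checked directly from the header rows. The following is a minimal Python sketch, not part of the pipeline; the helper names and the column names in the toy data are hypothetical placeholders, not the real deepSNV output columns.

```python
import csv
import io


def header_of(csv_text):
    """Return the list of column names from the first row of CSV text."""
    return next(csv.reader(io.StringIO(csv_text)))


def same_columns(*headers):
    """True if every header holds the same set of column names, ignoring order."""
    return len({frozenset(h) for h in headers}) == 1


# Toy data with hypothetical column names (assumption, for illustration only).
a = header_of("chr,pos,ref,var\nHA,10,A,G\n")
b = header_of("pos,chr,var,ref\n10,HA,G,A\n")

print(same_columns(a, b))  # True: same names, different order
print(a == b)              # False: the order itself differs
```

For real files, the text could be read with `open(path, newline="").read()` before being passed to `header_of`.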

grendon commented 5 years ago

I can send you my input files so you can try to reproduce the error. This is a bug in the sense that it causes problems when the rest of the scripts in your pipeline expect the columns of this file to be in a specific order.
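One possible workaround, if downstream scripts depend on position, is to rewrite each output file in a fixed column order by selecting columns by name. This is a sketch under that assumption only; `reorder_columns` and the column names are hypothetical and are not part of the pipeline.

```python
import csv
import io


def reorder_columns(csv_text, column_order):
    """Rewrite CSV text so that its columns follow column_order.

    Raises ValueError if the columns present differ from column_order.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    found = reader.fieldnames or []
    if set(found) != set(column_order):
        mismatch = sorted(set(found) ^ set(column_order))
        raise ValueError(f"unexpected or missing columns: {mismatch}")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=column_order, lineterminator="\n")
    writer.writeheader()
    writer.writerows(reader)  # each row is a dict keyed by column name
    return out.getvalue()


# Hypothetical column names (assumption, for illustration only).
shuffled = "pos,chr,var,ref\n10,HA,G,A\n"
print(reorder_columns(shuffled, ["chr", "pos", "ref", "var"]))
```

Because rows are looked up by name, the same call yields identical layouts for all three files as long as they contain the same columns.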

alauring commented 5 years ago

I am sorry, but I do not have the time to troubleshoot this for you. The output files you sent are the final files in the DeepSNV pipeline. We would then use simple R commands to subset these .csv files as documented in our manuscripts (McCrone and Lauring JVI 2016; Debbink et al. PLoS Pathogens 2017; McCrone et al. eLife 2018), for example by filtering on a certain quality score or on the position of a variant in a read. That is, unless I am misunderstanding your question.
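To illustrate the kind of subsetting described here, the following is a hedged Python sketch; the R commands mentioned above are not shown in this thread. The column names `MapQ` and `Read_pos` and every threshold are assumptions chosen for illustration and are not values taken from the manuscripts.

```python
import csv
import io


def filter_variants(csv_text, min_mapq, min_read_pos, max_read_pos):
    """Keep rows whose quality and read position fall within the given bounds.

    Columns are accessed by name, so the column order of the input is irrelevant.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in rows
        if float(row["MapQ"]) >= min_mapq
        and min_read_pos <= float(row["Read_pos"]) <= max_read_pos
    ]


# Toy data with hypothetical columns and placeholder thresholds.
data = "pos,MapQ,Read_pos\n1,40,50\n2,10,50\n3,40,5\n"
kept = filter_variants(data, min_mapq=30, min_read_pos=30, max_read_pos=100)
print([row["pos"] for row in kept])  # ['1']
```

Filtering by column name in this way would give the same result for all three output files regardless of how their columns are ordered.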