Open benyoung93 opened 3 months ago
Also, one additional thing I have just realised: you need to identify your column and add 1, as the awk takes into account the rowname column in the output from write.table. It may therefore be prudent to put row.names=F in the R script; otherwise the awk would need to be $6 <= 0.01 and $3 >= 1000 for q-value and length respectively.
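A minimal sketch of that adjusted filter, assuming the data rows are tab-separated as rowname, name, length, score, pvalue, qvalue because write.table kept the row names. The function name and the toy rows are made up for illustration; they are not real VirFinder output.

```shell
# Hypothetical helper: keep rows with q-value <= 0.01 and length >= 1000
# when the table still carries the write.table row-name column.
# Assumed data-row layout: rowname, name, length, score, pvalue, qvalue
filter_with_rownames() {
  # NR > 1 skips the header line, which would otherwise be compared as text
  awk -F'\t' 'NR > 1 && $6 <= 0.01 && $3 >= 1000' "$1"
}

# Toy input (made-up values). The header has one field fewer than the
# data rows, which is what write.table writes when row names are kept.
demo=$(mktemp)
printf 'name\tlength\tscore\tpvalue\tqvalue\n' > "$demo"
printf '1\tcontig_a\t2500\t0.98\t0.001\t0.004\n' >> "$demo"
printf '2\tcontig_b\t800\t0.97\t0.002\t0.005\n' >> "$demo"
printf '3\tcontig_c\t3000\t0.40\t0.300\t0.450\n' >> "$demo"

filter_with_rownames "$demo"
```

With row.names=F in the R script, the same filter would use $5 and $2 instead.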
Hi there :)
First of all, a wonderful collection of tools and a pipeline!! I have had trouble with some of the installation, but I am just chopping and cutting the pipeline to fit my needs.
A quick query for the Virfinder step. So I know you want to do q-value < 0.01 and length > 1000bp. I have run the virfinder rscript file and got the output folder. I just want to double check that the filtering awk script is correct.
head P10_virfinder.tsv
The subsequent awk script is then this obviously (ignore all my '"'"' quoting, this is for a loop script that generates jobs for all of my samples).
So firstly, this awk script is currently filtering on the fourth tab-separated column, which is p-value ($4). Should this not be $5, as the < 0.01 filtering is for q-value?
Secondly, the second awk is extracting the fourth field after splitting on underscores, and looking at the output this doesn't seem to be correct. If you are wanting length > 1000, would it not be a better idea to do
awk -F'\t' '"'"'{ if ($2 >= 1000) print }'"'"'
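One way to avoid depending on column positions at all is to look the columns up by header name. The sketch below is only an illustration under assumptions: it assumes the header contains columns named length and qvalue, that the file is tab-separated, and that a row-name column (if present) shows up as one extra leading field in the data rows. The function name is hypothetical.

```shell
# Hypothetical helper: filter on q-value < 0.01 and length >= 1000 by
# header name, whether or not a row-name column is present.
filter_by_header() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        name = $i
        gsub(/"/, "", name)      # tolerate quoted header names
        col[name] = i            # map header name -> position
      }
      if (!("qvalue" in col) || !("length" in col)) exit 1
      hdr_nf = NF
      next
    }
    {
      shift = NF - hdr_nf        # 1 if a row-name column is present, else 0
      q   = $(col["qvalue"] + shift) + 0   # + 0 forces numeric comparison
      len = $(col["length"] + shift) + 0
      if (q < 0.01 && len >= 1000) print
    }
  ' "$1"
}

# Toy input without row names (made-up values, not real VirFinder output)
demo_plain=$(mktemp)
printf 'name\tlength\tscore\tpvalue\tqvalue\n' > "$demo_plain"
printf 'contig_a\t2500\t0.98\t0.001\t0.004\n' >> "$demo_plain"
printf 'contig_b\t800\t0.97\t0.002\t0.005\n' >> "$demo_plain"

filter_by_header "$demo_plain"
```

The positional version ($2, $5) is shorter, so this is only worth it if the output layout is expected to change between versions.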
Looking at the virfinder github, it seems the original results format would work with your awk script, but I think a new update may have switched this?
I used the mamba install in the mudoger install scripts but edited a wee bit (mamba create -n virfinder_env -c bioconda r-virfinder), so I believe both my local and the mudoger install versions would be the same.
Happy to provide more info if needed, and I apologise in advance if I am barking up the wrong tree, but I was doing some QC and testing and noticed that the awk was not appropriate for the output file generated by virfinder :).
Ben