VIPRA and MLEHaplo README file
Pre-requisites:
#################Preliminaries#########################
Command:
$ multi-k <fasta/fastq file>
file1.fastq file2.fastq
55 45 35 25
Output of multi-dsk is a collection of files with extension ".solid_kmers_binary.
Command:
parse_results ".solid_kmers_binary.
The file "fasta/fastq file.kvalue" now contains the k-mer counts for the fasta/fastq file, one "k-mer count" pair per line.
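To inspect the counts file before building the graph, it can be loaded into a dictionary. The following is a minimal Python sketch, not part of the VIPRA/MLEHaplo scripts. It assumes each line holds a k-mer and its count separated by whitespace; the function name read_kmer_counts and the sample data are hypothetical.

```python
import io


def read_kmer_counts(handle):
    """Parse lines of the form "k-mer count" into a dict {kmer: count}.

    Assumption: the two fields are whitespace-separated; blank lines are skipped.
    """
    counts = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        kmer, count = fields[0], int(fields[1])
        counts[kmer] = count
    return counts


# Illustrative data only, not real output from the pipeline.
example = io.StringIO("ACGTA 12\nCGTAC 3\nGTACG 1\n")
kmer_counts = read_kmer_counts(example)
print(kmer_counts)
```

For a real file, replace the io.StringIO object with an open file handle, for example open("file1.kvalue").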
Generating the graph needs two files and a parameter.
Command:
Combine paired files into a single file
perl construct_graph.pl
Output is the
Create the paired set using the paired reads. It takes as input the two paired files (file1.fastq and file2.fastq), the k-mer counts file (file1.kvalue), and a threshold for ignoring erroneous k-mers.
Choice of threshold: depends on the sequencing coverage. A lower threshold includes more erroneous k-mers in the graph, while a higher threshold reduces the number of true k-mers and the size of the graph.
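The trade-off can be explored on the k-mer counts before running the scripts. The sketch below is a hypothetical Python illustration, not the filtering code used by the pipeline. It assumes a k-mer is kept when its count is at least the threshold (whether the actual scripts use an inclusive cutoff is not stated here), and the counts are made up.

```python
def filter_kmers(counts, thresh):
    """Keep k-mers whose count is at least thresh (assumed inclusive cutoff)."""
    return {kmer: c for kmer, c in counts.items() if c >= thresh}


# Illustrative counts only; real values come from the k-mer counts file.
toy_counts = {"ACGTA": 40, "CGTAC": 35, "GTACG": 2, "TACGT": 1, "ACGTT": 6}

# Print how many k-mers survive each candidate threshold.
for thresh in (1, 5, 10):
    kept = filter_kmers(toy_counts, thresh)
    print(thresh, len(kept))
```

Scanning a few candidate thresholds this way shows how quickly the retained k-mer set shrinks, which can help in picking a value suited to the coverage of the data.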
Command:
perl construct_paired_without_bloom.pl -file1 file1.fastq -file2 file2.fastq -paired -kmerfile file1.kvalue -thresh
########################VIPRA######################
Running the VIPRA algorithm takes the inputs generated above plus three parameters: the average insert size, a threshold, and a value for M (factor), which determines the number of paths to generate per vertex.
Command:
perl dg_cover.pl -graph
outputfile contains the paths generated from the graph that have high paired-end support.
Step a: Generate fasta file
Extracting fasta file from outputfile
Command:
perl process_dg.pl
Output: fasta file of the paths generated by VIPRA
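A quick sanity check on the resulting fasta file is to count the paths and their lengths. The following Python sketch is an illustration only and assumes standard FASTA layout (a ">" header line followed by one or more sequence lines). The function name fasta_lengths and the headers path_1 and path_2 are hypothetical; the headers actually written by process_dg.pl may differ.

```python
import io


def fasta_lengths(handle):
    """Return {header: sequence length} for a FASTA stream."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:]  # header text without the leading ">"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequences may span several lines
    return lengths


# Illustrative records only, not real VIPRA output.
example = io.StringIO(">path_1\nACGTAC\nGT\n>path_2\nACG\n")
path_lengths = fasta_lengths(example)
print(len(path_lengths), path_lengths)
```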
Step b: Generate paths file for maximum likelihood estimation
Extracting just the paths in terms of nodes in the graph
Command:
perl get_paths_dgcover.pl -f
Output:
########################MLEHaplo#########################
Running MLEHaplo takes as input the intermediate files generated by VIPRA and the pathswritefile generated in Step b.
Command:
perl likelihood_singles_wrapper_parallel.pl -condgraph file1.cond.graph -compset file1.comp.txt -pathsfile
The files cond.graph and comp.txt are outputs generated by VIPRA; they contain the condensed graph and the compatible sets, respectively, of the De Bruijn graph and the paired sets.
Final viral population generation using MLEtextfileoutput
Command:
perl extract_MLE.pl -f