To run VerJInxer, you need to have log4j and log5j in your CLASSPATH. Both jar files can be found in the lib/ directory. I use something like this to simplify things:
alias vji="java -ea -d64 -Xmx8G -cp $HOME/workspace/verjinxer/lib/log5j-1.2.jar:$HOME/workspace/verjinxer/lib/log4j-1.2.jar:$HOME/workspace/verjinxer/bin/ verjinxer.Main"
To work with sequence data, it must first be translated from FASTA format.
vji tr --dna refseq.fa
vji tr --dna queries.fa
This creates projects 'refseq' and 'queries'. All further commands use these names as parameter.
Create a q-gram index of a reference sequence
vji qg refseq
Create a bisulfite q-gram index of a reference sequence
vji qg --bisulfite refseq
Match queries against a q-gram index. Also works for a bisulfite q-gram index.
vji qm queries refseq
Same with minimum match length of 25 and a filter:
vji qm -l 25 -F 2:1 queries refseq
The filter is specified as "2:1", which means that all q-grams are ignored for matching that consist of only two alphabet characters. The one means that one additional character is allowed that is different.
Create a q-gram index of a query sequence
vji qg queries
Match a reference sequence against a q-gram index of queries
vji qm refseq queries
If the index contains bisulfite-treated sequences, then simulation of bisulfite treatment of the reference sequence should be requested:
vji qm --bisulfite refseq queries
In the following, positions and sequence indices are 0-based (the first position in a sequence has index 0; the first sequence in a set of sequences has index 0).
This file is written during a 'qmatch' run. It contains all exact matches between a set of query and a set of reference sequences. Each line contains six integer numbers, separated by space, that represent a single exact match:
query_number query_position reference_number reference_position length diagonal
This file is written during an 'align' run. It contains all matches between a set of query sequences and a set of reference sequences, where matches are those in which the entire query sequence could be aligned to a location on the reference sequence with a given maximum number of errors, or with a minimum score.
Each line is as follows:
query_number reference_number reference_start_position reference_stop_position errors/score