jeffdaily / parasail

Pairwise Sequence Alignment Library

Documentation #5

Closed Lagkouvardos closed 8 years ago

Lagkouvardos commented 8 years ago

This work looks very promising, but the documentation is a little sparse. Where does it explain the different options for funcname? I am trying to do global nucleotide alignment without penalizing flanking gaps (a 16S rRNA read against the full sequence).

jeffdaily commented 8 years ago

Thanks for letting me know about the documentation. I will see if I can improve it. It might be time for me to use GitHub's wiki capabilities, since the front-page readme is getting rather long.

I refer to global, semi-global (ends-free), and local alignments as the "class" of alignment in the readme. I haven't heard the term "flanking gaps" before but am I correct to assume this means ends-free, where the gaps before and/or after both the query and database sequences aren't penalized? This would be the "sg" class of alignment.
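To make the "sg" class concrete, here is a minimal scalar sketch of ends-free scoring with affine gaps. It is not parasail's vectorized implementation: the function name sg_score_reference and its match/mismatch parameters are hypothetical, and the gap convention (open charged for the first gap character, extend for each additional one) is an assumption made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

static int max2(int a, int b) { return a > b ? a : b; }

/* Hypothetical reference scorer, NOT part of parasail.
 * Ends-free: leading and trailing gaps on either sequence cost nothing.
 * open and extend are positive penalties (assumed convention). */
int sg_score_reference(const char *s1, int s1Len,
                       const char *s2, int s2Len,
                       int match, int mismatch, int open, int extend)
{
    assert(s1 != NULL && s2 != NULL && s1Len > 0 && s2Len > 0);
    const int NEG = INT_MIN / 2;   /* "minus infinity" with headroom */
    int *H = malloc((size_t)(s2Len + 1) * sizeof *H); /* one DP row */
    int *F = malloc((size_t)(s2Len + 1) * sizeof *F); /* vertical gaps */
    int best = NEG;
    int i, j;

    if (H == NULL || F == NULL) { free(H); free(F); return NEG; }

    /* Row 0 is all zeros: gaps before s1 are free. */
    for (j = 0; j <= s2Len; ++j) { H[j] = 0; F[j] = NEG; }

    for (i = 1; i <= s1Len; ++i) {
        int diag = H[0];  /* value of cell (i-1, 0) */
        int E = NEG;      /* horizontal gap ending in the current cell */
        H[0] = 0;         /* column 0 is zero: gaps before s2 are free */
        for (j = 1; j <= s2Len; ++j) {
            int up = H[j];  /* value of cell (i-1, j) */
            int sub = (s1[i - 1] == s2[j - 1]) ? match : mismatch;
            F[j] = max2(F[j] - extend, up - open);
            E = max2(E - extend, H[j - 1] - open);
            H[j] = max2(diag + sub, max2(E, F[j]));
            diag = up;
        }
        /* Last column: gaps after s2 are free. */
        best = max2(best, H[s2Len]);
    }
    /* Last row: gaps after s1 are free. */
    for (j = 1; j <= s2Len; ++j) best = max2(best, H[j]);

    free(H);
    free(F);
    return best;
}
```

With match 1, mismatch -1, open 3, and extend 1, scoring the read "ACGT" against "TTACGTTT" gives 4: the four matching bases count, and the unaligned flanks of the longer sequence are not penalized.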

Keep in mind, all functions have the same function signature.

parasail_result_t* the_parasail_function_name(
        const char * const restrict s1, const int s1Len,
        const char * const restrict s2, const int s2Len,
        const int open, const int gap,
        const parasail_matrix_t* matrix);

Might I suggest a function name of parasail_sg_striped_16? This would perform semi-global alignment (sg), using Farrar's "striped" vectorization, using 16-bit integers during the computation. The parasail_result_t pointer would contain the score as well as the end_query and end_ref locations (zero-based, not one-based).

Please send any more questions to me. I hope we can find a solution that works for you.

jeffdaily commented 8 years ago

Sorry, I did not give you enough detail for using parasail with nucleotide sequences.

Parasail was initially designed for protein sequences but it works just as well with nucleotide sequences if you use the correct scoring matrix. There are a number of predefined scoring matrices if you look in the parasail/matrices directory -- but these are all for proteins. You will need to use a function to create a simpler match/mismatch matrix.

/** Create simple substitution matrix. */
extern PARASAIL_API
parasail_matrix_t* parasail_matrix_create(
        const char *alphabet, const int match, const int mismatch);

/** Deallocate substitution matrix. */
extern PARASAIL_API
void parasail_matrix_free(parasail_matrix_t *matrix);

You would then use that matrix in later parasail alignment routines. There is some code using these functions in the parasail_aligner.cpp application if you need a reference.
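As a rough picture of what a simple match/mismatch matrix holds, the sketch below builds a square table indexed by alphabet position. The simple_matrix_t type and the simple_matrix_* functions are hypothetical stand-ins written for illustration; they are not parasail's parasail_matrix_t or its internals.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration type, NOT parasail_matrix_t. */
typedef struct {
    char *alphabet;  /* e.g. "ACGT" */
    size_t size;     /* number of letters in the alphabet */
    int mismatch;    /* returned for letters outside the alphabet */
    int *scores;     /* size * size table, row-major */
} simple_matrix_t;

/* Build a table with `match` on the diagonal and `mismatch` elsewhere. */
simple_matrix_t *simple_matrix_create(const char *alphabet,
                                      int match, int mismatch)
{
    assert(alphabet != NULL && alphabet[0] != '\0');
    size_t n = strlen(alphabet);
    simple_matrix_t *m = malloc(sizeof *m);
    if (m == NULL) return NULL;
    m->alphabet = malloc(n + 1);
    m->scores = malloc(n * n * sizeof *m->scores);
    if (m->alphabet == NULL || m->scores == NULL) {
        free(m->alphabet);
        free(m->scores);
        free(m);
        return NULL;
    }
    memcpy(m->alphabet, alphabet, n + 1);
    m->size = n;
    m->mismatch = mismatch;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            m->scores[i * n + j] = (i == j) ? match : mismatch;
    return m;
}

/* Look up the score for aligning letter a against letter b. */
int simple_matrix_score(const simple_matrix_t *m, char a, char b)
{
    if (a == '\0' || b == '\0') return m->mismatch;
    const char *pa = strchr(m->alphabet, a);
    const char *pb = strchr(m->alphabet, b);
    if (pa == NULL || pb == NULL) return m->mismatch;
    return m->scores[(size_t)(pa - m->alphabet) * m->size
                     + (size_t)(pb - m->alphabet)];
}

void simple_matrix_free(simple_matrix_t *m)
{
    if (m == NULL) return;
    free(m->alphabet);
    free(m->scores);
    free(m);
}
```

For a nucleotide alphabet of "ACGT" with match 1 and mismatch -1, looking up ('A', 'A') yields 1 and ('A', 'C') yields -1. When using the library itself, the matrix returned by parasail_matrix_create plays this role and is released with parasail_matrix_free.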

Lagkouvardos commented 8 years ago

Thanks Jeff for the quick answer. Yes I was referring to ends-free DNA alignment.

Just to understand better, though: is your goal here to offer a tool, or a library for others to use? As a non-C programmer, can I use it? In your answer, were you instructing me how to modify the code? I thought that the -d option in the aligner allows for DNA matrix alignment.

Furthermore, my attempt to build from source led to a non-functional parasail_aligner binary.

In the downloaded git folder I ran:

./configure 
make
sudo make install

I didn't see anything alarming in the output, but when I try to test the program I get:

$ parasail_aligner -a parasail_sg_striped -d -M 1 -X -1 -f mySeqs.fasta
parasail_aligner: error while loading shared libraries: libparasail.so.1: cannot open shared object file: No such file or directory

What did I do wrong? Is it possible to offer pre-compiled binaries for Ubuntu or Windows?

Thanks again for the prompt response

jeffdaily commented 8 years ago

Sorry I did not get back to you sooner after your last comment.

Parasail is primarily a C library that I hope will be incorporated into other tools, as needed. However, to demonstrate its capabilities and offer users a ready-made application, I do provide the parasail_aligner application. So yes, you don't need to be a programmer to use the application. I wrongly assumed you were going to use it as a library.

Yes, using the parasail_aligner application, the -d option allows for DNA alignment. In that case, you'll also want to specify the -M match and -X mismatch scoring parameters, as well as the -o gap open and -e gap extension penalties if the defaults aren't sufficient.

The error you see is because your shared library loader is not able to find the parasail shared library. You can either 1) add the library directory under your install prefix, e.g., /usr/local/lib, to your LD_LIBRARY_PATH environment variable, or 2) reconfigure parasail to use static linking, e.g., ./configure --disable-shared --enable-static, in which case the parasail library is linked entirely into the parasail_aligner executable.

Let me know if either of those options helped.

Lagkouvardos commented 8 years ago

This fixed the missing library error. Thanks for the help.

You can now close the issue if you want (although documentation can never be complete), or leave it for other users like me that might have similar issues.

As an end user, I want to repeat how much a fast, standalone, command-line global pairwise aligner supporting ends-free scoring is needed. For all of us working with 16S rRNA amplicons who want to compare them with existing full-length clones, there is no serious option available. Keep up the good work; I look forward to seeing the standalone binary in a publication of its own.

Thanks again