isovic / graphmap

GraphMap - A highly sensitive and accurate mapper for long, error-prone reads http://www.nature.com/ncomms/2016/160415/ncomms11307/full/ncomms11307.html Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/graphmap2
MIT License
178 stars 44 forks source link

Effect of -F option #19

Closed rrwick closed 8 years ago

rrwick commented 8 years ago

I noticed odd behaviour with the -F option, and I'm not sure if this is a bug or if I'm misinterpreting the algorithm. My understanding was that a higher -F value would potentially lead to more alignments as it lowered the score threshold, and a value of 1.0 would potentially give the most alignments, effectively not using a score threshold. But I've encountered a case where increasing the -F value reduces my alignments.

Here are some example files to replicate what I'm seeing. contig.fasta.txt - 3 kb reference reads.fastq.txt - two reads

This command produces two alignments (one for each read): graphmap -r contig.fasta.txt -d reads.fastq.txt -o out.sam -Z -F 0.9

Increasing the -F value to 0.99 produces only one alignment: graphmap -r contig.fasta.txt -d reads.fastq.txt -o out.sam -Z -F 0.99

And an even higher -F value results in no alignments: graphmap -r contig.fasta.txt -d reads.fastq.txt -o out.sam -Z -F 0.999

I only see this behaviour with the default anchor algorithm. The anchorgotoh algorithm behaves more as expected, where increasing -F value towards 1.0 results in a third alignment (two for one read). I'm using the 2e314e6 commit (most current of the extanchorend branch).

isovic commented 8 years ago

Hi rrwick,

let me inspect this and I'll get back to you shortly!

Best regards, Ivan.

isovic commented 8 years ago

Hey rrwick!

I'm sorry it took a bit longer to get back to you on this. I inspected the issue you reported - indeed there was something odd. A new release of GraphMap has just been packaged up - in this version major chunks of code were updated/changed (including the one responsible for this behavior), and the issue you reported should be fixed. I have also included the fixes for your Issue #18 in this new release.

Could you give the new version (master branch) a shot? The branch I created for you earlier (extanchorend) is now obsolete.

Best regards, Ivan.

rrwick commented 8 years ago

Ivan,

Thanks for this. I gave it a shot and ran into two issues. First, building on the Mac failed with seqlib. It seems that older versions of zlib.h don't have gzFile_s while newer versions do. The version that Xcode provides on the Mac (1.2.5) doesn't have it. This meant that compilation of codebase/seqlib/src/sequences/sequence_file.cc failed on my computer.

I was able to get around this by tweaking the sequence_file files to use gzFile instead, which allowed the build to succeed:

codebase/seqlib/src/sequences/sequence_file.h:
@@ -219,7 +219,7 @@
-  int ReadGZLine_(gzFile_s *gzip_fp, std::string &ret);
+  int ReadGZLine_(gzFile gzip_fp, std::string &ret);

codebase/seqlib/src/sequences/sequence_file.cc:
@@ -392,7 +392,7 @@
-int SequenceFile::ReadGZLine_(gzFile_s *gzip_fp, std::string &ret) {
+int SequenceFile::ReadGZLine_(gzFile gzip_fp, std::string &ret) {

The other issue I encountered was with GraphMap owler. Here are the files so you can try yourself: contig.fasta.txt reads.fastq.txt

All three of those reads should align to that contig. In the previous version I used (the 2e314e6 commit), GraphMap owler found all three alignments. Now in the current commit, GraphMap owler doesn't find them (but GraphMap align does).

Previous version aligns the reads: graphmap -w owler -r contig.fasta -d reads.fastq -o out Current version aligns the reads in align mode: graphmap align -r contig.fasta -d reads.fastq -o out Current version does not align the reads in owler mode: graphmap owler -r contig.fasta -d reads.fastq -o out

I'm not sure if this is a bug or if the default behaviour has changed.

Ryan

isovic commented 8 years ago

Thank you for reporting this! I'm inspecting it, and will get back to you shortly.

isovic commented 8 years ago

Fixed on master branch, The default value for one parameter was invalid - an oversight due to command line parsing reimplementation.

Thanks again!

rrwick commented 8 years ago

I've built the new version and it's working fine. I appreciate the quick responses!

isovic commented 8 years ago

Great, thanks! I reopened this to remember to implement the gzFile_s fix you suggested.

isovic commented 8 years ago

I changed gzFile_s* to gzFile per your suggestions, thanks! It's on the master branch now.

make modules  
make