rvaser / spoa

SIMD partial order alignment tool/library
MIT License
158 stars 32 forks source link

Affine gap penalties #15

Closed ksahlin closed 5 years ago

ksahlin commented 5 years ago

Hi again,

[With the risk of asking something obvious once reading the POA-paper] Is it possible (or practically feasible) implement affine gap penalty when aligning a sequence to the graph?

rvaser commented 5 years ago

This was implemented in a previous version but I omitted it after as we got similar results with less resources in racon. You can try commit a6dd47a. If it seems better, I can reinstate it.

ksahlin commented 5 years ago

ok, will do and I'll get back to you on that.

ksahlin commented 5 years ago

I did git checkout a6dd47a and followed the installation instructions (removed previous build folder) and got:

CSE-mkspm01:build kxs624$ cmake -DCMAKE_BUILD_TYPE=Release ..
-- The C compiler identification is AppleClang 7.3.0.7030031
-- The CXX compiler identification is AppleClang 7.3.0.7030031
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/kxs624/Documents/workspace/spoa/build
CSE-mkspm01:build kxs624$ make
Scanning dependencies of target spoa
[ 14%] Building CXX object CMakeFiles/spoa.dir/src/alignment_engine.cpp.o
/Users/kxs624/Documents/workspace/spoa/src/alignment_engine.cpp:26:9: error: use of undeclared identifier 'exit'
        exit(1);
        ^
1 error generated.
make[2]: *** [CMakeFiles/spoa.dir/src/alignment_engine.cpp.o] Error 1
make[1]: *** [CMakeFiles/spoa.dir/all] Error 2
make: *** [all] Error 2
rvaser commented 5 years ago

Add #include <stdlib.h> on top of src/alignment_engine.cpp.

ksahlin commented 5 years ago

ok did that. Build works fine, but run into this in the make step:

CSE-mkspm01:build kxs624$ make
Scanning dependencies of target spoa
[ 14%] Building CXX object CMakeFiles/spoa.dir/src/alignment_engine.cpp.o
[ 28%] Building CXX object CMakeFiles/spoa.dir/src/chain.cpp.o
In file included from /Users/kxs624/Documents/workspace/spoa/src/chain.cpp:9:
/Users/kxs624/Documents/workspace/spoa/src/chain.hpp:54:23: error: no type named 'FastaReader' in namespace 'bioparser'
    friend bioparser::FastaReader<Chain>;
           ~~~~~~~~~~~^
/Users/kxs624/Documents/workspace/spoa/src/chain.hpp:54:34: error: expected member name or ';' after declaration specifiers
    friend bioparser::FastaReader<Chain>;
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
/Users/kxs624/Documents/workspace/spoa/src/chain.hpp:55:23: error: no type named 'FastqReader' in namespace 'bioparser'
    friend bioparser::FastqReader<Chain>;
           ~~~~~~~~~~~^
/Users/kxs624/Documents/workspace/spoa/src/chain.hpp:55:34: error: expected member name or ';' after declaration specifiers
    friend bioparser::FastqReader<Chain>;
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
4 errors generated.
make[2]: *** [CMakeFiles/spoa.dir/src/chain.cpp.o] Error 1
make[1]: *** [CMakeFiles/spoa.dir/all] Error 2
make: *** [all] Error 2
rvaser commented 5 years ago

Try git submodule update vendor/bioparser from spoa root. Clear build folder and run cmake and make again.

ksahlin commented 5 years ago

I got it to run with your suggestions, thanks! I cannot evaluate the performance difference at this point as this commit does not output the graph (-d option). I would think affine would do a lot better since we are working with exon differences (i.e., larger deletions of segments), so affine would really make sense, however, at this point it is really only my intuition.

As for integrating this commit with the current version (that supposts fastq and prints dot-graph) --- It would have to be your call. I' not sure about what goes on in the code under the hood, but maybe the affine alignment could be a separate module that would not be mixed in with the default non-affine spoa code if the user decided to run as default. I definitely don't think it should be at the cost of resources for other types of alignment.

I cannot offer you anymore than a strong intuition at this point that affine gaps would help, but i could also evaluate this if the -d option was somehow introduced.

rvaser commented 5 years ago

Integration won't be that much of a problem I guess. But, could you first do a couple of quick C/P and try to enable graph plots in that commit? I'll provide everything you need.

ksahlin commented 5 years ago

Awesome, the dot graph would serve as a basis to check if, in large, the paths makes more sense in an affine setting.

rvaser commented 5 years ago

Here are the steps:

  1. Replace https://github.com/rvaser/spoa/blob/a6dd47a3ccaf0c12e481b3e2476eea08fd97813d/src/graph.cpp#L621-L654 with https://github.com/rvaser/spoa/blob/master/src/graph.cpp#L683-L725
  2. Replace https://github.com/rvaser/spoa/blob/a6dd47a3ccaf0c12e481b3e2476eea08fd97813d/include/spoa/graph.hpp#L69 with https://github.com/rvaser/spoa/blob/master/include/spoa/graph.hpp#L81
  3. Add {"dot", required_argument, 0, 'd'}, before https://github.com/rvaser/spoa/blob/a6dd47a3ccaf0c12e481b3e2476eea08fd97813d/src/main.cpp#L15
  4. Add std::string dot_path = ""; at https://github.com/rvaser/spoa/blob/a6dd47a3ccaf0c12e481b3e2476eea08fd97813d/src/main.cpp#L37
  5. Add d: at the end of string at https://github.com/rvaser/spoa/blob/a6dd47a3ccaf0c12e481b3e2476eea08fd97813d/src/main.cpp#L39
  6. Add graph->print_dot(dot_path); at https://github.com/rvaser/spoa/blob/a6dd47a3ccaf0c12e481b3e2476eea08fd97813d/src/main.cpp#L139
  7. Add the bellow code before https://github.com/rvaser/spoa/blob/a6dd47a3ccaf0c12e481b3e2476eea08fd97813d/src/main.cpp#L70
    case 'd':
        dot_path = optarg;
        break;
rvaser commented 5 years ago

Also, FASTQ is supported with -q option in that commit.

ksahlin commented 5 years ago

Hmm I think I got the steps correct but I ran into an error with make. Could the error be on my end?

CSE-mkspm01:build kxs624$ make
Scanning dependencies of target spoa
[ 14%] Building CXX object CMakeFiles/spoa.dir/src/alignment_engine.cpp.o
[ 28%] Building CXX object CMakeFiles/spoa.dir/src/chain.cpp.o
[ 42%] Building CXX object CMakeFiles/spoa.dir/src/graph.cpp.o
/Users/kxs624/Documents/workspace/spoa/src/graph.cpp:629:19: error: implicit instantiation of undefined template 'std::__1::basic_ofstream<char, std::__1::char_traits<char> >'
    std::ofstream out(path);
                  ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/iosfwd:133:33: note: template is declared here
    class _LIBCPP_TYPE_VIS_ONLY basic_ofstream;
                                ^
/Users/kxs624/Documents/workspace/spoa/src/graph.cpp:637:57: error: no member named 'endl' in namespace 'std'; did you mean 'end'?
    out << "digraph " << num_sequences_ << " {" << std::endl;
                                                   ~~~~~^~~~
                                                        end
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/initializer_list:109:1: note: 'end' declared here
end(initializer_list<_Ep> __il) _NOEXCEPT
^
/Users/kxs624/Documents/workspace/spoa/src/graph.cpp:638:45: error: no member named 'endl' in namespace 'std'; did you mean 'end'?
    out << "    graph [rankdir=LR]" << std::endl;
                                       ~~~~~^~~~
                                            end
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/initializer_list:109:1: note: 'end' declared here
end(initializer_list<_Ep> __il) _NOEXCEPT
^
/Users/kxs624/Documents/workspace/spoa/src/graph.cpp:645:28: error: no member named 'endl' in namespace 'std'; did you mean 'end'?
        out << "]" << std::endl;
                      ~~~~~^~~~
                           end
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/initializer_list:109:1: note: 'end' declared here
end(initializer_list<_Ep> __il) _NOEXCEPT
^
/Users/kxs624/Documents/workspace/spoa/src/graph.cpp:653:32: error: no member named 'endl' in namespace 'std'; did you mean 'end'?
            out << "]" << std::endl;
                          ~~~~~^~~~
                               end
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/initializer_list:109:1: note: 'end' declared here
end(initializer_list<_Ep> __il) _NOEXCEPT
^
/Users/kxs624/Documents/workspace/spoa/src/graph.cpp:658:70: error: no member named 'endl' in namespace 'std'; did you mean 'end'?
                out << " [style = dotted, arrowhead = none]" << std::endl;
                                                                ~~~~~^~~~
                                                                     end
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/initializer_list:109:1: note: 'end' declared here
end(initializer_list<_Ep> __il) _NOEXCEPT
^
/Users/kxs624/Documents/workspace/spoa/src/graph.cpp:662:24: error: no member named 'endl' in namespace 'std'; did you mean 'end'?
    out << "}" << std::endl;
                  ~~~~~^~~~
                       end
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/initializer_list:109:1: note: 'end' declared here
end(initializer_list<_Ep> __il) _NOEXCEPT
^
7 errors generated.
make[2]: *** [CMakeFiles/spoa.dir/src/graph.cpp.o] Error 1
make[1]: *** [CMakeFiles/spoa.dir/all] Error 2
make: *** [all] Error 2
rvaser commented 5 years ago

Ugh, forgot one header. Add #include <fstream> on top of src/graph.cpp (and maybe #include <iostream>).

iminkin commented 5 years ago

I also need the affine gaps aligner for my project. When trying to compile the mentioned revision with g++ (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609 I get the following:

[ 14%] Building CXX object CMakeFiles/spoa.dir/src/alignment_engine.cpp.o [ 28%] Building CXX object CMakeFiles/spoa.dir/src/chain.cpp.o [ 42%] Building CXX object CMakeFiles/spoa.dir/src/graph.cpp.o [ 57%] Building CXX object CMakeFiles/spoa.dir/src/main.cpp.o [ 71%] Building CXX object CMakeFiles/spoa.dir/src/simd_alignment_engine.cpp.o In file included from /usr/lib/gcc/x86_64-linux-gnu/5/include/xmmintrin.h:1249:0, from /usr/lib/gcc/x86_64-linux-gnu/5/include/x86intrin.h:31, from /usr/include/x86_64-linux-gnu/c++/5/bits/opt_random.h:33, from /usr/include/c++/5/random:50, from /usr/include/c++/5/bits/stl_algo.h:66, from /usr/include/c++/5/algorithm:62, from /research/ium125/tmp/spoa/src/simd_alignment_engine.cpp:8: /research/ium125/tmp/spoa/src/simd_alignment_engine.cpp: In static member function 'static spoa::mxxxi spoa::InstructionSet::_mmxxx_slli_si(const __mxxxi&)': /research/ium125/tmp/spoa/src/simd_alignment_engine.cpp:143:16: error: the last argument must be an 8-bit immediate return _mm_slli_si128(a, sizeof(type)); ^ /research/ium125/tmp/spoa/src/simd_alignment_engine.cpp: In static member function 'static spoa::mxxxi spoa::InstructionSet::_mmxxx_srli_si(const mxxxi&)': /research/ium125/tmp/spoa/src/simd_alignment_engine.cpp:146:16: error: the last argument must be an 8-bit immediate return _mm_srli_si128(a, (kNumVar - 1) * sizeof(type)); ^ /research/ium125/tmp/spoa/src/simd_alignment_engine.cpp: In static member function 'static spoa::__mxxxi spoa::InstructionSet::_mmxxx_slli_si(const mxxxi&)': /research/ium125/tmp/spoa/src/simd_alignment_engine.cpp:170:16: error: the last argument must be an 8-bit immediate return _mm_slli_si128(a, sizeof(type)); ^ /research/ium125/tmp/spoa/src/simd_alignment_engine.cpp: In static member function 'static spoa::__mxxxi spoa::InstructionSet::_mmxxx_srli_si(const __mxxxi&)': /research/ium125/tmp/spoa/src/simd_alignment_engine.cpp:173:16: error: the last argument must be an 8-bit immediate return _mm_srli_si128(a, (kNumVar - 1) * sizeof(type)); ^ CMakeFiles/spoa.dir/build.make:158: recipe for target 'CMakeFiles/spoa.dir/src/simd_alignment_engine.cpp.o' failed make[2]: [CMakeFiles/spoa.dir/src/simd_alignment_engine.cpp.o] Error 1 CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/spoa.dir/all' failed make[1]: [CMakeFiles/spoa.dir/all] Error 2 Makefile:127: recipe for target 'all' failed make: *** [all] Error 2

rvaser commented 5 years ago

Try replacing the following:

  1. https://github.com/rvaser/spoa/blob/a6dd47a3ccaf0c12e481b3e2476eea08fd97813d/src/simd_alignment_engine.cpp#L143 with return _mm_slli_si128(a, 2);

  2. https://github.com/rvaser/spoa/blob/a6dd47a3ccaf0c12e481b3e2476eea08fd97813d/src/simd_alignment_engine.cpp#L146 with return _mm_srli_si128(a, 14);

  3. https://github.com/rvaser/spoa/blob/a6dd47a3ccaf0c12e481b3e2476eea08fd97813d/src/simd_alignment_engine.cpp#L170 with return _mm_slli_si128(a, 4);

  4. https://github.com/rvaser/spoa/blob/a6dd47a3ccaf0c12e481b3e2476eea08fd97813d/src/simd_alignment_engine.cpp#L173 with return _mm_srli_si128(a, 12);

iminkin commented 5 years ago

@rvaser thanks seems to work!

ksahlin commented 5 years ago

I made some preliminary tests and affine gap penalties definitely works better than the default when aligning transcripts. I would vote for this feature being included in some way.

rvaser commented 5 years ago

I will try to push affine gaps to master next week.

rvaser commented 5 years ago

Affine gaps are available on branch temp (SISD version only, edit CMakeLists to compile without SIMD). SIMD version coming soon.

rvaser commented 5 years ago

@ksahlin, @ilyaminkin, affine gaps are now live on master branch (v2.0.0). I sped it up by applying prefix max on rows.

rvaser commented 5 years ago

Affine gaps had some bugs which should be fixed now in v2.0.1.

ksahlin commented 5 years ago

Hi again!

I had the chance to try out the new commit (v2.0.2). Sorry for being a pain...but I was quite happy with being able to set -e 0 in the first affine version you implemented. Is there a good reason why you removed this setting?

rvaser commented 5 years ago

By first version you mean v2.0.0 or the earlier commit?

ksahlin commented 5 years ago

The earlier commit.

On a positive note FYI: I do se slightly improved alignments with the latest version vs the earlier commit with identical alignment parameters, so maybe -e 0 combined with latest version will give the holy grail answer :)

rvaser commented 5 years ago

Enabled in latest commit (v2.0.3).