seqan / seqan3

The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.
https://www.seqan.de

Multi-threading, banding, other speed-ups #3255

Open jaysunl opened 1 month ago

jaysunl commented 1 month ago

Platform

Question

Can someone explain how to use multi-threading to gain a significant speed-up? My multi-threaded version seems to be slower than the version without multi-threading. An example code snippet would help (ideally one that reads FASTA files, but a vector example also works). I tried following the example in the docs and the speed didn't improve for me.

rrahn commented 1 month ago

Hi @jaysunl, can you please specify what you are trying to do? The best way to do this would be to give a minimal working example of what you are parallelizing and how you are doing it. Best regards

eseiler commented 1 month ago

> I tried following the example in the docs and the speed didn't improve for me.

Looks like you forgot to reference the example you tried out?

jaysunl commented 1 month ago

Yes, apologies. I tried these examples. First, multi-threading with a callback:

    using namespace seqan3::literals;
    using sequence_pair_t = std::pair<seqan3::dna4_vector, seqan3::dna4_vector>;
    auto start = std::chrono::high_resolution_clock::now();

    std::vector<sequence_pair_t> sequences{100000, {"CAGGCATGAGCCACTACTCCTGTTTTTTAGAGGATATAGATAGAATGGATCCTGTGTCCCATAATAAATTAAGGGCAACTTGTCACACCCCTTCCATACAAAGACTGAATCAGCAGACACCACAGCCAAATCAGAGGGAAGGATGGCATGGGCTTGCTTGGTTAAGCAACAGAATAACAGCAATAATAACATAAATATAATTGCAATTTATGAGTTCTTGTTATTTGCCAGGTTCTGTAATTAATGCCATCATTAC"_dna4, 
    "AATACCTGTTTTTAGAGGTATAGTAATAGAGTAGATGTGCCTCCCATAATAAATAGGGCTACTTGTACAAATACCCACCTTCCAACAAAGGACCTAATCAGCAGACACAAGAGCCAAAGCAGAGCGAAGGAATGCACATGGGCTTAGCTTGTAAAGCAAAGAGTAACAGCAAAAAATCATAAATTAAATTTCCAATTTAGGTTCATTTCATTGCCAGGTATCGAATCAATGGCTGATATTACTATCTACTTTTTGT"_dna4}};

    auto alignment_config = seqan3::align_cfg::method_global{} 
                                | seqan3::align_cfg::scoring_scheme{
                                  seqan3::nucleotide_scoring_scheme{}}  
                                | seqan3::align_cfg::gap_cost_affine{} 
                                | seqan3::align_cfg::output_score{} 
                                | seqan3::align_cfg::output_alignment{}
                                | seqan3::align_cfg::parallel{4};
    std::mutex write_to_debug_stream{};
    auto const alignment_config_with_callback = alignment_config |
                                                seqan3::align_cfg::on_result{[&] (auto && result)
                                                {
                                                    std::lock_guard sync{write_to_debug_stream}; // critical section
                                                    //seqan3::debug_stream << result << '\n';
                                                }};
    seqan3::align_pairwise(sequences, alignment_config_with_callback);

and then multi-threading without a callback:

    using namespace seqan3::literals;
    using sequence_pair_t = std::pair<seqan3::dna4_vector, seqan3::dna4_vector>;
    auto start = std::chrono::high_resolution_clock::now();

    std::vector<sequence_pair_t> sequences{100000, {"CAGGCATGAGCCACTACTCCTGTTTTTTAGAGGATATAGATAGAATGGATCCTGTGTCCCATAATAAATTAAGGGCAACTTGTCACACCCCTTCCATACAAAGACTGAATCAGCAGACACCACAGCCAAATCAGAGGGAAGGATGGCATGGGCTTGCTTGGTTAAGCAACAGAATAACAGCAATAATAACATAAATATAATTGCAATTTATGAGTTCTTGTTATTTGCCAGGTTCTGTAATTAATGCCATCATTAC"_dna4, 
    "AATACCTGTTTTTAGAGGTATAGTAATAGAGTAGATGTGCCTCCCATAATAAATAGGGCTACTTGTACAAATACCCACCTTCCAACAAAGGACCTAATCAGCAGACACAAGAGCCAAAGCAGAGCGAAGGAATGCACATGGGCTTAGCTTGTAAAGCAAAGAGTAACAGCAAAAAATCATAAATTAAATTTCCAATTTAGGTTCATTTCATTGCCAGGTATCGAATCAATGGCTGATATTACTATCTACTTTTTGT"_dna4}};

    auto alignment_config = seqan3::align_cfg::method_global{} 
                                | seqan3::align_cfg::scoring_scheme{
                                  seqan3::nucleotide_scoring_scheme{}}  
                                | seqan3::align_cfg::gap_cost_affine{} 
                                | seqan3::align_cfg::output_score{} 
                                | seqan3::align_cfg::output_alignment{}
                                | seqan3::align_cfg::parallel{4};
    seqan3::align_pairwise(sequences, alignment_config);

and then standard sequential procedure:

    using namespace seqan3::literals;
    using sequence_pair_t = std::pair<seqan3::dna4_vector, seqan3::dna4_vector>;
    auto start = std::chrono::high_resolution_clock::now();

    std::vector<sequence_pair_t> sequences{100000, {"CAGGCATGAGCCACTACTCCTGTTTTTTAGAGGATATAGATAGAATGGATCCTGTGTCCCATAATAAATTAAGGGCAACTTGTCACACCCCTTCCATACAAAGACTGAATCAGCAGACACCACAGCCAAATCAGAGGGAAGGATGGCATGGGCTTGCTTGGTTAAGCAACAGAATAACAGCAATAATAACATAAATATAATTGCAATTTATGAGTTCTTGTTATTTGCCAGGTTCTGTAATTAATGCCATCATTAC"_dna4, 
    "AATACCTGTTTTTAGAGGTATAGTAATAGAGTAGATGTGCCTCCCATAATAAATAGGGCTACTTGTACAAATACCCACCTTCCAACAAAGGACCTAATCAGCAGACACAAGAGCCAAAGCAGAGCGAAGGAATGCACATGGGCTTAGCTTGTAAAGCAAAGAGTAACAGCAAAAAATCATAAATTAAATTTCCAATTTAGGTTCATTTCATTGCCAGGTATCGAATCAATGGCTGATATTACTATCTACTTTTTGT"_dna4}};

    auto alignment_config = seqan3::align_cfg::method_global{} 
                                | seqan3::align_cfg::scoring_scheme{
                                  seqan3::nucleotide_scoring_scheme{}}  
                                | seqan3::align_cfg::gap_cost_affine{} 
                                | seqan3::align_cfg::output_score{} 
                                | seqan3::align_cfg::output_alignment{}; // note: no seqan3::align_cfg::parallel element
    seqan3::align_pairwise(sequences, alignment_config);

But all three versions ran at about the same speed, and in some cases the parallel versions were actually slower. Increasing the number of alignments and the thread count made little difference. Somewhat unrelated: a local alignment is sometimes slower than a global alignment, which seems odd to me. Banding also doesn't reduce the alignment time by much. Any tips?