Open jaysunl opened 6 months ago
Hi @jaysunl can you please specify what you are trying to do? The best way to do this would be to give a minimal working example of what you are parallelizing and how you are doing it. Best regards
I tried following the example in the docs and the speed didn't improve for me.
Looks like you forgot to reference the example you tried out?
Yes, apologies. I tried these examples. First, multi-threading with a callback:
using namespace seqan3::literals;
using sequence_pair_t = std::pair<seqan3::dna4_vector, seqan3::dna4_vector>;
auto start = std::chrono::high_resolution_clock::now();
std::vector<sequence_pair_t> sequences{100000, {"CAGGCATGAGCCACTACTCCTGTTTTTTAGAGGATATAGATAGAATGGATCCTGTGTCCCATAATAAATTAAGGGCAACTTGTCACACCCCTTCCATACAAAGACTGAATCAGCAGACACCACAGCCAAATCAGAGGGAAGGATGGCATGGGCTTGCTTGGTTAAGCAACAGAATAACAGCAATAATAACATAAATATAATTGCAATTTATGAGTTCTTGTTATTTGCCAGGTTCTGTAATTAATGCCATCATTAC"_dna4,
"AATACCTGTTTTTAGAGGTATAGTAATAGAGTAGATGTGCCTCCCATAATAAATAGGGCTACTTGTACAAATACCCACCTTCCAACAAAGGACCTAATCAGCAGACACAAGAGCCAAAGCAGAGCGAAGGAATGCACATGGGCTTAGCTTGTAAAGCAAAGAGTAACAGCAAAAAATCATAAATTAAATTTCCAATTTAGGTTCATTTCATTGCCAGGTATCGAATCAATGGCTGATATTACTATCTACTTTTTGT"_dna4}};
auto alignment_config = seqan3::align_cfg::method_global{}
    | seqan3::align_cfg::scoring_scheme{seqan3::nucleotide_scoring_scheme{}}
    | seqan3::align_cfg::gap_cost_affine{}
    | seqan3::align_cfg::output_score{}
    | seqan3::align_cfg::output_alignment{}
    | seqan3::align_cfg::parallel{4};
std::mutex write_to_debug_stream{};
auto const alignment_config_with_callback = alignment_config
    | seqan3::align_cfg::on_result{[&] (auto && result)
      {
          std::lock_guard sync{write_to_debug_stream}; // critical section
          //seqan3::debug_stream << result << '\n';
      }};
seqan3::align_pairwise(sequences, alignment_config_with_callback);
Then multi-threading without a callback:
using namespace seqan3::literals;
using sequence_pair_t = std::pair<seqan3::dna4_vector, seqan3::dna4_vector>;
auto start = std::chrono::high_resolution_clock::now();
std::vector<sequence_pair_t> sequences{100000, {"CAGGCATGAGCCACTACTCCTGTTTTTTAGAGGATATAGATAGAATGGATCCTGTGTCCCATAATAAATTAAGGGCAACTTGTCACACCCCTTCCATACAAAGACTGAATCAGCAGACACCACAGCCAAATCAGAGGGAAGGATGGCATGGGCTTGCTTGGTTAAGCAACAGAATAACAGCAATAATAACATAAATATAATTGCAATTTATGAGTTCTTGTTATTTGCCAGGTTCTGTAATTAATGCCATCATTAC"_dna4,
"AATACCTGTTTTTAGAGGTATAGTAATAGAGTAGATGTGCCTCCCATAATAAATAGGGCTACTTGTACAAATACCCACCTTCCAACAAAGGACCTAATCAGCAGACACAAGAGCCAAAGCAGAGCGAAGGAATGCACATGGGCTTAGCTTGTAAAGCAAAGAGTAACAGCAAAAAATCATAAATTAAATTTCCAATTTAGGTTCATTTCATTGCCAGGTATCGAATCAATGGCTGATATTACTATCTACTTTTTGT"_dna4}};
auto alignment_config = seqan3::align_cfg::method_global{}
    | seqan3::align_cfg::scoring_scheme{seqan3::nucleotide_scoring_scheme{}}
    | seqan3::align_cfg::gap_cost_affine{}
    | seqan3::align_cfg::output_score{}
    | seqan3::align_cfg::output_alignment{}
    | seqan3::align_cfg::parallel{4};
seqan3::align_pairwise(sequences, alignment_config);
And finally the standard sequential procedure:
using namespace seqan3::literals;
using sequence_pair_t = std::pair<seqan3::dna4_vector, seqan3::dna4_vector>;
auto start = std::chrono::high_resolution_clock::now();
std::vector<sequence_pair_t> sequences{100000, {"CAGGCATGAGCCACTACTCCTGTTTTTTAGAGGATATAGATAGAATGGATCCTGTGTCCCATAATAAATTAAGGGCAACTTGTCACACCCCTTCCATACAAAGACTGAATCAGCAGACACCACAGCCAAATCAGAGGGAAGGATGGCATGGGCTTGCTTGGTTAAGCAACAGAATAACAGCAATAATAACATAAATATAATTGCAATTTATGAGTTCTTGTTATTTGCCAGGTTCTGTAATTAATGCCATCATTAC"_dna4,
"AATACCTGTTTTTAGAGGTATAGTAATAGAGTAGATGTGCCTCCCATAATAAATAGGGCTACTTGTACAAATACCCACCTTCCAACAAAGGACCTAATCAGCAGACACAAGAGCCAAAGCAGAGCGAAGGAATGCACATGGGCTTAGCTTGTAAAGCAAAGAGTAACAGCAAAAAATCATAAATTAAATTTCCAATTTAGGTTCATTTCATTGCCAGGTATCGAATCAATGGCTGATATTACTATCTACTTTTTGT"_dna4}};
auto alignment_config = seqan3::align_cfg::method_global{}
    | seqan3::align_cfg::scoring_scheme{seqan3::nucleotide_scoring_scheme{}}
    | seqan3::align_cfg::gap_cost_affine{}
    | seqan3::align_cfg::output_score{}
    | seqan3::align_cfg::output_alignment{}; // notice no parallel specification
seqan3::align_pairwise(sequences, alignment_config);
But all three versions ran at the same speed, and in some cases the parallel versions were actually slower. I tried increasing the number of alignments and the thread count, but this made little difference. Also, somewhat unrelated: sometimes a local alignment is slower than a global alignment, which seems odd to me. In addition, banding doesn't speed up the alignment as much as I expected. Any tips?
Platform
Linux raptor.ucsd.edu 5.4.0-149-generic #166-Ubuntu SMP Tue Apr 18 16:51:45 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Question
Can someone explain how to use multi-threading to gain a significant speed-up? My multi-threaded version seems to be slower than the version without multi-threading. An example code snippet would help (ideally with FASTA files, but a vector example also works). I tried following the example in the docs and the speed didn't improve for me.
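For what it's worth, here is a minimal sketch of how the timing could be set up so that the three variants are compared on equal terms. The snippets above store `start` but do not show where the clock is stopped, so this is an assumption about the measurement, not a diagnosis. `time_ms` and `consume_all` are hypothetical helper names and not part of SeqAn3. As far as I understand the SeqAn3 documentation, `align_pairwise` without an `on_result` callback returns a range whose results are computed lazily, so the timed region would need to iterate over that range. Please verify this against the docs for your version.

```cpp
#include <chrono>
#include <cstddef>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in milliseconds, measured with a monotonic clock.
template <typename work_t>
double time_ms(work_t && work)
{
    auto const start = std::chrono::steady_clock::now();
    std::forward<work_t>(work)();
    auto const stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: visits every element of a (possibly lazy) range so
// that the work behind it actually happens inside the timed region.
// Returns the number of elements that were visited.
template <typename range_t>
std::size_t consume_all(range_t && range)
{
    std::size_t count{};
    for ([[maybe_unused]] auto && element : range)
        ++count;
    return count;
}
```

Assumed usage with the configuration from the question (untested here): `double const ms = time_ms([&] { consume_all(seqan3::align_pairwise(sequences, alignment_config)); });`. Timing the sequential and the parallel configuration this way, in an optimized build, should at least make sure that both measurements cover the full computation.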