waveygang / wfmash

base-accurate DNA sequence alignments using WFA and mashmap2
MIT License

wfmash seems much slower in `-Y #` mode compared to `-X`, although the input follows the PanSN-spec #241

Open subwaystation opened 2 months ago

subwaystation commented 2 months ago
    Command being timed: "wfmash LPA.fa.gz LPA.fa.gz -p 70 -s 500 -X"
    User time (seconds): 14.26
    System time (seconds): 0.16
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:14.49
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 149116
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 8328
    Voluntary context switches: 34527
    Involuntary context switches: 44
    Swaps: 0
    File system inputs: 0
    File system outputs: 128
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

    Command being timed: "wfmash LPA.fa.gz LPA.fa.gz -p 70 -s 500 -Y #"
    User time (seconds): 47.30
    System time (seconds): 3.37
    Percent of CPU this job got: 103%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:49.07
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 182840
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 63761
    Voluntary context switches: 600318
    Involuntary context switches: 162
    Swaps: 0
    File system inputs: 0
    File system outputs: 2456
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
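To put numbers on the difference, here is a small sketch that compares the two runs using values copied from the `time` output above. The dictionary keys and the helper `parse_elapsed` are names I made up for illustration; nothing here calls wfmash.

```python
def parse_elapsed(text):
    """Convert GNU time's 'h:mm:ss' or 'm:ss' elapsed string to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Values copied by hand from the two reports above.
runs = {
    "-X": {
        "user_s": 14.26,
        "system_s": 0.16,
        "elapsed": "0:14.49",
        "minor_faults": 8328,
        "voluntary_ctx": 34527,
    },
    "-Y #": {
        "user_s": 47.30,
        "system_s": 3.37,
        "elapsed": "0:49.07",
        "minor_faults": 63761,
        "voluntary_ctx": 600318,
    },
}

base, slow = runs["-X"], runs["-Y #"]

# Ratio of the `-Y #` run to the `-X` run for each metric.
ratios = {
    "elapsed": parse_elapsed(slow["elapsed"]) / parse_elapsed(base["elapsed"]),
    "user_s": slow["user_s"] / base["user_s"],
    "system_s": slow["system_s"] / base["system_s"],
    "minor_faults": slow["minor_faults"] / base["minor_faults"],
    "voluntary_ctx": slow["voluntary_ctx"] / base["voluntary_ctx"],
}

for metric, ratio in ratios.items():
    print(f"{metric:>14}: {ratio:.1f}x")
```

Wall-clock and user time grow by a bit more than 3x, while system time, minor page faults, and voluntary context switches grow considerably more than that.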

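I would expect the same speed in both modes because the input follows the PanSN-spec, yet the `-Y #` run takes about 49 s of wall-clock time versus about 14.5 s with `-X`. These are the sequences, as listed in the FASTA index: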

chm13#0#tig00000001 330243  21  60  61
chm1#0#tig00000003  227072  335789  60  61
HG002#0#tig00000001 329347  566667  60  61
HG002#1#tig00000005 274138  901525  60  61
HG00733#0#tig00000001   295824  1180255 60  61
HG00733#1#tig00000008   283680  1481033 60  61
HG01358#0#tig00000002   337324  1769464 60  61
HG01358#1#tig00000010   240282  2112434 60  61
HG02572#0#tig00000005   311357  2356744 60  61
HG02572#1#tig00000001   309189  2673314 60  61
NA19239#0#tig00000002   264003  2987680 60  61
NA19239#1#tig00000006   236974  3256107 60  61
NA19240#0#tig00000001   285146  3497054 60  61
NA19240#1#tig00000012   260090  3786976 60  61
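The reasoning behind that expectation can be checked with a short sketch. It assumes that `-Y #` skips mappings between sequences that share the same prefix before the last occurrence of `#`, which is my reading of the help text and not something I verified in the code. The function `prefix_before_last` is a hypothetical helper, not part of wfmash.

```python
from collections import Counter

# Sequence names copied from the FASTA index listing above.
names = [
    "chm13#0#tig00000001",
    "chm1#0#tig00000003",
    "HG002#0#tig00000001",
    "HG002#1#tig00000005",
    "HG00733#0#tig00000001",
    "HG00733#1#tig00000008",
    "HG01358#0#tig00000002",
    "HG01358#1#tig00000010",
    "HG02572#0#tig00000005",
    "HG02572#1#tig00000001",
    "NA19239#0#tig00000002",
    "NA19239#1#tig00000006",
    "NA19240#0#tig00000001",
    "NA19240#1#tig00000012",
]


def prefix_before_last(name, delim="#"):
    """Return the part of `name` before the last `delim` (assumed -Y grouping key)."""
    head, sep, _ = name.rpartition(delim)
    # rpartition returns ('', '', name) when the delimiter is absent.
    return head if sep else name


# Count how many sequences fall into each prefix group.
groups = Counter(prefix_before_last(n) for n in names)

# Prefixes shared by more than one sequence; these are the only cases
# where a prefix filter would exclude more than a pure self-mapping filter.
shared = {prefix: count for prefix, count in groups.items() if count > 1}

print(f"{len(names)} sequences, {len(groups)} distinct prefixes")
print("prefixes shared by several sequences:", shared)
```

Every sequence here has its own `sample#haplotype` prefix, so under the assumption above a prefix filter should exclude exactly the same pairs as skipping self mappings.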

I tested with the current master. Thanks for any feedback!