vanheeringen-lab / ANANSE

Prediction of key transcription factors in cell fate determination using enhancer networks. See full ANANSE documentation for detailed installation instructions and usage examples.
http://anansepy.readthedocs.io
MIT License
77 stars 16 forks source link

YEASH! O(n^2) better AT LEAST! #169

Closed siebrenf closed 2 years ago

siebrenf commented 2 years ago

In ANANSE influence we calculated the target scores, based on the interaction distance and the weight (the interaction probability from ANANSE network), using something like this:

targets = nx.single_source_dijkstra(grn, node, cutoff=2)
for target in targets:
    for path in nx.all_simple_paths(grn, node, target, cutoff=2):
        ...

This monstrosity was required because we previously used a scoring function that takes the total path length as variable. This function guarantees we find the global maximum, but required this brute force approach AND limits the maximum interaction distance due to computational complexity.

In this PR

I applied a weight function that :

The weight-function might not yield the exact same paths. However, I bench marked the functions with 100k edges (rounding the results to integers) and found no differences.

Why this matters

Benchmark results

same with 500k edges (column b was replaced with a faster check). Bonus stats: Column a took 18745 sec (98% of which were the 14.910.634 calls to nx.all_simple_paths). Columns c took 5 sec.

tf  a   c
sox4b   1114    1114
tfap2b  0   0
foxq1a  441 441
pax2a   670 670
mef2cb  402 402
klf2b   209 209
emx2    528 528
hif1ab  796 796
sp6 1   1
foxq1b  363 363
sp5a    322 322
tcf15   323 323
zeb2b   323 323
zfhx3   629 629
six3a   407 407
nr2f2   884 884
hoxc9a  480 480
tcf7l2  469 469
neurod1 914 914
ewsr1a  610 610
hoxc5a  400 400
foxf1   485 485
atf4b   534 534
maff    0   0
sox1a   725 725
jund    410 410
pitx2   544 544
lef1    342 342
hoxa5a  548 548
mecom   361 361
eno1a   1   1
nr2c2   480 480
arid5b  443 443
pax6b   440 440
tfap2e  209 209
pbx3b   1386    1386
prox1a  577 577
foxd3   805 805
rfx7a   0   0
hoxb6a  526 526
meis1b  902 902
neurod6b    646 646
sox10   787 787
hoxb10a 439 439
bhlhe23 461 461
znf143a 524 524
sox6    808 808
cebpg   682 682
hbp1    323 323
foxi3b  461 461
ebf3a   953 953
otpb    443 443
hoxd3a  623 623
vtnb    416 416
usf1    794 794
sox1b   739 739
myog    521 521
nr2f1a  880 880
tbx1    446 446
crx 449 449
sox4a   1276    1276
lhx5    616 616
foxo3b  341 341
zbtb17  462 462
xbp1    778 778
cebpd   1   1
hif1al  864 864
pax8    2   2
tfap2c  355 355
hoxa10b 415 415
gata1a  0   0
cux1a   592 592
klf5l   366 366
bcl11ba 588 588
eno3    6   6
tal1    474 474
otx5    431 431
foxb1a  543 543
sox9a   933 933
klf6a   768 768
egr1    223 223
hoxa9b  500 500
hoxb9a  594 594
hoxc4a  620 620
rxrgb   443 443
gata3   1184    1184
nr2e3   565 565
sp2 525 525
thap3   324 324
snai1a  0   0
lhx1a   667 667
pbx2    1209    1209
hoxb8a  7   7
sox5    481 481
pbx1a   1079    1079
mef2d   448 448
vsx1    430 430
six6b   415 415
srebf2  822 822
sp9 551 551
tp63    324 324
klf2a   208 208
hlx1    353 353
foxf2a  609 609
rarga   8   8
prdm1a  429 429
klf11a  737 737
lhx4    416 416
jun 417 417
osr1    515    515
nfybb   682 682
bhlhe22 366 366
tcf12   836 836
junbb   534 534
codeclimate[bot] commented 2 years ago

Code Climate has analyzed commit db979fd9 and detected 0 issues on this pull request.

The test coverage on the diff in this pull request is 100.0% (50% is the threshold).

This pull request will bring the total coverage in the repository to 58.2% (0.0% change).

View more on Code Climate.

simonvh commented 2 years ago

I suck at git, so I continued this in #171. Closing.