illinois-or-research-analytics / cm_pipeline

Pipeline that uses an improved version of CM for generating well-connected clusters
GNU General Public License v3.0

Pipeline Fails at cm_stage #59

Closed: chackoge closed this 1 month ago

chackoge commented 1 month ago

I installed cm_pipeline in a Python 3.10 venv. On a test run on valhalla, it failed at the connectivity_modifier (CM) stage. The filtered output yields 9 clusters (sizes below). The run log and my pipeline.json file are pasted below.

x[,.N,by=V2][order(V2)]
   V2  N
1:  0 23
2:  1 19
3:  2 14
4:  3 12
5:  4 12
6:  5 12
7:  6 11
8:  7 11
9:  8 11
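Roughly the same per-cluster count can be produced in Python for anyone without R handy. This sketch assumes, as the `by=V2` grouping suggests, a headerless two-column TSV whose second column is the integer cluster id; `cluster_sizes` and `sizes_from_file` are hypothetical helpers, not part of cm_pipeline.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, Tuple


def cluster_sizes(rows: Iterable[List[str]]) -> List[Tuple[int, int]]:
    """Count nodes per cluster id, sorted by id (like x[,.N,by=V2][order(V2)])."""
    # Assumption: column 0 is the node id and column 1 is the cluster id.
    counts = Counter(int(row[1]) for row in rows if len(row) >= 2)
    return sorted(counts.items())


def sizes_from_file(path: str) -> List[Tuple[int, int]]:
    """Read a headerless, tab-separated clustering file and count cluster sizes."""
    with open(path, newline="") as fh:
        return cluster_sizes(csv.reader(fh, delimiter="\t"))


if __name__ == "__main__":
    # Tiny inline example: nodes 10 and 11 in cluster 0, node 12 in cluster 1.
    sample = io.StringIO("10\t0\n11\t0\n12\t1\n")
    print(cluster_sizes(csv.reader(sample, delimiter="\t")))  # [(0, 2), (1, 1)]
```

If the format assumption holds, pointing `sizes_from_file` at the filtered clustering output should give the same 9 (cluster id, size) pairs as the R output.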

INITIALIZING OUTPUT DIRECTORIES
Stage 0 Time Elapsed: 00:00:00
DONE
Starting cleanup STAGE
[1] "Orig Rows: 153586"
[1] "Minus Duplicates: 153586"
[1] "Minus Selfloops: 153586"
[1] "Minus Parallel Edges: 153586"
Stage 1 Time Elapsed: 00:00:01
DONE
Starting clustering STAGE
Currently on resolution 0.5, running 2 iterations
Stage 2 Time Elapsed: 00:00:08
DONE
Starting stats STAGE
Currently on param set 0
Traceback (most recent call last):
  File "/shared/gc/myvenv/cm_pipeline/scripts/stats.py", line 2, in <module>
    import networkit as nk
  File "/shared/gc/myvenv/lib/python3.10/site-packages/networkit/__init__.py", line 57, in <module>
    from . import graph
  File "networkit/graph.pyx", line 1, in init networkit.graph
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
Stage 3 Time Elapsed: 00:00:01
DONE
Starting filtering STAGE
Currently on parameter set 0
[1] "OK 3 params supplied"
[1] "OK 3 params supplied"
Stage 4 Time Elapsed: 00:00:02
DONE
Starting connectivity_modifier STAGE
Currently on resolution 0.5, running 2 iterations
Traceback (most recent call last):
  File "/software/python-3.10.0/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/software/python-3.10.0/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/shared/gc/myvenv/lib/python3.10/site-packages/hm01/cm.py", line 16, in <module>
    from hm01.to_universal import cm2universal
  File "/shared/gc/myvenv/lib/python3.10/site-packages/hm01/to_universal.py", line 11, in <module>
    from hm01.graph import Graph, IntangibleSubgraph
  File "/shared/gc/myvenv/lib/python3.10/site-packages/hm01/graph.py", line 10, in <module>
    import networkit as nk
  File "/shared/gc/myvenv/lib/python3.10/site-packages/networkit/__init__.py", line 57, in <module>
    from . import graph
  File "networkit/graph.pyx", line 1, in init networkit.graph
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
/shared/gc/myvenv/cm_pipeline/samples//brians_test-20240719-16:57:48/leiden_res0.5_i2/S5_brianjy_leiden.connectivity_modifier_res0.5_i2.tsv failed to generate
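Both tracebacks fail on `import networkit`, so the error can be reproduced outside the pipeline. The following is a small diagnostic sketch that prints the installed numpy and networkit versions and tries the same imports (`hm01` is the module from the second traceback). The helper names are hypothetical; the only assumption is that the binary mismatch surfaces as a `ValueError` at import time, as it does in the log above.

```python
import importlib
import importlib.metadata
from typing import Optional, Tuple


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return importlib.metadata.version(dist)
    except importlib.metadata.PackageNotFoundError:
        return None


def try_import(module: str) -> Tuple[bool, str]:
    """Import a module and report the failure instead of raising it.

    "numpy.dtype size changed" is raised as a ValueError while the compiled
    extension initializes, so it is caught alongside ImportError.
    """
    try:
        importlib.import_module(module)
    except (ImportError, ValueError) as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


if __name__ == "__main__":
    for dist in ("numpy", "networkit"):
        print(f"{dist} version: {installed_version(dist)}")
    for mod in ("numpy", "networkit", "hm01"):
        ok, msg = try_import(mod)
        print(f"import {mod}: {msg}")
```

Running it inside the venv before and after changing the numpy pin shows whether the import problem is gone without rerunning the whole pipeline.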

begin pipeline.json
$ cat pipeline.json
{
  "title": "brians_test",
  "name": "brianjy",
  "input_file": "/shared/brianjy2/myvenv3/dianes_dois_network.tsv",
  "output_dir": "samples/",
  "algorithm": "leiden",
  "params": [
    { "res": 0.5, "i": 2 }
  ],
  "stages": [
    { "name": "cleanup" },
    { "name": "clustering", "parallel_limit": 2 },
    { "name": "stats", "parallel_limit": 2 },
    {
      "name": "filtering",
      "scripts": [
        "./scripts/subset_graph_nonetworkit_treestar.R",
        "./scripts/make_cm_ready.R"
      ]
    },
    {
      "name": "connectivity_modifier",
      "memprof": false,
      "threshold": "1log10",
      "nprocs": 4,
      "quiet": true
    },
    {
      "name": "filtering",
      "scripts": [ "./scripts/post_cm_filter.R" ]
    },
    { "name": "stats", "parallel_limit": 2 }
  ]
}
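A quick sanity check of a config like the one above, before a full run, could look like the sketch below: it prints the stage order and lists any `scripts` entries that do not exist. It assumes the script paths resolve relative to the current working directory (the traceback suggests the pipeline is run from the cm_pipeline checkout); `stage_names` and `missing_scripts` are hypothetical helpers, not cm_pipeline functions.

```python
import json
import sys
from pathlib import Path
from typing import Any, Dict, List


def stage_names(config: Dict[str, Any]) -> List[str]:
    """Return the stage names in the order they appear in "stages"."""
    return [stage["name"] for stage in config.get("stages", [])]


def missing_scripts(config: Dict[str, Any], base: Path) -> List[str]:
    """List "scripts" entries that do not exist relative to `base` (assumed cwd)."""
    missing = []
    for stage in config.get("stages", []):
        for script in stage.get("scripts", []):
            if not (base / script).exists():
                missing.append(script)
    return missing


if __name__ == "__main__":
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "pipeline.json")
    if path.exists():
        cfg = json.loads(path.read_text())
        print(" -> ".join(stage_names(cfg)))
        print("missing scripts:", missing_scripts(cfg, Path.cwd()))
```

For the pipeline.json above, the first line printed would be the stage sequence cleanup, clustering, stats, filtering, connectivity_modifier, filtering, stats.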

chackoge commented 1 month ago

After consulting @min, pinning numpy to 1.26.0 worked in a Python 3.10 venv. The new requirements.txt is:

Cython
attrs==22.2
click==8.1
colorama==0.4
coloredlogs==15.0
exceptiongroup==1.1
graphviz==0.20
HeapDict==1.0
humanfriendly==10.0
igraph==0.10
iniconfig==2.0
jsonpickle==2.2
leidenalg
numpy==1.26.0
packaging==23.0
pandas
pip
pluggy==1.0
pytest==7.2
pytz==2023.2
scipy
setuptools==50.3
six==1.16
structlog==22.3
tomli==2.0
treeswift==1.1
typer==0.6
typing-extensions==4.5
networkx==3.1
psutil==5.9
infomap==2.7
networkit
git+https://github.com/vikramr2/python-mincut
.
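Because the fix depends on the venv actually having numpy 1.26.0 (and the other pins), it may help to confirm the installed versions against requirements.txt before rerunning. This is a sketch: `parse_pins` and `check_pins` are hypothetical helpers, and the comparison only approximates pip's `==` matching (zero-padding, so `8.1` equals `8.1.0`) for plain numeric versions. Unpinned names, the git URL, and `.` are skipped.

```python
import importlib.metadata
import re
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple

# Matches "name==version" at the start of a line; "+" (git URLs) and "." alone fail.
PIN = re.compile(r"^([A-Za-z0-9_.\-]+)==([^\s;#]+)")


def parse_pins(lines: Iterable[str]) -> Dict[str, str]:
    """Extract name==version pins from requirements lines."""
    pins = {}
    for line in lines:
        m = PIN.match(line.strip())
        if m:
            pins[m.group(1)] = m.group(2)
    return pins


def same_release(pinned: str, installed: str) -> bool:
    """Approximate '==' for X.Y.Z versions by zero-padding the shorter one."""
    try:
        a = [int(x) for x in pinned.split(".")]
        b = [int(x) for x in installed.split(".")]
    except ValueError:
        return pinned == installed  # pre-releases etc.: fall back to exact text
    n = max(len(a), len(b))
    return a + [0] * (n - len(a)) == b + [0] * (n - len(b))


def check_pins(pins: Dict[str, str]) -> List[Tuple[str, str, Optional[str]]]:
    """Return (name, pinned, installed) for every pin that is missing or differs."""
    problems = []
    for name, pinned in pins.items():
        try:
            installed = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            problems.append((name, pinned, None))
            continue
        if not same_release(pinned, installed):
            problems.append((name, pinned, installed))
    return problems


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        for name, pinned, installed in check_pins(parse_pins(req.read_text().splitlines())):
            print(f"{name}: pinned {pinned}, installed {installed}")
```

An empty report means every pinned package, numpy included, is installed at the pinned version.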