vanheeringen-lab / ANANSE

Prediction of key transcription factors in cell fate determination using enhancer networks. See full ANANSE documentation for detailed installation instructions and usage examples.
http://anansepy.readthedocs.io
MIT License

multiprocessing.pool.MaybeEncodingError: recursionError ananse influence #123

Closed Arts-of-coding closed 1 year ago

Arts-of-coding commented 3 years ago

Hi,

When running ananse influence (newest ANANSE version) I get a recursion error. I initially thought that some of the recent changes to the influence.py file caused it, but that is not the case (see below). I still get the error (at the same stage) when I manually raise the recursion limit (from 3000 to 10000 or 100000). Running on one or multiple cores makes no difference. Additionally, all network files that I use were generated with the new binding.h5 files.

As can be seen, the differential network file is generated without problems, but the error occurs while the influence.txt file is being generated. My question is: is there a version of influence.py that uses an iterative approach instead of a recursive one that I can use, or are you perhaps developing a non-recursive version of the influence command?

Error log influence.py version 0.3.0:

2021-07-27 14:30:47 | INFO | Reading network(s), using top 100000 edges.
2021-07-27 14:31:23 | INFO | Differential network has 96883 edges.
2021-07-27 14:31:24 | INFO | Save differential network
2021-07-27 14:31:24 | INFO | Run target score
2021-07-27 14:31:24 | INFO | Differential network contains 105 transcription factors.

  0%|          | 0/53 [00:00<?, ?it/s]
  2%|▏         | 1/53 [00:26<23:17, 26.88s/it]
  4%|▍         | 2/53 [00:31<11:29, 13.52s/it]
  6%|▌         | 3/53 [00:59<16:52, 20.25s/it]
  8%|▊         | 4/53 [01:01<10:50, 13.27s/it]
  9%|▉         | 5/53 [01:30<14:58, 18.72s/it]
 11%|█▏        | 6/53 [01:54<16:10, 20.65s/it]
 13%|█▎        | 7/53 [01:55<10:46, 14.05s/it]
 15%|█▌        | 8/53 [02:01<08:39, 11.55s/it]
 17%|█▋        | 9/53 [02:47<16:27, 22.45s/it]
 19%|█▉        | 10/53 [03:31<20:46, 28.99s/it]
 21%|██        | 11/53 [03:53<18:48, 26.87s/it]
 23%|██▎       | 12/53 [04:27<19:51, 29.05s/it]
 25%|██▍       | 13/53 [04:45<17:02, 25.57s/it]
 26%|██▋       | 14/53 [05:21<18:43, 28.81s/it]
 28%|██▊       | 15/53 [05:51<18:33, 29.31s/it]
 28%|██▊       | 15/53 [05:51<14:51, 23.46s/it]
Traceback (most recent call last):
  File "/vol/mbconda/julian/envs/ananse030/bin/ananse", line 369, in <module>
    args.func(args)
  File "/vol/mbconda/julian/envs/ananse030/lib/python3.9/site-packages/ananse/commands/influence.py", line 23, in influence
    a.run_influence(args.plot)  # -p
  File "/vol/mbconda/julian/envs/ananse030/lib/python3.9/site-packages/ananse/influence.py", line 403, in run_influence
    influence_file = self.run_target_score() #self.run_target_score()
  File "/vol/mbconda/julian/envs/ananse030/lib/python3.9/site-packages/ananse/influence.py", line 332, in run_target_score
    ) = j.get()
  File "/vol/mbconda/julian/envs/ananse030/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x14572e2428b0>'. Reason: 'RecursionError('maximum recursion depth exceeded while pickling an object')'

Error log influence.py version 0.2.2:

2021-07-27 14:51:27 | INFO | Reading network(s)
2021-07-27 14:52:05 | INFO | Run target score

  0%|          | 0/53 [00:00<?, ?it/s]
  2%|▏         | 1/53 [00:17<15:12, 17.54s/it]
  4%|▍         | 2/53 [00:21<08:02,  9.47s/it]
  6%|▌         | 3/53 [00:42<12:20, 14.80s/it]
  8%|▊         | 4/53 [00:44<08:04,  9.89s/it]
  9%|▉         | 5/53 [01:04<10:48, 13.51s/it]
 11%|█▏        | 6/53 [01:22<11:44, 14.98s/it]
 13%|█▎        | 7/53 [01:23<07:51, 10.25s/it]
 15%|█▌        | 8/53 [01:28<06:34,  8.77s/it]
 17%|█▋        | 9/53 [02:08<13:28, 18.38s/it]
 19%|█▉        | 10/53 [02:45<17:25, 24.32s/it]
 21%|██        | 11/53 [03:07<16:28, 23.54s/it]
 23%|██▎       | 12/53 [03:41<18:14, 26.68s/it]
 25%|██▍       | 13/53 [03:58<15:48, 23.72s/it]
 26%|██▋       | 14/53 [04:30<17:06, 26.32s/it]
 28%|██▊       | 15/53 [05:00<17:22, 27.43s/it]
 28%|██▊       | 15/53 [05:01<12:42, 20.07s/it]
Traceback (most recent call last):
  File "/vol/mbconda/julian/envs/ananse030/bin/ananse", line 369, in <module>
    args.func(args)
  File "/vol/mbconda/julian/envs/ananse030/lib/python3.9/site-packages/ananse/commands/influence.py", line 23, in influence
    a.run_influence(args.plot)  # -p
  File "/vol/mbconda/julian/envs/ananse030/lib/python3.9/site-packages/ananse/influence.py", line 400, in run_influence
    influence_file = self.run_target_score()
  File "/vol/mbconda/julian/envs/ananse030/lib/python3.9/site-packages/ananse/influence.py", line 333, in run_target_score
    ) = j.get()
  File "/vol/mbconda/julian/envs/ananse030/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x14ecd9a1e8e0>'. Reason: 'RecursionError('maximum recursion depth exceeded while pickling an object')'

Error log influence.py version 0.3.0 in bash:

(ananse030) julian@cn45:/ceph/rimlsfnwi/data/moldevbio/zhou/jarts/jupyter_notebooks$ nice -15 ananse influence -s /ceph/rimlsfnwi/data/moldevbio/zhou/jarts/data/lako2021/ANANSE/outs3/ESC/full_network_includeprom.txt -t /ceph/rimlsfnwi/data/moldevbio/zhou/jarts/data/lako2021/ANANSE/outs3/CB/full_network_includeprom.txt -d /ceph/rimlsfnwi/data/moldevbio/zhou/jarts/R/scRNA-seq/20210710CBESCpseudobulkpadj.tsv --plot -o /ceph/rimlsfnwi/data/moldevbio/zhou/jarts/data/lako2021/ANANSE/outs3/CB/ESCCB_influence_100000_bash/influence.txt -n 1
2021-07-27 15:09:54 | INFO | Reading network(s), using top 100000 edges.
2021-07-27 15:10:33 | INFO | Differential network has 97331 edges.
2021-07-27 15:10:34 | INFO | Save differential network
2021-07-27 15:10:34 | INFO | Run target score
2021-07-27 15:10:34 | INFO | Differential network contains 105 transcription factors.
  2%|█▉                                                                                                         | 1/56 [00:36<33:54, 36.98s/it]
Traceback (most recent call last):
  File "/vol/mbconda/julian/envs/ananse030/bin/ananse", line 369, in <module>
    args.func(args)
  File "/vol/mbconda/julian/envs/ananse030/lib/python3.9/site-packages/ananse/commands/influence.py", line 23, in influence
    a.run_influence(args.plot)  # -p
  File "/vol/mbconda/julian/envs/ananse030/lib/python3.9/site-packages/ananse/influence.py", line 403, in run_influence
    influence_file = self.run_target_score() #self.run_target_score()
  File "/vol/mbconda/julian/envs/ananse030/lib/python3.9/site-packages/ananse/influence.py", line 332, in run_target_score
    ) = j.get()
  File "/vol/mbconda/julian/envs/ananse030/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x149ea34db340>'. Reason: 'RecursionError('maximum recursion depth exceeded while pickling an object')'
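
Note for readers who hit the same message: the tracebacks above report that the RecursionError was raised "while pickling an object", that is, while multiprocessing was trying to send a worker's result back to the parent process. The minimal sketch below reproduces that kind of failure using only the standard library. The deeply nested list is a stand-in chosen for illustration; the logs do not show which object ANANSE actually failed to pickle, and the helper names `build_nested` and `can_pickle` are hypothetical.

```python
import pickle


def build_nested(depth):
    """Build a list nested `depth` levels deep, using a loop rather than recursion."""
    obj = []
    for _ in range(depth):
        obj = [obj]
    return obj


def can_pickle(obj):
    """Return True if `obj` pickles cleanly, False if pickling hits the recursion limit."""
    try:
        pickle.dumps(obj)
    except RecursionError:
        return False
    return True


# A shallow structure pickles fine; a very deep one exhausts the recursion limit,
# because the pickler descends one level per nested container.
shallow_ok = can_pickle(build_nested(10))
deep_ok = can_pickle(build_nested(100_000))
print(shallow_ok, deep_ok)
```

The point of the sketch is only that the failure can come from serialising a result, independent of whether the scoring algorithm itself is recursive.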

Heads of all data files used in the bash command:

(ananse030) julian@cn45:/ceph/rimlsfnwi/data/moldevbio/zhou/jarts/jupyter_notebooks$ head /ceph/rimlsfnwi/data/moldevbio/zhou/jarts/R/scRNA-seq/20210710CBESCpseudobulkpadj.tsv
resid   log2FoldChange  padj                                                                                                                   
MALAT1  2.01917297250359        1.21609781933084e-14
FTH1    0.554832523039348       5.97221910989959e-05
S100A6  7.09098335732385        2.09001840168269e-156
RPLP1   -0.346313595167683      0.0083751703817102
TMSB4X  -0.0862913963711553     0.650785895986474
RPL41   -1.40160113712021       2.94187652176886e-25
S100A4  6.29451893369658        2.47465007076323e-268
TPT1    -0.0946038325025456     0.447382403749437
KRT19   2.00779359375441        0.00285817554848267

(ananse030) julian@cn45:/ceph/rimlsfnwi/data/moldevbio/zhou/jarts/jupyter_notebooks$ head /ceph/rimlsfnwi/data/moldevbio/zhou/jarts/data/lako2021/ANANSE/outs3/CB/full_network_includeprom.txt
tf_target       prob
ALX3_CAPN6      0.14832412301906545
ALX3_HSP90AB1   0.5940026869396354
ALX3_DACH1      0.1734530947215131
ALX3_CAPN7      0.35942573275648526
ALX3_DACH2      0.1300919938876281
ALX3_CAPN8      0.2585356360898728
ALX3_NEB        0.16115442632367505
ALX3_AVL9       0.37296676319107386
ALX3_CAPN9      0.24746662464022257

(ananse030) julian@cn45:/ceph/rimlsfnwi/data/moldevbio/zhou/jarts/jupyter_notebooks$ head /ceph/rimlsfnwi/data/moldevbio/zhou/jarts/data/lako2021/ANANSE/outs3/ESC/full_network_includeprom.txt
tf_target       prob
ARX_TSPAN6      0.38058075969576177
ARX_TNMD        0.2688976690451759
ARX_DPM1        0.5443037431263824
ARX_SCYL3       0.41433977862298793
ARX_FGR 0.4209858533789259
ARX_CFH 0.2854447354453774
ARX_FUCA2       0.5429839484999737
ARX_GCLC        0.533291590126107
ARX_NFYA        0.5563305479510423

(ananse030) julian@cn45:/ceph/rimlsfnwi/data/moldevbio/zhou/jarts/jupyter_notebooks$ head /ceph/rimlsfnwi/data/moldevbio/zhou/jarts/data/lako2021/ANANSE/outs3/CB/ESCCB_influence_100000_bash/influence_diffnetwork.txt
EGR1    HSPB1   0.009965766588597491
EGR1    S100A6  0.03151353529809664
EGR1    S100A4  0.037657224931578415
EGR1    MT2A    0.009054553944722432
EGR1    JUNB    0.9804249902220884
EGR1    RPS2    0.0009365556142676423
EGR1    ADIRF   0.979961144558964
EGR1    S100A14 0.9798296015685016
EGR1    SFN     0.979763668884937
EGR1    FTL     0.003910994819803726

I'm hoping it can be fixed soon, many thanks in advance!

Kind regards,

Julian

simonvh commented 3 years ago

Can you install the latest version and run with -n 1? This should fix it:

pip install git+https://github.com/vanheeringen-lab/ANANSE.git@develop
Arts-of-coding commented 3 years ago

Dear @simonvh,

It indeed seems to work now, I do not get the recursion error anymore. Many thanks!

Arts-of-coding commented 3 years ago

Dear @simonvh,

The develop version works very well for many of my samples. However, for a small number of transcription factors I get a ValueError. The issue indeed seems to be SOX7, because when I delete SOX7 from the DEG file, the program runs just fine. Additionally, for a small number of factors it cannot calculate the p-value, even though the factor is present in the DEG file (see HES1).

How can I solve these two problems without omitting the factors?

Console command

(ananse030) julian@cn45:/ceph/rimlsfnwi/data/moldevbio/zhou/jarts/jupyter_notebooks$ nice -15 ananse influence -s /ceph/rimlsfnwi/data/moldevbio/zhou/jsmits/Ananse_test_data/analysis/peakpred_V4/only_ATAC/ESCs/full_network_include-promoter.txt -t /ceph/rimlsfnwi/data/moldevbio/zhou/jarts/data/lako2021/ANANSE/outs3/CSB/full_network_includeprom.txt -d /ceph/rimlsfnwi/data/moldevbio/zhou/jarts/R/scRNA-seq/20210710CSBESCpseudobulkpadj.tsv --plot -o /ceph/rimlsfnwi/data/moldevbio/zhou/jarts/data/lako2021/ANANSE/outs3/CSB/ESCCSB_influence_100000_V3/influence.txt -n 1
2021-07-30 13:36:31 | INFO | Reading network(s), using top 100000 edges.
2021-07-30 13:38:23 | INFO | Differential network has 91045 edges.
2021-07-30 13:38:24 | INFO | Saving differential network.
2021-07-30 13:38:24 | INFO | Calculating target scores.
2021-07-30 13:38:24 | INFO | Differential network contains 115 transcription factors.
2021-07-30 13:38:24 | INFO | Out of these, 68 are differentially expressed.
  0%|                                                                                                                       | 0/68 [00:00<?, ?it/s]2021-07-30 13:38:24 | WARNING | Could not calculate p-val (target vs non-target fold-change) for NR2F6.
  1%|█▋                                                                                                             | 1/68 [00:00<00:09,  6.80it/s]2021-07-30 13:38:24 | WARNING | Could not calculate p-val (target vs non-target fold-change) for SREBF1.
 12%|█████████████                                                                                                  | 8/68 [02:47<23:16, 23.28s/it]2021-07-30 13:41:12 | WARNING | Could not calculate p-val (target vs non-target fold-change) for CUX1.
 25%|███████████████████████████▌                                                                                  | 17/68 [04:18<08:48, 10.36s/it]2021-07-30 13:42:43 | WARNING | Could not calculate p-val (target vs non-target fold-change) for CREB3L2.
 31%|█████████████████████████████████▉                                                                            | 21/68 [05:09<12:17, 15.69s/it]2021-07-30 13:43:33 | WARNING | Could not calculate p-val (target vs non-target fold-change) for HES1.
 41%|█████████████████████████████████████████████▎                                                                | 28/68 [11:25<28:41, 43.04s/it]2021-07-30 13:49:49 | WARNING | Could not calculate p-val (target vs non-target fold-change) for FOXC1.
 74%|████████████████████████████████████████████████████████████████████████████████▉                             | 50/68 [21:39<09:32, 31.78s/it]2021-07-30 14:00:04 | WARNING | Could not calculate p-val (target vs non-target fold-change) for ZNF628.
 81%|████████████████████████████████████████████████████████████████████████████████████████▉                     | 55/68 [23:52<05:38, 26.05s/it]
2021-07-30 14:02:16 | ERROR | An error has been caught in function '<module>', process 'MainProcess' (9295), thread 'MainThread' (22877091034944):
Traceback (most recent call last):

> File "/vol/mbconda/julian/envs/ananse030/bin/ananse", line 370, in <module>
    args.func(args)
    │    │    └ Namespace(Gaf='/ceph/rimlsfnwi/data/moldevbio/zhou/jarts/data/lako2021/ANANSE/outs3/CSB/full_network_includeprom.txt', expres...
    │    └ <function influence at 0x14ce6133b1f0>
    └ Namespace(Gaf='/ceph/rimlsfnwi/data/moldevbio/zhou/jarts/data/lako2021/ANANSE/outs3/CSB/full_network_includeprom.txt', expres...
  File "/vol/mbconda/julian/envs/ananse030/lib/python3.9/site-packages/ananse/commands/influence.py", line 28, in influence
    a.run_influence(args.plot)  # -p
    │ │             │    └ True
    │ │             └ Namespace(Gaf='/ceph/rimlsfnwi/data/moldevbio/zhou/jarts/data/lako2021/ANANSE/outs3/CSB/full_network_includeprom.txt', expres...
    │ └ <function Influence.run_influence at 0x14ce6133b4c0>
    └ <ananse.influence.Influence object at 0x14ce608c8a90>
  File "/vol/mbconda/julian/envs/ananse030/lib/python3.9/site-packages/ananse/influence.py", line 440, in run_influence
    influence_file = self.run_target_score()
                     │    └ <function Influence.run_target_score at 0x14ce6133b3a0>
                     └ <ananse.influence.Influence object at 0x14ce608c8a90>
  File "/vol/mbconda/julian/envs/ananse030/lib/python3.9/site-packages/ananse/influence.py", line 353, in run_target_score
    targetScore(tf, self.G, self.expression_change, max_degree)
    │           │   │    │  │    │                  └ 3
    │           │   │    │  │    └ {'A1BG-AS1': Expression(score=5.95559564232664, absfc=5.95559564232664, realfc=-5.95559564232664), 'A2M-AS1': Expression(scor...
    │           │   │    │  └ <ananse.influence.Influence object at 0x14ce608c8a90>
    │           │   │    └ <networkx.classes.digraph.DiGraph object at 0x14ce79c546a0>
    │           │   └ <ananse.influence.Influence object at 0x14ce608c8a90>
    │           └ 'SOX7'
    └ <function targetScore at 0x14ce6133b040>
  File "/vol/mbconda/julian/envs/ananse030/lib/python3.9/site-packages/ananse/influence.py", line 181, in targetScore
    pval = mannwhitneyu(target_fc, non_target_fc)[1]
           │            │          └ [1.39823554111491, 4.1683099187238, 1.5175785156745, 0.992809527831118, 0.748845700863607, 2.57403113993767, 0.12003824843831...
           │            └ []
           └ <function mannwhitneyu at 0x14ce738894c0>
  File "/vol/mbconda/julian/envs/ananse030/lib/python3.9/site-packages/scipy/stats/_mannwhitneyu.py", line 391, in mannwhitneyu
    _mwu_input_validation(x, y, use_continuity, alternative, axis, method))
    │                     │  │  │               │            │     └ 'auto'
    │                     │  │  │               │            └ 0
    │                     │  │  │               └ 'two-sided'
    │                     │  │  └ True
    │                     │  └ [1.39823554111491, 4.1683099187238, 1.5175785156745, 0.992809527831118, 0.748845700863607, 2.57403113993767, 0.12003824843831...
    │                     └ []
    └ <function _mwu_input_validation at 0x14ce73881ee0>
  File "/vol/mbconda/julian/envs/ananse030/lib/python3.9/site-packages/scipy/stats/_mannwhitneyu.py", line 135, in _mwu_input_validation
    raise ValueError('`x` and `y` must be of nonzero size.')

ValueError: `x` and `y` must be of nonzero size.

Within the network files, these TFs occur both as TF1 and as TF2 in the format specified below. Additionally, the values within the DEG file look fine to me:

tf_target   prob
TF1_TF2 value

resid   log2FoldChange  padj
SOX7    2.09843119005866    1.59385870438176e-09
HES1    4.36895985967573    1.47232140608283e-83

Many thanks in advance again!
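
Note: in the traceback above, the first argument passed to mannwhitneyu (target_fc) is shown as an empty list for SOX7. One way to investigate this is to count how many outgoing edges each TF has in the differential network file. The sketch below is not part of ANANSE. It assumes the whitespace-separated three-column layout (TF, target, weight) seen in the influence_diffnetwork.txt head earlier in this thread, and `count_targets` is a hypothetical helper name. Whether a zero count is really what empties target_fc is an assumption, not something the log confirms.

```python
import io
from collections import Counter


def count_targets(lines):
    """Count outgoing edges per TF in 'TF target weight' rows (whitespace-separated)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        tf, _target, _weight = fields
        counts[tf] += 1
    return counts


# Example rows copied from the influence_diffnetwork.txt head shown earlier.
sample = io.StringIO(
    "EGR1    HSPB1   0.009965766588597491\n"
    "EGR1    S100A6  0.03151353529809664\n"
    "EGR1    JUNB    0.9804249902220884\n"
)
counts = count_targets(sample)
egr1_targets = counts["EGR1"]
sox7_targets = counts["SOX7"]  # a Counter returns 0 for a TF with no rows
print(egr1_targets, sox7_targets)
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` sample, for example `count_targets(open(path))` with `path` pointing at the differential network file.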

Arts-of-coding commented 3 years ago

I fixed it myself. If other people are having trouble, here is the solution:

Inside the influence.py file in your conda environment (~/lib/python3.9/site-packages/ananse/) change:

    try:
        pval = mannwhitneyu(target_fc, non_target_fc)[1]
    except RecursionError:

to

    try:
        pval = mannwhitneyu(target_fc, non_target_fc)[1]
    except (RecursionError, ValueError) as e:

This way the ValueError is caught as well, so the run no longer aborts on it.
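
The same pattern can be sketched as a standalone function, for anyone who wants to see the behaviour without editing the installed package. This is an illustration under assumptions, not the code in influence.py: `safe_pvalue` and `fake_test` are hypothetical names, and `fake_test` is a stand-in that only mimics the "nonzero size" check seen in the traceback so that the example runs without SciPy.

```python
def safe_pvalue(test, target_fc, non_target_fc):
    """Return the p-value from `test`, or None if it cannot be computed."""
    try:
        # The test is expected to return (statistic, p-value).
        return test(target_fc, non_target_fc)[1]
    except (RecursionError, ValueError):
        # Same exception tuple as in the fix above.
        return None


def fake_test(x, y):
    """Stand-in for a two-sample test; rejects empty input like the traceback shows."""
    if len(x) == 0 or len(y) == 0:
        raise ValueError("`x` and `y` must be of nonzero size.")
    return (0.0, 0.5)


empty_case = safe_pvalue(fake_test, [], [1.4, 4.2])      # empty targets: no p-value
normal_case = safe_pvalue(fake_test, [0.7], [1.4, 4.2])  # both samples non-empty
print(empty_case, normal_case)
```

Returning None is a choice made for this sketch; what the except branch in influence.py does with the failed factor is not shown in this thread.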

simonvh commented 3 years ago

Re-opening as a reminder to fix this in the new release.