kmayerb / tcrdist3

flexible CDR based distance metrics
MIT License

AssertionError: daemonic processes are not allowed to have children #45

Closed pixuenan closed 3 years ago

pixuenan commented 3 years ago

While processing bulk beta chain data by running

S, fragments = compute_pw_sparse_out_of_memory2(
        tr            = tr,
        row_size      = 50,
        pm_processes  = 1,
        pm_pbar       = True,
        max_distance  = 50,
        reassemble    = True,
        cleanup       = False,
        assign        = True)

I encounter the following error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/NAS/wg_pxn/tool/anaconda3/envs/py36/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/NAS/wg_pxn/tool/anaconda3/envs/py36/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/NAS/wg_pxn/tool/anaconda3/envs/py36/lib/python3.6/site-packages/parmap/parmap.py", line 117, in _func_star_many
    **func_items_args[3])
  File "/NAS/wg_pxn/tool/anaconda3/envs/py36/lib/python3.6/site-packages/tcrdist/memory.py", line 108, in gen_sparse_rw_on_fragment2
    tr.compute_rect_distances(df = tr.clone_df.iloc[ind,], df2 = tr.clone_df, store = False)
  File "/NAS/wg_pxn/tool/anaconda3/envs/py36/lib/python3.6/site-packages/tcrdist/repertoire.py", line 341, in compute_rect_distances
    self._rect_distances(pw_dist_func=_pws, df = df, df2 = df2, store = store)
  File "/NAS/wg_pxn/tool/anaconda3/envs/py36/lib/python3.6/site-packages/tcrdist/repertoire.py", line 371, in _rect_distances
    **kwargs)
  File "/NAS/wg_pxn/tool/anaconda3/envs/py36/lib/python3.6/site-packages/tcrdist/rep_funcs.py", line 117, in _pws
    pw_mat = _pw(seqs1 = df[k].values, seqs2 = df2[k].values, metric = metrics[k], ncpus = cpu, uniqify= uniquify, **kargs[k])
  File "/NAS/wg_pxn/tool/anaconda3/envs/py36/lib/python3.6/site-packages/tcrdist/rep_funcs.py", line 140, in _pw
    **kwargs)
  File "/NAS/wg_pxn/tool/anaconda3/envs/py36/lib/python3.6/site-packages/pwseqdist/pairwise.py", line 148, in apply_pairwise_rect
    with multiprocessing.Pool(ncpus) as pool:
  File "/NAS/wg_pxn/tool/anaconda3/envs/py36/lib/python3.6/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/NAS/wg_pxn/tool/anaconda3/envs/py36/lib/python3.6/multiprocessing/pool.py", line 174, in __init__
    self._repopulate_pool()
  File "/NAS/wg_pxn/tool/anaconda3/envs/py36/lib/python3.6/multiprocessing/pool.py", line 239, in _repopulate_pool
    w.start()
  File "/NAS/wg_pxn/tool/anaconda3/envs/py36/lib/python3.6/multiprocessing/process.py", line 103, in start
    'daemonic processes are not allowed to have children'
AssertionError: daemonic processes are not allowed to have children
kmayerb commented 3 years ago

Are you running this in IPython or as a python script? Are you using a Windows, Linux, or OSX machine?

Try running this as a script in IPython to see if the test works in your environment.

Tests: https://github.com/kmayerb/tcrdist3/blob/679f351fa3e46f298317aa14154629fa1d5649b0/tcrdist/tests/test_out_of_memory_features.py

  import pandas as pd
  import numpy as np
  from tcrdist.repertoire import TCRrep
  from tcrdist.rep_funcs import  compute_pw_sparse_out_of_memory2
  from tcrdist.rep_funcs import  compute_n_tally_out_of_memory2
  from hierdiff.association_testing import cluster_association_test

  df = pd.read_csv("dash.csv")
  tr = TCRrep(cell_df = df.sample(100, random_state = 1), 
              organism = 'mouse', 
              chains = ['alpha','beta'], 
              db_file = 'alphabeta_gammadelta_db.tsv', 
              compute_distances = True,
              store_all_cdr = False)

  check_beta = tr.pw_beta.copy(); check_beta[check_beta == 0] = 1
  check_alpha = tr.pw_alpha.copy(); check_alpha[check_alpha == 0] = 1
  check_alpha_beta = check_beta + check_alpha

  S, fragments = compute_pw_sparse_out_of_memory2(    tr = tr,
                                                      row_size      = 50,
                                                      pm_processes  = 1,
                                                      pm_pbar       = True,
                                                      max_distance  = 1000,
                                                      reassemble    = True,
                                                      cleanup       = False,
                                                      assign        = True)

[Screenshot: test output, 2021-02-02]

Also, depending on your objective, the function compute_sparse_rect_distances may help you. I will add more docs on this shortly, but here is an example. The result is a sparse matrix, with distances greater than the radius dropped.

import numpy as np
import pandas as pd
from tcrdist.repertoire import TCRrep

df = pd.read_csv("dash.csv").query('epitope == "PA"')
tr = TCRrep(cell_df = df,
            organism = 'mouse', 
            chains = ['beta'], 
            db_file = 'alphabeta_gammadelta_db.tsv',
            compute_distances = False)
# When setting the radius to 50, the sparse matrix 
# will convert any value > 50 to 0. True zeros are 
# represented as -1.
radius = 50
tr.cpus = 4
# Notice that we call .compute_sparse_rect_distances instead of .compute_distances
tr.compute_sparse_rect_distances(df = tr.clone_df, radius = radius, chunk_size = 100)
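The -1 convention described in the comments above can be sketched independently of tcrdist3 with scipy.sparse. The distance values below are made up for illustration; the point is only the encode/decode round trip:

```python
import numpy as np
from scipy import sparse

radius = 50
# Hypothetical dense distance matrix: 0 on the diagonal (true zeros),
# values above the radius should be dropped from the sparse result.
dense = np.array([[  0, 12, 999],
                  [ 12,  0,  48],
                  [999, 48,   0]])

# Encode: distances > radius become 0 (implicit in the sparse matrix),
# while true zero distances are stored as -1 so they are not lost.
enc = dense.copy()
enc[enc > radius] = 0
enc[dense == 0] = -1
S = sparse.csr_matrix(enc)
print(S.nnz)  # number of stored (within-radius) entries

# Decode: explicit -1 entries are true zeros again.
recovered = S.toarray()
recovered[recovered == -1] = 0
```

Without the sentinel, a genuine distance of 0 (e.g. a clone compared to itself) would be indistinguishable from a dropped, out-of-radius entry.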
pixuenan commented 3 years ago

Hi Koshlan,

Thanks for the quick reply.

My aim is to find (quasi-)public clones associated with variables of interest.

I am running the script directly in python on a Linux server with CentOS 7; running in IPython is not possible because ssh forwarding is forbidden by the administrator.

I just found out that changing max_distance = 50 to max_distance = 1000 in compute_pw_sparse_out_of_memory2 makes it work.