Closed pixuenan closed 3 years ago
Are you running this in IPython or as a Python script? Are you using a Windows, Linux, or macOS machine?
Try running this script in IPython to see if the test works in your environment.
import pandas as pd
import numpy as np
from tcrdist.repertoire import TCRrep
from tcrdist.rep_funcs import compute_pw_sparse_out_of_memory2
from tcrdist.rep_funcs import compute_n_tally_out_of_memory2
from hierdiff.association_testing import cluster_association_test

df = pd.read_csv("dash.csv")
tr = TCRrep(cell_df = df.sample(100, random_state = 1),
            organism = 'mouse',
            chains = ['alpha','beta'],
            db_file = 'alphabeta_gammadelta_db.tsv',
            compute_distances = True,
            store_all_cdr = False)

# Dense check matrices, with zero distances replaced by 1
check_beta = tr.pw_beta.copy()
check_beta[check_beta == 0] = 1
check_alpha = tr.pw_alpha.copy()
check_alpha[check_alpha == 0] = 1
check_alpha_beta = check_beta + check_alpha

S, fragments = compute_pw_sparse_out_of_memory2(tr = tr,
                                                row_size = 50,
                                                pm_processes = 1,
                                                pm_pbar = True,
                                                max_distance = 1000,
                                                reassemble = True,
                                                cleanup = False,
                                                assign = True)
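As background on why the check matrices above replace 0 with 1: a sparse matrix cannot tell a stored zero distance apart from "no entry", so a sentinel value is needed. The sketch below is not tcrdist3 code. It uses only NumPy and SciPy, a toy absolute-difference distance, and a hypothetical helper `chunked_sparse_distances` to illustrate the general idea of computing a distance matrix in row chunks, dropping values above a cutoff, and stacking the pieces back together. Storing true zeros as 1 is an assumption chosen to mirror the check matrices above.

```python
import numpy as np
from scipy import sparse

def chunked_sparse_distances(values, row_size, max_distance):
    """Hypothetical helper (not part of tcrdist3): toy chunked distances.

    Computes |values[i] - values[j]| one block of rows at a time, stores
    true zeros as 1, drops anything above max_distance, and returns the
    stacked result as a CSR matrix.
    """
    values = np.asarray(values, dtype=np.int64)
    fragments = []
    for start in range(0, len(values), row_size):
        # Distances from this block of rows to every column
        block = np.abs(values[start:start + row_size, None] - values[None, :])
        block[block == 0] = 1            # sentinel so true zeros are kept
        block[block > max_distance] = 0  # zeros are not stored in CSR
        fragments.append(sparse.csr_matrix(block))
    return sparse.vstack(fragments, format="csr")

rng = np.random.default_rng(1)
values = rng.integers(0, 200, size=10)

# Dense reference built the same way as the check matrices above
dense = np.abs(values[:, None] - values[None, :])
check = dense.copy()
check[check == 0] = 1

S = chunked_sparse_distances(values, row_size=4, max_distance=1000)
print(np.array_equal(S.toarray(), check))
```

With a cutoff larger than every distance, the sparse result should match the dense check matrix exactly; with a smaller cutoff, entries above it simply disappear from the sparse matrix.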
Also, depending on your objective, the function compute_sparse_rect_distances may help you. I will add more docs on this shortly, but here is an example. The result is a sparse matrix, with distances greater than the radius dropped.
import numpy as np
import pandas as pd
from tcrdist.repertoire import TCRrep

df = pd.read_csv("dash.csv").query('epitope == "PA"')
tr = TCRrep(cell_df = df,
            organism = 'mouse',
            chains = ['beta'],
            db_file = 'alphabeta_gammadelta_db.tsv',
            compute_distances = False)
# When setting the radius to 50, the sparse matrix
# will convert any value > 50 to 0. True zeros are
# represented as -1.
radius = 50
tr.cpus = 4
# Notice that we called .compute_sparse_rect_distances instead of .compute_distances
tr.compute_sparse_rect_distances(df = tr.clone_df, radius = radius, chunk_size = 100)
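The convention described in the comments above (values beyond the radius become 0, true zeros are stored as -1) takes a little care to read back. Here is a minimal sketch of how one might decode it, using a small hand-built SciPy matrix rather than real tcrdist3 output. The helper `neighbors_within_radius` and the toy distances are hypothetical, for illustration only.

```python
import numpy as np
from scipy import sparse

def neighbors_within_radius(S, i):
    """Hypothetical helper: (column, distance) pairs stored in row i.

    Assumes a stored -1 means a true distance of 0 and that missing
    entries are distances beyond the radius.
    """
    row = S[i, :].tocoo()
    dists = np.where(row.data == -1, 0, row.data)
    return sorted(zip(row.col.tolist(), dists.tolist()))

# Toy dense distances, encoded with the convention described above
dense = np.array([[0, 12, 80],
                  [12, 0, 50],
                  [80, 50, 0]])
radius = 50
encoded = dense.copy()
encoded[dense == 0] = -1      # true zeros are kept as -1
encoded[dense > radius] = 0   # beyond the radius: not stored
S = sparse.csr_matrix(encoded)

print(neighbors_within_radius(S, 0))
```

For row 0, only the self-distance (decoded back to 0) and the neighbor at distance 12 are returned; the pair at distance 80 lies beyond the radius and has no stored entry.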
Hi Koshlan,
Thanks for the quick reply.
I am aiming to find (quasi-)public clones associated with variables of interest.
I am running the script directly in Python on a Linux server with CentOS 7, where running in IPython is impossible because SSH forwarding is forbidden by the administrator.
I just found out that changing max_distance = 50 to max_distance = 1000 in compute_pw_sparse_out_of_memory2 makes it work.
While processing bulk beta chain data by running
I encounter an error: