shenwanxiang / bidd-aggmap

Jigsaw-like AggMap: A Robust and Explainable Multi-Channel Omics Deep Learning Tool
https://bidd-aggmap.readthedocs.io/en/latest/
GNU General Public License v3.0

NameError: name 'data' is not defined for Windows aggmap #20

Open shenwanxiang opened 1 year ago

shenwanxiang commented 1 year ago

RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\AdminCOOP\anaconda3\envs\aggmap\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\AdminCOOP\anaconda3\envs\aggmap\lib\site-packages\aggmap\utils\calculator.py", line 41, in _fuc
    return _calculate(i1, i2)
  File "C:\Users\AdminCOOP\anaconda3\envs\aggmap\lib\site-packages\aggmap\utils\calculator.py", line 23, in _calculate
    x1 = data[:, i1]
NameError: name 'data' is not defined
"""

The above exception was the direct cause of the following exception:

NameError                                 Traceback (most recent call last)
Cell In[1], line 11
      8 dfy = pd.get_dummies(pd.Series(data.target))
     10 # AggMap object definition, fitting, and saving
---> 11 mp = AggMap(dfx, metric = 'correlation')
     12 mp.fit(cluster_channels=5, emb_method = 'umap', verbose=0)
     13 mp.save('agg.mp')

File ~\anaconda3\envs\aggmap\lib\site-packages\aggmap\map.py:176, in AggMap.__init__(self, dfx, metric, by_scipy, n_cpus, info_distance)
    174     self.info_distance = D.clip(0, np.inf)
    175 else:
--> 176     D = calculator.pairwise_distance(dfx.values, n_cpus=n_cpus, method=metric)
    177     D = np.nan_to_num(D,copy=False)
    178     D = squareform(D)

File ~\anaconda3\envs\aggmap\lib\site-packages\aggmap\utils\calculator.py:67, in pairwise_distance(npydata, n_cpus, method)
     65 N = data.shape[1]
     66 lst = list(_yield_combinations(N))
---> 67 res = MultiProcessUnorderedBarRun(_fuc, lst, n_cpus=n_cpus)
     68 dist_matrix = np.zeros(shape = (N,N))
     69 for x,y,v in tqdm(res,ascii=True):

File ~\anaconda3\envs\aggmap\lib\site-packages\aggmap\utils\multiproc.py:111, in MultiProcessUnorderedBarRun(func, deal_list, n_cpus)
    109 res_list = []
    110 with pbar(total = len(deal_list), ascii=True) as pb:
--> 111     for res in p.imap_unordered(func, deal_list):
    112         pb.update(1)
    113         res_list.append(res)

File ~\anaconda3\envs\aggmap\lib\multiprocessing\pool.py:868, in IMapIterator.__next__(self, timeout)
    866 if success:
    867     return value
--> 868 raise value

NameError: name 'data' is not defined

shenwanxiang commented 1 year ago

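You shouldn't expect the values of global variables that you set in the parent process to be automatically propagated to the child processes. In the traceback above, _calculate reads a module-level name, data, that appears to be assigned only inside pairwise_distance in the parent process, so the worker processes never see it.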

Your code happens to work on Unix-like platforms because on those platforms multiprocessing uses fork(). This means that every child process gets a copy of the parent process's address space, including all global variables. (Since Python 3.8, macOS defaults to spawn instead of fork, so the same failure can appear there.)
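To check which behaviour applies on a given machine, the start method can be queried directly. The snippet below is a minimal sketch; the helper name inherits_parent_globals is hypothetical and not part of aggmap.

```python
import multiprocessing as mp


def inherits_parent_globals():
    """Return True only if child processes start as forked copies of the parent.

    With "spawn" (the only method on Windows) or "forkserver", globals that
    the parent assigned at runtime are not visible in the children.
    """
    return mp.get_start_method() == "fork"


# Start methods supported on this platform, e.g. only "spawn" on Windows.
available_methods = mp.get_all_start_methods()
```

If inherits_parent_globals() returns False, code that relies on a global set in the parent will fail in the workers in the way shown in the traceback.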

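This isn't the case on Windows, which has no fork(). There, multiprocessing starts each child as a fresh Python interpreter (the spawn start method) that re-imports the module, so anything the parent assigned at runtime is missing in the child. Every variable from the parent process that the child needs has to be explicitly passed down or placed in shared memory.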

Once you do this, your code will work on both Unix and Windows.
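One common way to pass the data down explicitly is a Pool initializer, which runs once in every worker and stores the data in that worker's own global. The following is a sketch under stated assumptions, not aggmap's actual code: the names _init_worker, _pair_distance and pairwise_distance are hypothetical, the data is a plain list of columns rather than a NumPy array, and Euclidean distance stands in for the correlation metric to keep the example standard-library only.

```python
import itertools
import math
import multiprocessing as mp

# Per-process global. It is empty until _init_worker runs in that process.
_columns = None


def _init_worker(columns):
    """Runs once in each worker process and stores the data there."""
    global _columns
    _columns = columns


def _pair_distance(pair):
    """Euclidean distance between two columns (stand-in for the real metric)."""
    i1, i2 = pair
    return i1, i2, math.dist(_columns[i1], _columns[i2])


def pairwise_distance(columns, n_cpus=2):
    """Build a symmetric distance matrix over all column pairs."""
    n = len(columns)
    pairs = list(itertools.combinations(range(n), 2))
    if n_cpus == 1:
        # Serial path: no child processes, same worker functions.
        _init_worker(columns)
        results = [_pair_distance(p) for p in pairs]
    else:
        # The data reaches each worker through initargs, so nothing
        # depends on globals being inherited from the parent.
        with mp.Pool(n_cpus, initializer=_init_worker,
                     initargs=(columns,)) as pool:
            results = list(pool.imap_unordered(_pair_distance, pairs))
    dist = [[0.0] * n for _ in range(n)]
    for i1, i2, d in results:
        dist[i1][i2] = dist[i2][i1] = d
    return dist


# Usage: under spawn, the parallel call must sit behind the main guard so
# that workers can import this module without re-running it.
#
# if __name__ == "__main__":
#     print(pairwise_distance([(0.0, 0.0), (3.0, 4.0)], n_cpus=2))
```

Because the initializer receives the data as an argument, the same code path works under both fork and spawn. The data is pickled once per worker rather than once per task, which matters when the array is large.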

Ref: https://stackoverflow.com/questions/6596617/python-multiprocess-diff-between-windows-and-linux