kaizhang / SnapATAC2

Single-cell epigenomics analysis tools
https://kzhang.org/SnapATAC2/

snap.pp.import_data hangs on 0% #302

Closed MubasherMohammed closed 2 months ago

MubasherMohammed commented 2 months ago

Hello, I am importing 15 fragment files with snap.pp.import_data. However, during the import the progress bar hangs at 0% for a long time, even though I am using an entire node. I am on snapatac2 v2.6.0. Here is my code:

import snapatac2 as snap
import numpy as np
import scanpy as sc
import os
from tqdm import tqdm

sc.settings.set_figure_params(dpi=200)
snap.__version__

data_dir = "./fragments"
output_dir = "./outs"
os.makedirs(output_dir, exist_ok=True)
fragment_files = [f'{data_dir}/{fl}' for fl in os.listdir(data_dir) if fl.endswith(".tsv.gz")]

%%time
outputs = []
for fl in fragment_files:
    name = fl.split('/')[-1].split('.tsv.gz')[0]
    outputs.append(f'{output_dir}/{name}.h5ad')

adatas = snap.pp.import_data(fragment_files, chrom_sizes=snap.genome.hg38, sorted_by_barcode=False, file=outputs, min_num_fragments=10)

When I submit the same code as a job in a Python script instead, I get this warning in the log file:

/home/mubasher/miniconda3/envs/snap_env/lib/python3.9/site-packages/multiprocess/resource_tracker.py:99: UserWarning: resource_tracker: process died unexpectedly, relaunching. Some resources might leak.
  warnings.warn('resource_tracker: process died unexpectedly, '
  0%|          | 0/2 [00:00<?, ?it/s]

I couldn't think of anything that would fix the issue. Many thanks for your support.

kaizhang commented 2 months ago

There might be an error during the processing of one of those files, which causes the "multiprocess" thread to hang indefinitely.

As a temporary solution, you can use a for loop to process the 15 files separately and let me know if there is any error during this process.
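The per-file loop suggested above could be sketched roughly as follows. This is only an illustration, not code from the maintainer: `import_one_by_one` is a hypothetical helper, and the importer is passed in as an argument so that the loop itself does not depend on any particular SnapATAC2 version. It assumes the importer raises an ordinary Python exception when a single file is bad.

```python
import os


def import_one_by_one(fragment_files, output_dir, import_fn, **import_kwargs):
    """Call import_fn once per fragment file and record which files fail.

    import_fn is expected to accept the fragment path as its first argument
    and a `file=` keyword for the output path (hypothetical helper signature).
    """
    os.makedirs(output_dir, exist_ok=True)
    succeeded, failed = {}, {}
    for path in fragment_files:
        # Derive "<sample>.h5ad" from "<sample>.tsv.gz"
        name = os.path.basename(path).split(".tsv.gz")[0]
        out_path = os.path.join(output_dir, f"{name}.h5ad")
        try:
            succeeded[path] = import_fn(path, file=out_path, **import_kwargs)
        except Exception as exc:
            # Keep going so that every problematic file gets reported
            failed[path] = exc
            print(f"FAILED {path}: {exc!r}")
    return succeeded, failed


# Usage with SnapATAC2, reusing the arguments from the question:
# import snapatac2 as snap
# ok, bad = import_one_by_one(
#     fragment_files, output_dir, snap.pp.import_data,
#     chrom_sizes=snap.genome.hg38, sorted_by_barcode=False,
#     min_num_fragments=10,
# )
```

Any path that ends up in `bad` is a candidate for the file that made the batched call hang, and the stored exception is the error message to report back.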

MubasherMohammed commented 2 months ago

Thanks for the swift response. A for loop worked perfectly. If I may ask one more thing here: I have two SnapATAC2-processed h5ad files with shapes adata_1 = (2619, 6084870) and adata_2 = (1068, 606219). I would like to integrate the AnnData objects into a single dataset, but their adata.var have different shapes. What is the best way to do this and then run integration and downstream analysis? Many thanks again.

kaizhang commented 2 months ago

Regenerate the cell-by-feature matrix using the same feature set for both objects, e.g., genome-wide bins or the union of peaks called from both datasets.
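To illustrate why a shared feature set matters, here is a small library-free sketch of genome-wide binning. The helper `genome_bins`, the toy chromosome sizes, and the `chrom:start-end` label format are all assumptions made for the example and are not taken from SnapATAC2. The point is only that the same chromosome sizes and the same bin size always give the same features, while a different bin size gives a different number of columns.

```python
def genome_bins(chrom_sizes, bin_size):
    """Return labels for fixed-width bins tiling each chromosome.

    Hypothetical helper; the 'chrom:start-end' format is illustrative only.
    """
    labels = []
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, bin_size):
            end = min(start + bin_size, size)  # last bin may be shorter
            labels.append(f"{chrom}:{start}-{end}")
    return labels


# Toy genome (made-up sizes) binned for two datasets
toy_sizes = {"chrA": 2000, "chrB": 1250}
bins_500_a = genome_bins(toy_sizes, 500)   # dataset 1, 500-bp bins
bins_500_b = genome_bins(toy_sizes, 500)   # dataset 2, same settings
bins_1000 = genome_bins(toy_sizes, 1000)   # different bin size

print(len(bins_500_a), len(bins_1000))     # feature counts differ
print(bins_500_a == bins_500_b)            # identical settings, identical features
```

In SnapATAC2 terms, this would presumably mean rebuilding the tile matrix on both objects with the same bin size (for example via `snap.pp.add_tile_matrix` with an identical `bin_size`), or building both matrices from one merged peak set, before combining the objects. Please check the exact calls against the documentation for your installed version.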