ma-compbio / Higashi

single-cell Hi-C, scHi-C, Hi-C, 3D genome, nuclear organization, hypergraph
MIT License

The Dip-C data processing keeps encountering errors. #52

Closed a50044758 closed 2 months ago

a50044758 commented 2 months ago

I encountered the following error when processing Dip-C data (10 samples) using Higashi:

    (higashi) linda@dell $ python /home/linda/tools/Higashi/higashi/Process.py -c config.JSON
    generating start/end dict for chromosome
    extracting from filelist.txt
    100%|████████████████████████████████████████████████████████████████████████████████| 10/10 [00:11<00:00, 1.18s/it]
    generating contact maps for baseline
    data loaded
    790819 False
    creating matrices tasks: 100%|██████████████████████████████████████████████████████| 23/23 [00:00<00:00, 242.04it/s]
    Traceback (most recent call last):
      File "/home/linda/tools/Higashi/higashi/Process.py", line 1229, in <module>
        create_matrix(config)
      File "/home/linda/tools/Higashi/higashi/Process.py", line 742, in create_matrix
        cell_adj_all = [vstack(new_cell_adj_all1).tocsr(), vstack(new_cell_adj_all2).tocsr()]
      File "/home/linda/miniconda3/envs/higashi/lib/python3.9/site-packages/scipy/sparse/_construct.py", line 781, in vstack
        return _block([[b] for b in blocks], format, dtype, return_spmatrix=True)
      File "/home/linda/miniconda3/envs/higashi/lib/python3.9/site-packages/scipy/sparse/_construct.py", line 938, in _block
        A = coo_array(blocks[i,j])
      File "/home/linda/miniconda3/envs/higashi/lib/python3.9/site-packages/scipy/sparse/_coo.py", line 84, in __init__
        self._shape = check_shape(M.shape, allow_1d=is_array)
      File "/home/linda/miniconda3/envs/higashi/lib/python3.9/site-packages/scipy/sparse/_sputils.py", line 317, in check_shape
        raise TypeError("function missing 1 required positional argument: "
    TypeError: function missing 1 required positional argument: 'shape'

However, I can run the test data provided on the website without issues:

    (higashi) linda@dell $ python /home/linda/tools/Higashi/higashi/Process.py -c config_ramani.JSON
    generating start/end dict for chromosome
    extracting from data.txt
    100%|█████████████████████████████████████████████████████████████████████████████████████████| 15891786/15891786 [00:
    generating contact maps for baseline
    data loaded
    4110311 False
    creating matrices tasks: 100%|████████████████████████████████████████████████████████████████████████████████| 23/23
    total_feats_size 403
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 23/23

I am unsure how to resolve this issue. I have uploaded my config file and my filelist.txt file.

config.JSON filelist.txt

Additionally, my pairs file strictly follows the "higashi_v2" format:

    (higashi) linda@dell $ cat higashi_dedup_SRR7226668.mapped.pairs | head -5
    chr1 10377 chr1 10581
    chr1 10459 chr1 10622
    chr1 54678 chr1 54857
    chr1 532385 chr1 532590
    chr1 564372 chr1 564501


a50044758 commented 2 months ago

Also, the version of Higashi I am using is 1333de2.

ruochiz commented 2 months ago

Hmm, this is the first time I've seen this error. Could you share some of those higashi_dedup_xxx.mapped.pairs files (my email: zhangruo@broadinstitute.org)? Then I can try to reproduce the error.

ruochiz commented 2 months ago

Hi, based on the files you shared, I think the error comes from two things:

  1. You created a "batch id" entry in your label_info.pickle, but it contains a vector of length 0 (it should have the same length as the number of cells you have).
  2. You reference "batch id" in config.JSON, but this dataset doesn't contain batches.

As a result, the processing script tries to process each batch, but because the batch id vector has length 0, it processes 0 cells, which results in this error.

To fix the error, delete the batch id key from both the label_info.pickle file and config.JSON.
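As a rough illustration of that check and cleanup, here is a minimal sketch. It assumes label_info.pickle holds a plain Python dict that maps each label name to a vector with one entry per cell, and that the batch setting sits under a top-level config key. The helper names, the "batch id" label name, and the "batch_id" config key are assumptions made for the example, so substitute whatever names your own files actually use.

```python
def find_mismatched_labels(label_info, n_cells):
    """Return {label name: length} for every vector whose length != n_cells."""
    return {key: len(values) for key, values in label_info.items()
            if len(values) != n_cells}


def drop_label(label_info, config, label_key, config_key="batch_id"):
    """Return copies of label_info and config without the given batch entry.

    label_key and config_key are assumed names; adjust them to your files.
    The input dictionaries are left unmodified.
    """
    new_labels = {k: v for k, v in label_info.items() if k != label_key}
    new_config = {k: v for k, v in config.items() if k != config_key}
    return new_labels, new_config


if __name__ == "__main__":
    # Toy stand-ins for the contents of label_info.pickle and config.JSON.
    label_info = {"cell type": ["A"] * 10, "batch id": []}
    config = {"batch_id": "batch id"}

    # Report label vectors that do not have one entry per cell.
    print(find_mismatched_labels(label_info, n_cells=10))

    # Remove the empty batch entry from both dictionaries.
    label_info, config = drop_label(label_info, config, "batch id")
    print(sorted(label_info), config)
```

With real files, the two dictionaries would be read with `pickle.load` and `json.load` and written back with `pickle.dump` and `json.dump` once the empty entry is gone. Keep a backup of the originals before overwriting them.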

Hope this resolves the error.