kaizhang / SnapATAC2

Single-cell epigenomics analysis tools
https://kzhang.org/SnapATAC2/

How does one import multiple fragments.tsv.gz files from different samples? #314

Open yojetsharma opened 1 month ago

kaizhang commented 1 month ago

Does this answer your question: https://kzhang.org/SnapATAC2/tutorials/integration.html?
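The tutorial's pattern pairs each sample name with its fragment file. If the files share a naming scheme, that list of `(sample_name, path)` pairs can be built rather than typed by hand. Here is a minimal sketch, assuming each file is named `<sample>_fragments.tsv.gz`. `collect_fragment_files` is a hypothetical helper, not a SnapATAC2 function:

```python
from pathlib import Path

# Assumption: every sample's file is named "<sample>_fragments.tsv.gz",
# matching the d149/ls002 naming used in this thread.
SUFFIX = "_fragments.tsv.gz"


def collect_fragment_files(directory):
    """Return a sorted list of (sample_name, Path) pairs found in `directory`."""
    files = []
    for path in sorted(Path(directory).glob("*" + SUFFIX)):
        sample = path.name[: -len(SUFFIX)]  # strip the shared suffix
        files.append((sample, path))
    return files
```

If it is called on `/home/user/Desktop/test` and only the two files from this thread are present, it should produce the same `files` list that is typed by hand further down.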

yojetsharma commented 1 month ago

> Does this answer your question: https://kzhang.org/SnapATAC2/tutorials/integration.html?

I had tried that; this is the output:

>>> files = [('d149', PosixPath('/home/user/Desktop/test/d149_fragments.tsv.gz')),('ls002', PosixPath('/home/user/Desktop/test/ls002_fragments.tsv.gz'))]
>>> files
[('d149', PosixPath('/home/user/Desktop/test/d149_fragments.tsv.gz')), ('ls002', PosixPath('/home/user/Desktop/test/ls002_fragments.tsv.gz'))]
>>> adatas = snap.pp.import_data(
...     [fl for _, fl in files],
...     file=[name + '.h5ad' for name, _ in files],
...     chrom_sizes=snap.genome.hg38,
...     min_num_fragments=1000,
... )
  0%|          | 0/2 [00:00<?, ?it/s]
⠁ Processed 1 barcodes in 0s (562.1857/s) ...
⠄ Processed 1,834 barcodes in 0s (99,959.6625/s) ...
thread '<unnamed>' panicked at /project/snapatac2-core/src/preprocessing/count_data/import.rs:117:17:
Please sort fragment file by barcodes
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Process SpawnPoolWorker-1:
Process SpawnPoolWorker-2:
Traceback (most recent call last):
  File "/home/user/miniconda3/lib/python3.12/site-packages/multiprocess/process.py", line 314, in _bootstrap
    self.run()
  File "/home/user/miniconda3/lib/python3.12/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/miniconda3/lib/python3.12/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/lib/python3.12/site-packages/snapatac2/_utils.py", line 31, in _func
    result = func((x[0], adata))
             ^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/lib/python3.12/site-packages/snapatac2/preprocessing/_basic.py", line 296, in <lambda>
    lambda x: internal.import_fragments(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: Please sort fragment file by barcodes
^CProcess SpawnPoolWorker-8:
Process SpawnPoolWorker-3:
Process SpawnPoolWorker-9:
Process SpawnPoolWorker-6:
Process SpawnPoolWorker-4:
Process SpawnPoolWorker-5:
Process SpawnPoolWorker-10:
Process SpawnPoolWorker-7:
  0%|          | 0/2 [00:09<?, ?it/s]
Traceback (most recent call last):
  File "/home/user/miniconda3/lib/python3.12/site-packages/multiprocess/process.py", line 314, in _bootstrap
    self.run()
  File "/home/user/miniconda3/lib/python3.12/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/miniconda3/lib/python3.12/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
           ^^^^^
  File "/home/user/miniconda3/lib/python3.12/site-packages/multiprocess/queues.py", line 384, in get
    with self._rlock:
  File "/home/user/miniconda3/lib/python3.12/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
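The panic is raised inside `internal.import_fragments`, which, judging from the message and from the fact that adding `sorted_by_barcode=False` avoided it, expects every barcode's fragments to appear as one contiguous block by default. Pipelines that sort fragments by genomic coordinate (for example the tabix-indexed files from Cell Ranger ATAC) do not produce that order. Here is a stdlib-only sketch to check a file before importing. It assumes the usual tab-separated layout (chrom, start, end, barcode, count) with optional `#` header lines, and that contiguity is what the importer checks. `is_grouped_by_barcode` is a hypothetical helper, not part of SnapATAC2:

```python
import gzip


def is_grouped_by_barcode(path):
    """Return True if each barcode's fragments form one contiguous run.

    Assumes a fragment file with the barcode in the 4th tab-separated
    column; lines starting with '#' and blank lines are skipped.
    """
    finished = set()  # barcodes whose run has already ended
    current = None    # barcode of the run currently being read
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            barcode = line.split("\t")[3]
            if barcode != current:
                if barcode in finished:
                    return False  # barcode reappears after a different one
                if current is not None:
                    finished.add(current)
                current = barcode
    return True
```

A file that is fully sorted by barcode always passes this check; a coordinate-sorted file with more than one cell will usually fail it.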

Since the error at the top said to sort the fragment file by barcodes, I added `sorted_by_barcode=False`, and it worked on the second attempt. Is this okay?

>>> adatas = snap.pp.import_data(
...     [fl for _, fl in files],
...     file=[name + '.h5ad' for name, _ in files],
...     chrom_sizes=snap.genome.hg38,
...     min_num_fragments=1000,
...     sorted_by_barcode=False,
... )
  0%|          | 0/2 [00:00<?, ?it/s]
⠒ Processed 65 barcodes in 0s (314.9573/s) ...
100%|██████████| 2/2 [04:24<00:00, 132.15s/it]
>>> adatas
[AnnData object with n_obs x n_vars = 18593 x 0 backed at 'd149.h5ad'
    obs: 'n_fragment', 'frac_dup', 'frac_mito'
    uns: 'reference_sequences'
    obsm: 'fragment_paired', AnnData object with n_obs x n_vars = 33205 x 0 backed at 'ls002.h5ad'
    obs: 'n_fragment', 'frac_dup', 'frac_mito'
    uns: 'reference_sequences'
    obsm: 'fragment_paired']
>>> snap.metrics.tsse(adatas, snap.genome.hg38)
100%|██████████| 2/2 [00:22<00:00, 11.15s/it]
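An alternative to `sorted_by_barcode=False` is to sort each fragment file by barcode once and then keep the default setting. The sketch below does this in memory with the standard library, so it only suits files that fit in RAM. For large files, an external sort (such as GNU `sort` keyed on the 4th column) would be more appropriate. `sort_fragments_by_barcode` is a hypothetical helper, not a SnapATAC2 function, and the column layout (chrom, start, end, barcode, count) is an assumption:

```python
import gzip


def sort_fragments_by_barcode(src, dst):
    """Write a gzipped copy of `src` with fragment rows sorted by barcode.

    In-memory sketch: every fragment line is loaded at once. Lines that
    start with '#' are kept at the top in their original order.
    """
    header, rows = [], []
    with gzip.open(src, "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                header.append(line.rstrip("\n") + "\n")
            elif line.strip():
                line = line.rstrip("\n") + "\n"  # normalise a missing final newline
                fields = line.rstrip("\n").split("\t")
                # Sort key: barcode, then chromosome, then numeric start.
                rows.append((fields[3], fields[0], int(fields[1]), line))
    rows.sort(key=lambda r: (r[0], r[1], r[2]))
    with gzip.open(dst, "wt") as out:
        out.writelines(header)
        out.writelines(r[3] for r in rows)
```

The sorted copies could then be passed to `snap.pp.import_data` without `sorted_by_barcode=False`.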