pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License
147 stars 23 forks source link

Issues with combining matrices for single cell nuclei data #61

Closed harsh-shukla closed 4 years ago

harsh-shukla commented 4 years ago

Hi We have some single cell nuclei data. I was following the google colab notebook : https://colab.research.google.com/github/pachterlab/kallistobustools/blob/master/notebooks/kb_single_nucleus.ipynb

I was successfully able to generate the index.

We had run cell ranger before so we wanted to do a comparision.Hence I provided the --cellranger flag. It did generate the spliced and unspliced matrix (in the cellranger folder). But it did not add them up to give me the final summed matrix.

Command run

kb count -i $INDEX_PATH/grch38.idx -g $INDEX_PATH/t2g.txt -c1 $INDEX_PATH/cdna_t2c.txt -c2 $INDEX_PATH/intron_t2c.txt -x 10xv3 -o kaliisto_Bustools_out -t 32 --workflow nucleus --cellranger \
 Sample-100-ATCTTTAG_S72_L003_R1_001.fastq.gz Sample-100-ATCTTTAG_S72_L003_R2_001.fastq.gz \
 Sample-100-ATCTTTAG_S72_L004_R1_001.fastq.gz Sample-100-ATCTTTAG_S72_L004_R2_001.fastq.gz \
 Sample-100-CAGAGGCC_S70_L003_R1_001.fastq.gz Sample-100-CAGAGGCC_S70_L003_R2_001.fastq.gz \
 Sample-100-CAGAGGCC_S70_L004_R1_001.fastq.gz Sample-100-CAGAGGCC_S70_L004_R2_001.fastq.gz \
 Sample-100-GGTCAATA_S71_L003_R1_001.fastq.gz Sample-100-GGTCAATA_S71_L003_R2_001.fastq.gz \
 Sample-100-GGTCAATA_S71_L004_R1_001.fastq.gz Sample-100-GGTCAATA_S71_L004_R2_001.fastq.gz \
 Sample-100-TCAGCCGT_S69_L003_R1_001.fastq.gz Sample-100-TCAGCCGT_S69_L003_R2_001.fastq.gz \
 Sample-100-TCAGCCGT_S69_L004_R1_001.fastq.gz Sample-100-TCAGCCGT_S69_L004_R2_001.fastq.gz  

log

[2020-02-27 20:04:21,204]    INFO Using index /pub/hgshukla/Swarup_Lab/scRNA_seq/reference_index/Kallisto_bustools/grch38.idx to generate BUS file to kaliisto_Bustools_out from
[2020-02-27 20:04:21,204]    INFO         Sample-100-ATCTTTAG_S72_L003_R1_001.fastq.gz
[2020-02-27 20:04:21,204]    INFO         Sample-100-ATCTTTAG_S72_L003_R2_001.fastq.gz
[2020-02-27 20:04:21,204]    INFO         Sample-100-ATCTTTAG_S72_L004_R1_001.fastq.gz
[2020-02-27 20:04:21,204]    INFO         Sample-100-ATCTTTAG_S72_L004_R2_001.fastq.gz
[2020-02-27 20:04:21,204]    INFO         Sample-100-CAGAGGCC_S70_L003_R1_001.fastq.gz
[2020-02-27 20:04:21,204]    INFO         Sample-100-CAGAGGCC_S70_L003_R2_001.fastq.gz
[2020-02-27 20:04:21,204]    INFO         Sample-100-CAGAGGCC_S70_L004_R1_001.fastq.gz
[2020-02-27 20:04:21,204]    INFO         Sample-100-CAGAGGCC_S70_L004_R2_001.fastq.gz
[2020-02-27 20:04:21,204]    INFO         Sample-100-GGTCAATA_S71_L003_R1_001.fastq.gz
[2020-02-27 20:04:21,205]    INFO         Sample-100-GGTCAATA_S71_L003_R2_001.fastq.gz
[2020-02-27 20:04:21,205]    INFO         Sample-100-GGTCAATA_S71_L004_R1_001.fastq.gz
[2020-02-27 20:04:21,205]    INFO         Sample-100-GGTCAATA_S71_L004_R2_001.fastq.gz
[2020-02-27 20:04:21,205]    INFO         Sample-100-TCAGCCGT_S69_L003_R1_001.fastq.gz
[2020-02-27 20:04:21,205]    INFO         Sample-100-TCAGCCGT_S69_L003_R2_001.fastq.gz
[2020-02-27 20:04:21,205]    INFO         Sample-100-TCAGCCGT_S69_L004_R1_001.fastq.gz
[2020-02-27 20:04:21,205]    INFO         Sample-100-TCAGCCGT_S69_L004_R2_001.fastq.gz
[2020-02-27 21:53:27,535]    INFO Sorting BUS file kaliisto_Bustools_out/output.bus to kaliisto_Bustools_out/tmp/output.s.bus
[2020-02-27 21:56:01,947]    INFO Whitelist not provided
[2020-02-27 21:56:01,947]    INFO Copying pre-packaged 10XV3 whitelist to kaliisto_Bustools_out
[2020-02-27 21:56:03,236]    INFO Inspecting BUS file kaliisto_Bustools_out/tmp/output.s.bus
[2020-02-27 21:58:58,426]    INFO Correcting BUS records in kaliisto_Bustools_out/tmp/output.s.bus to kaliisto_Bustools_out/tmp/output.s.c.bus with whitelist kaliisto_Bustools_out/10xv3_whitelist.txt
[2020-02-27 22:00:13,737]    INFO Sorting BUS file kaliisto_Bustools_out/tmp/output.s.c.bus to kaliisto_Bustools_out/output.unfiltered.bus
[2020-02-27 22:01:16,924]    INFO Capturing records from BUS file kaliisto_Bustools_out/output.unfiltered.bus to kaliisto_Bustools_out/tmp/spliced.bus with capture list /pub/hgshukla/Swarup_Lab/scRNA_seq/reference_index/Kallisto_bustools/intron_t2c.txt
[2020-02-27 22:04:10,304]    INFO Sorting BUS file kaliisto_Bustools_out/tmp/spliced.bus to kaliisto_Bustools_out/spliced.unfiltered.bus
[2020-02-27 22:04:24,324]    INFO Inspecting BUS file kaliisto_Bustools_out/spliced.unfiltered.bus
[2020-02-27 22:06:27,978]    INFO Generating count matrix kaliisto_Bustools_out/counts_unfiltered/spliced from BUS file kaliisto_Bustools_out/spliced.unfiltered.bus
[2020-02-27 22:09:49,903]    INFO Writing matrix in cellranger format to kaliisto_Bustools_out/counts_unfiltered/cellranger_spliced
[2020-02-27 22:10:49,319]    INFO Capturing records from BUS file kaliisto_Bustools_out/output.unfiltered.bus to kaliisto_Bustools_out/tmp/unspliced.bus with capture list /pub/hgshukla/Swarup_Lab/scRNA_seq/reference_index/Kallisto_bustools/cdna_t2c.txt
[2020-02-27 22:14:55,311]    INFO Sorting BUS file kaliisto_Bustools_out/tmp/unspliced.bus to kaliisto_Bustools_out/unspliced.unfiltered.bus
[2020-02-27 22:15:37,445]    INFO Inspecting BUS file kaliisto_Bustools_out/unspliced.unfiltered.bus
[2020-02-27 22:18:03,936]    INFO Generating count matrix kaliisto_Bustools_out/counts_unfiltered/unspliced from BUS file kaliisto_Bustools_out/unspliced.unfiltered.bus
[2020-02-27 22:22:13,910]    INFO Writing matrix in cellranger format to kaliisto_Bustools_out/counts_unfiltered/cellranger_unspliced

So I decided to run with parameters similar to the ones given in the notebook (adding the --h5ad flag and skipping the --cellranger flag) . This time it does start combining the matrices but throws some error while doing so.

Command run

kb count -i $INDEX_PATH/grch38.idx -g $INDEX_PATH/t2g.txt -c1 $INDEX_PATH/cdna_t2c.txt -c2 $INDEX_PATH/intron_t2c.txt -x 10xv3 -o kaliisto_Bustools_out -t 32 --workflow nucleus --h5ad \
 Sample-100-ATCTTTAG_S72_L003_R1_001.fastq.gz Sample-100-ATCTTTAG_S72_L003_R2_001.fastq.gz \
 Sample-100-ATCTTTAG_S72_L004_R1_001.fastq.gz Sample-100-ATCTTTAG_S72_L004_R2_001.fastq.gz \
 Sample-100-CAGAGGCC_S70_L003_R1_001.fastq.gz Sample-100-CAGAGGCC_S70_L003_R2_001.fastq.gz \
 Sample-100-CAGAGGCC_S70_L004_R1_001.fastq.gz Sample-100-CAGAGGCC_S70_L004_R2_001.fastq.gz \
 Sample-100-GGTCAATA_S71_L003_R1_001.fastq.gz Sample-100-GGTCAATA_S71_L003_R2_001.fastq.gz \
 Sample-100-GGTCAATA_S71_L004_R1_001.fastq.gz Sample-100-GGTCAATA_S71_L004_R2_001.fastq.gz \
 Sample-100-TCAGCCGT_S69_L003_R1_001.fastq.gz Sample-100-TCAGCCGT_S69_L003_R2_001.fastq.gz \
 Sample-100-TCAGCCGT_S69_L004_R1_001.fastq.gz Sample-100-TCAGCCGT_S69_L004_R2_001.fastq.gz

Log

/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/anndata/_core/anndata.py:21: FutureWarning:

pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.

[2020-02-27 23:12:18,792]    INFO Using index /pub/hgshukla/Swarup_Lab/scRNA_seq/reference_index/Kallisto_bustools/grch38.idx to generate BUS file to kaliisto_Bustools_out from
[2020-02-27 23:12:18,792]    INFO         Sample-100-ATCTTTAG_S72_L003_R1_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-ATCTTTAG_S72_L003_R2_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-ATCTTTAG_S72_L004_R1_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-ATCTTTAG_S72_L004_R2_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-CAGAGGCC_S70_L003_R1_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-CAGAGGCC_S70_L003_R2_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-CAGAGGCC_S70_L004_R1_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-CAGAGGCC_S70_L004_R2_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-GGTCAATA_S71_L003_R1_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-GGTCAATA_S71_L003_R2_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-GGTCAATA_S71_L004_R1_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-GGTCAATA_S71_L004_R2_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-TCAGCCGT_S69_L003_R1_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-TCAGCCGT_S69_L003_R2_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-TCAGCCGT_S69_L004_R1_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-TCAGCCGT_S69_L004_R2_001.fastq.gz
[2020-02-28 00:31:28,302]    INFO Sorting BUS file kaliisto_Bustools_out/output.bus to kaliisto_Bustools_out/tmp/output.s.bus
[2020-02-28 00:33:53,887]    INFO Whitelist not provided
[2020-02-28 00:33:53,887]    INFO Copying pre-packaged 10XV3 whitelist to kaliisto_Bustools_out
[2020-02-28 00:33:55,089]    INFO Inspecting BUS file kaliisto_Bustools_out/tmp/output.s.bus
[2020-02-28 00:36:38,660]    INFO Correcting BUS records in kaliisto_Bustools_out/tmp/output.s.bus to kaliisto_Bustools_out/tmp/output.s.c.bus with whitelist kaliisto_Bustools_out/10xv3_whitelist.txt
[2020-02-28 00:37:54,364]    INFO Sorting BUS file kaliisto_Bustools_out/tmp/output.s.c.bus to kaliisto_Bustools_out/output.unfiltered.bus
[2020-02-28 00:38:56,587]    INFO Capturing records from BUS file kaliisto_Bustools_out/output.unfiltered.bus to kaliisto_Bustools_out/tmp/spliced.bus with capture list /pub/hgshukla/Swarup_Lab/scRNA_seq/reference_index/Kallisto_bustools/intron_t2c.txt
[2020-02-28 00:41:44,099]    INFO Sorting BUS file kaliisto_Bustools_out/tmp/spliced.bus to kaliisto_Bustools_out/spliced.unfiltered.bus
[2020-02-28 00:41:55,983]    INFO Inspecting BUS file kaliisto_Bustools_out/spliced.unfiltered.bus
[2020-02-28 00:43:53,040]    INFO Generating count matrix kaliisto_Bustools_out/counts_unfiltered/spliced from BUS file kaliisto_Bustools_out/spliced.unfiltered.bus
[2020-02-28 00:47:00,535]    INFO Capturing records from BUS file kaliisto_Bustools_out/output.unfiltered.bus to kaliisto_Bustools_out/tmp/unspliced.bus with capture list /pub/hgshukla/Swarup_Lab/scRNA_seq/reference_index/Kallisto_bustools/cdna_t2c.txt
[2020-02-28 00:50:56,512]    INFO Sorting BUS file kaliisto_Bustools_out/tmp/unspliced.bus to kaliisto_Bustools_out/unspliced.unfiltered.bus
[2020-02-28 00:51:42,641]    INFO Inspecting BUS file kaliisto_Bustools_out/unspliced.unfiltered.bus
[2020-02-28 00:54:05,254]    INFO Generating count matrix kaliisto_Bustools_out/counts_unfiltered/unspliced from BUS file kaliisto_Bustools_out/unspliced.unfiltered.bus
[2020-02-28 00:58:12,188]    INFO Reading matrix kaliisto_Bustools_out/counts_unfiltered/spliced.mtx
[2020-02-28 00:58:54,650]    INFO Reading matrix kaliisto_Bustools_out/counts_unfiltered/unspliced.mtx
[2020-02-28 01:00:01,170]    INFO Combining matrices
[2020-02-28 01:02:59,683]   ERROR An exception occurred
Traceback (most recent call last):
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/main.py", line 700, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/main.py", line 174, in parse_count
    count_velocity(
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/count.py", line 1285, in count_velocity
    convert_matrices(
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/count.py", line 588, in convert_matrices
    adata = sum_anndatas(*adatas) if nucleus else overlay_anndatas(*adatas)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/utils.py", line 692, in sum_anndatas
    X=spliced_intersection.X + unspliced_intersection.X,
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/anndata/_core/anndata.py", line 579, in X
    _subset(self._adata_ref.X, (self._oidx, self._vidx)),
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/functools.py", line 874, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/anndata/_core/index.py", line 126, in _subset
    return a[subset_idx]
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/scipy/sparse/_index.py", line 75, in __getitem__
    return self._get_arrayXarray(row, col)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/scipy/sparse/compressed.py", line 664, in _get_arrayXarray
    csr_sample_values(M, N, self.indptr, self.indices, self.data,
ValueError: could not convert integer scalar

P.S Also during the first run (with --cellranger option) if i give --report option to generate a report it throws an error while generating the jupyter notebook. I am running the pipeline in a new conda environment and I think it has to do something with that.

The end part of the log

.
[2020-02-27 19:25:36,193]    INFO Writing matrix in cellranger format to kaliisto_Bustools_out/counts_unfiltered/cellranger_unspliced
[2020-02-27 19:27:39,358]    INFO Writing report Jupyter notebook at kaliisto_Bustools_out/report.ipynb and rendering it to kaliisto_Bustools_out/report.html
[2020-02-27 19:27:40,198]   ERROR An exception occurred
Traceback (most recent call last):
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/main.py", line 700, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/main.py", line 174, in parse_count
    count_velocity(
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/count.py", line 1408, in count_velocity
    report_result = render_report(
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/utils.py", line 103, in inner
    return func(*args, **kwargs)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/report.py", line 296, in render_report
    execute_report(temp_path, nb_path, html_path)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/report.py", line 240, in execute_report
    ep.preprocess(nb)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/nbconvert/preprocessors/execute.py", line 403, in preprocess
    with self.setup_preprocessor(nb, resources, km=km):
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/nbconvert/preprocessors/execute.py", line 345, in setup_preprocessor
    self.km, self.kc = self.start_new_kernel(**kwargs)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/nbconvert/preprocessors/execute.py", line 291, in start_new_kernel
    km.start_kernel(extra_arguments=self.extra_arguments, **kwargs)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/jupyter_client/manager.py", line 253, in start_kernel
    kernel_cmd = self.format_kernel_cmd(extra_arguments=extra_arguments)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/jupyter_client/manager.py", line 177, in format_kernel_cmd
    cmd = self.kernel_spec.argv + extra_arguments
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/jupyter_client/manager.py", line 83, in kernel_spec
    self._kernel_spec = self.kernel_spec_manager.get_kernel_spec(self.kernel_name)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/jupyter_client/kernelspec.py", line 235, in get_kernel_spec
    raise NoSuchKernel(kernel_name)
jupyter_client.kernelspec.NoSuchKernel: No such kernel named python3

Any patch for this ??

Also I did refer to this issue(#60). This sample has around 330 million reads. But as far as I understand the sample is OK (or so what I have been told)

Best, Harsh

dylanmr commented 4 years ago

I am also running into this same issue with the h5ad flag. Any updates on this??

Best, Dylan

harsh-shukla commented 4 years ago

Hey @dylanmr

For now , I combined the matrices following this tutorial

https://github.com/BUStools/getting_started/blob/master/kallisto_bus_mouse_nuclei_tutorial.ipynb

The results look ok.

Best, Harsh

Lioscro commented 4 years ago

The ValueError: could not convert integer scalar error is a scipy bug that occurs with large matrices. We haven't been able to figure out how to fix this though. https://github.com/scipy/scipy/issues/7230

Lioscro commented 4 years ago

The jupyter_client.kernelspec.NoSuchKernel: No such kernel named python3 error should be fixed with 9378ff3898113d5e26e3512c1aff76e1797a1307.

harsh-shukla commented 4 years ago

Hi Joseph,

Thank you for the update . I am closing this issue.

Best, Harsh