pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

np.matrix is not supported when creating a report file for smartseq data #195

Closed yeroslaviz closed 1 year ago

yeroslaviz commented 1 year ago

When I try to create the report from my data, I get this error:

TypeError: np.matrix is not supported. Please convert to a numpy array with np.asarray. For more information see: https://numpy.org/doc/stable/reference/generated/numpy.matrix.html
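
For context on the message itself: `scipy.sparse` matrices return a `numpy.matrix` from `.todense()`, which is the type scikit-learn rejects here, while `.toarray()` or `np.asarray(...)` yields a plain `numpy.ndarray`. The sketch below uses a made-up toy matrix as a stand-in for `adata.X`. It only illustrates the type difference; whether this conversion alone would let the report finish for Smart-seq data is not established in this thread.

```python
import numpy as np
from scipy import sparse

# Toy stand-in for adata.X (a sparse cells-by-genes matrix); the values are made up.
X = sparse.csr_matrix(np.array([[0.0, 1.0, 2.0], [3.0, 0.0, 0.0]]))

as_matrix = X.todense()            # numpy.matrix: the type named in the TypeError
as_array = X.toarray()             # plain numpy.ndarray
converted = np.asarray(as_matrix)  # the conversion suggested by the error message

print(type(as_matrix).__name__)
print(type(as_array).__name__)
print(type(converted).__name__)
```

Either `as_array` or `converted` is the kind of input that `PCA.fit_transform` expects in place of the `np.matrix`.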

The full error output is shown below.

Is it possible to create a report file when I don't have any UMIs or barcodes?

Thanks


The full command and error message:

$ kb count -i GRCm39/kallisto/GRCm39.index.idx -g GRCm39/kallisto/t2g.txt -t 12 -m 12G \
           --strand reverse --report --tmp kallisto_tmp \
           -o $kallistoFiles/$base/ -x SMARTSEQ2 --parity paired \
           manifest.kallisto.txt

[2023-03-03 17:33:27,477] WARNING [main] Using `--report` may cause `kb` to exceed maximum memory specified and crash for large count matrices.
[2023-03-03 17:33:31,656]    INFO [count] Skipping kallisto bus because output files already exist. Use the --overwrite flag to overwrite.
[2023-03-03 17:33:31,657]    INFO [count] Sorting BUS file Kallisto_Quant/cre_pos_9/output.bus to kallisto_tmp/output.s.bus
[2023-03-03 17:33:43,931]    INFO [count] Inspecting BUS file kallisto_tmp/output.s.bus
[2023-03-03 17:33:45,051]    INFO [count] Generating count matrix Kallisto_Quant/cre_pos_9/counts_unfiltered/cells_x_genes from BUS file kallisto_tmp/output.s.bus
[2023-03-03 17:33:48,112]    INFO [count] Writing report Jupyter notebook at /Kallisto_Quant/cre_pos_9/report.ipynb and rendering it to  Kallisto_Quant/cre_pos_9/report.html
[2023-03-03 17:34:05,304]   ERROR [main] An exception occurred
Traceback (most recent call last):
  File "/fs/home/yeroslaviz/miniconda3/lib/python3.9/site-packages/kb_python/main.py", line 1305, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/fs/home/yeroslaviz/miniconda3/lib/python3.9/site-packages/kb_python/main.py", line 550, in parse_count
    count(
  File "/fs/home/yeroslaviz/miniconda3/lib/python3.9/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
  File "/fs/home/yeroslaviz/miniconda3/lib/python3.9/site-packages/kb_python/count.py", line 1240, in count
    report_result = render_report(
  File "/fs/home/yeroslaviz/miniconda3/lib/python3.9/site-packages/kb_python/dry/__init__.py", line 25, in inner
    return func(*args, **kwargs)
  File "/fs/home/yeroslaviz/miniconda3/lib/python3.9/site-packages/kb_python/report.py", line 279, in render_report
    execute_report(temp_path, nb_path, html_path) 
  File "/fs/home/yeroslaviz/miniconda3/lib/python3.9/site-packages/kb_python/report.py", line 232, in execute_report
    ep.preprocess(nb)
  File "/fs/home/yeroslaviz/miniconda3/lib/python3.9/site-packages/nbconvert/preprocessors/execute.py", line 100, in preprocess
    self.preprocess_cell(cell, resources, index)
  File "/fs/home/yeroslaviz/miniconda3/lib/python3.9/site-packages/nbconvert/preprocessors/execute.py", line 121, in preprocess_cell
    cell = self.execute_cell(cell, index, store_history=True)
  File "/fs/home/yeroslaviz/miniconda3/lib/python3.9/site-packages/jupyter_core/utils/__init__.py", line 166, in wrapped
    return loop.run_until_complete(inner)
  File "/fs/home/yeroslaviz/miniconda3/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/fs/home/yeroslaviz/miniconda3/lib/python3.9/site-packages/nbclient/client.py", line 1021, in async_execute_cell
    await self._check_raise_for_error(cell, cell_index, exec_reply)
  File "/fs/home/yeroslaviz/miniconda3/lib/python3.9/site-packages/nbclient/client.py", line 915, in _check_raise_for_error
    raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
------------------
adata = import_matrix_as_anndata('/fs/pool/pool-cox-projects-bioinformatics/Dept_Tachibana/Laura/P571_scRNASeq_Embryos/Kallisto_Quant/cre_pos_9/counts_unfiltered/cells_x_genes.mtx', '/fs/pool/pool-cox-projects-bioinformatics/Dept_Tachibana/Laura/P571_scRNASeq_Embryos/Kallisto_Quant/cre_pos_9/counts_unfiltered/cells_x_genes.barcodes.txt', '/fs/pool/pool-cox-projects-bioinformatics/Dept_Tachibana/Laura/P571_scRNASeq_Embryos/Kallisto_Quant/cre_pos_9/counts_unfiltered/cells_x_genes.genes.txt', t2g_path='GRCm39/kallisto/t2g.txt')

# Filter barcodes and UMIs with 0 counts
sc.pp.filter_cells(adata, min_genes=1e-3)
sc.pp.filter_cells(adata, min_counts=1e-3)
n_counts = adata.obs['n_counts']
n_genes = adata.obs['n_genes']

# Run PCA
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
pca = PCA(n_components=10)
pc = pca.fit_transform(adata.X.todense())
------------------
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[6], line 13
     11 sc.pp.log1p(adata)
     12 pca = PCA(n_components=10)
---> 13 pc = pca.fit_transform(adata.X.todense())

File ~/miniconda3/lib/python3.9/site-packages/sklearn/utils/_set_output.py:142, in _wrap_method_output.<locals>.wrapped(self, X, *args, **kwargs)
    140 @wraps(f)
    141 def wrapped(self, X, *args, **kwargs):
--> 142     data_to_wrap = f(self, X, *args, **kwargs)
    143     if isinstance(data_to_wrap, tuple):
    144         # only wrap the first output for cross decomposition
    145         return (
    146             _wrap_data_with_container(method, data_to_wrap[0], X, self),
    147             *data_to_wrap[1:],
    148         )

File ~/miniconda3/lib/python3.9/site-packages/sklearn/decomposition/_pca.py:462, in PCA.fit_transform(self, X, y)
    439 """Fit the model with X and apply the dimensionality reduction on X.
    440
    441 Parameters
   (...)
    458 C-ordered array, use 'np.ascontiguousarray'.
    459 """
    460 self._validate_params()
--> 462 U, S, Vt = self._fit(X)
    463 U = U[:, : self.n_components_]
    465 if self.whiten:
    466     # X_new = X * V / S * sqrt(n_samples) = U * sqrt(n_samples)

File ~/miniconda3/lib/python3.9/site-packages/sklearn/decomposition/_pca.py:485, in PCA._fit(self, X)
    479 if issparse(X):
    480     raise TypeError(
    481         "PCA does not support sparse input. See "
    482         "TruncatedSVD for a possible alternative."
    483     )
--> 485 X = self._validate_data(
    486     X, dtype=[np.float64, np.float32], ensure_2d=True, copy=self.copy
    487 )
    489 # Handle n_components==None
    490 if self.n_components is None:

File ~/miniconda3/lib/python3.9/site-packages/sklearn/base.py:546, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
    544     raise ValueError("Validation should be done on X, y or both.")
    545 elif not no_val_X and no_val_y:
--> 546     X = check_array(X, input_name="X", **check_params)
    547     out = X
    548 elif no_val_X and not no_val_y:

File ~/miniconda3/lib/python3.9/site-packages/sklearn/utils/validation.py:737, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    646 """Input validation on an array, list, sparse matrix or similar.
    647
    648 By default, the input is checked to be a non-empty 2D array containing
   (...)
    734     The converted and validated array.
    735 """
    736 if isinstance(array, np.matrix):
--> 737     raise TypeError(
    738         "np.matrix is not supported. Please convert to a numpy array with "
    739         "np.asarray. For more information see: "
    740         "https://numpy.org/doc/stable/reference/generated/numpy.matrix.html"
    741     )
    743 xp, is_array_api = get_namespace(array)
    745 # store reference to original array to check if copy is needed when
    746 # function returns

TypeError: np.matrix is not supported. Please convert to a numpy array with np.asarray. For more information see: https://numpy.org/doc/stable/reference/generated/numpy.matrix.html
TypeError: np.matrix is not supported. Please convert to a numpy array with np.asarray. For more information see: https://numpy.org/doc/stable/reference/generated/numpy.matrix.html

Yenaled commented 1 year ago

This is a current limitation: there is no way to create a report for several technologies (Smart-seq included). I will attempt to address this in the next release of kallisto/bustools/kb-python.

That said, you can obtain most of the information in the report by looking at the JSON files written to the output directory. They give you metrics such as percent reads mapped, unique mappings, etc. For additional QC, I recommend loading the matrices into Python or R and plotting some metrics (number of reads per cell, number of genes detected per cell, etc.), as well as looking at FastQC metrics for sequence-level QC.
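
A minimal sketch of that manual QC in Python, under some assumptions: the helper names (`load_json_summaries`, `load_counts`, `per_cell_qc`) are hypothetical and not part of kb-python, the JSON reader simply collects whatever `*.json` files are present without assuming their names or keys, and the count matrix is assumed to be the cells-by-genes Matrix Market file seen in the log above (`counts_unfiltered/cells_x_genes.mtx`). The toy matrix at the end is made up for illustration.

```python
import json
from pathlib import Path

import numpy as np
from scipy import io, sparse


def load_json_summaries(out_dir):
    """Read every *.json file in an output directory into a dict keyed by file name."""
    return {p.name: json.loads(p.read_text()) for p in sorted(Path(out_dir).glob("*.json"))}


def load_counts(mtx_path):
    """Load a Matrix Market count matrix as a CSR sparse matrix."""
    return sparse.csr_matrix(io.mmread(mtx_path))


def per_cell_qc(counts):
    """Return (total counts per cell, genes detected per cell) for a cells-by-genes matrix."""
    counts = sparse.csr_matrix(counts)
    total_counts = np.asarray(counts.sum(axis=1)).ravel()
    genes_detected = np.asarray((counts > 0).sum(axis=1)).ravel()
    return total_counts, genes_detected


# Toy 3-cell x 4-gene matrix (made-up values) standing in for a real count matrix.
toy = sparse.csr_matrix(np.array([[5, 0, 1, 0], [0, 0, 0, 0], [2, 3, 0, 7]]))
total_counts, genes_detected = per_cell_qc(toy)
print(total_counts)
print(genes_detected)
```

On real data, the matrix returned by `load_counts` would replace `toy`, and the two resulting arrays can be passed to any plotting library for histograms or a counts-versus-genes scatter plot.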

yeroslaviz commented 1 year ago

OK, thanks for a great tool.

It would be great to have the report option as well in the future.