phbradley / conga

Clonotype Neighbor Graph Analysis
MIT License

AttributeError: Can only use .str accessor with string values! #24

Closed s2hui closed 3 years ago

s2hui commented 3 years ago

Hello, Thanks for making this useful tool! I am trying to get it working on my own data set but ran into some errors. I followed the instructions and converted my preprocessed Seurat object as:

require(Seurat)
require(DropletUtils)
hs1 <- readRDS('~/scdata.rds')
# If the object contains only gene expression:

write10xCounts(x = hs1@assays$RNA@counts, path = './hs1_mtx/')

If I have access to the original mtx file, do I still need to do the above? The rds file I am using has been filtered/preprocessed, so it has fewer cells than the original mtx.

Then I ran setup as follows:

python ~/conga/scripts/setup_10x_for_conga.py --filtered_contig_annotations_csvfile filtered_contig_annotations.csv --output_clones_file out-clones.tsv --organism human

I see these files created:

out-clones.tsv.barcode_mapping.tsv
out-clones.tsv
out-clones_AB.dist_50_kpcs

Then I called run_conga as follows:

python conga/scripts/run_conga.py --all --gex_data conga_mtx/hs1_mtx/ --gex_data_type 10x_mtx --clones_file out-clones.tsv --organism human --outfile_prefix tmp_out-clones

and got the following output. I was hoping to get some help figuring out how to get rid of the error. Thanks for your help!

--all implies --graph_vs_graph ==> Running graph_vs_graph analysis.
--all implies --graph_vs_graph_stats ==> Running graph_vs_graph_stats analysis.
--all implies --graph_vs_features ==> Running graph_vs_features analysis.
--all implies --cluster_vs_cluster ==> Running cluster_vs_cluster analysis.
--all implies --find_hotspot_features ==> Running find_hotspot_features analysis.
--all implies --find_gex_cluster_degs ==> Running find_gex_cluster_degs analysis.
--all implies --tcr_clumping ==> Running tcr_clumping analysis.
--all implies --match_to_tcr_database ==> Running match_to_tcr_database analysis.
--all implies --make_tcrdist_trees ==> Running make_tcrdist_trees analysis.
WARNING: If you miss a compact list, please try `print_header`!
The `sinfo` package has changed name and is now called `session_info` to become more discoverable and self-explanatory. The `sinfo` PyPI package will be kept around to avoid breaking old installs and you can downgrade to 0.3.2 if you want to use it without seeing this message. For the latest features and bug fixes, please install `session_info` instead. The usage and defaults also changed slightly, so please review the latest README at https://gitlab.com/joelostblom/session_info.
-----
anndata     0.7.5
scanpy      1.8.1
sinfo       0.3.4
-----
PIL                 8.3.1
cairocffi           1.2.0
cffi                1.12.2
colorama            0.4.1
conga               NA
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.0
defusedxml          0.7.1
h5py                2.10.0
igraph              0.8.3
joblib              0.14.1
kiwisolver          1.0.1
leidenalg           0.8.3
llvmlite            0.32.0
matplotlib          3.4.3
mpl_toolkits        NA
natsort             7.1.0
numba               0.49.0
numexpr             2.7.3
numpy               1.19.5
packaging           20.8
pandas              1.2.0
pkg_resources       NA
psutil              5.7.0
pyparsing           2.3.1
pytz                2019.3
scipy               1.4.1
six                 1.12.0
sklearn             0.22.2.post1
statsmodels         0.12.2
tables              3.6.1
texttable           1.6.3
typing_extensions   NA
wcwidth             0.2.5
yaml                5.1.1
zipp                NA
-----
Python 3.7.2 (default, Dec 29 2018, 06:19:36) [GCC 7.3.0]
Linux-3.10.0-1160.36.2.el7.x86_64-x86_64-with-centos-7.9.2009-Core
38 logical CPU cores, x86_64
-----
Session information updated at 2021-09-28 10:36

reading: conga_mtx/hs1_mtx/ of type 10x_mtx
conga/conga/preprocess.py:221: DeprecationWarning: Use is_view instead of isview, isview will be removed in the future.
  if adata.isview: # ran into trouble with AnnData views vs copies
total barcodes: 5557 (5557, 33538)
reading: out-clones.tsv
reading: out-clones_AB.dist_50_kpcs
Reducing to the 0 barcodes (out of 5557) with paired TCR sequence data
Traceback (most recent call last):
  File "conga/scripts/run_conga.py", line 378, in <module>
    suffix_for_non_gene_features = args.suffix_for_non_gene_features,
  File "conga/conga/preprocess.py", line 393, in read_dataset
    store_tcrs_in_adata( adata, tcrs )
  File "conga/conga/preprocess.py", line 168, in store_tcrs_in_adata
    adata.obs['cdr3a_nucseq'] = adata.obs.cdr3a_nucseq.str.lower()
  File ".local/lib/python3.7/site-packages/pandas/core/generic.py", line 5456, in __getattr__
    return object.__getattribute__(self, name)
  File ".local/lib/python3.7/site-packages/pandas/core/accessor.py", line 180, in __get__
    accessor_obj = self._accessor(obj)
  File ".local/lib/python3.7/site-packages/pandas/core/strings/accessor.py", line 154, in __init__
    self._inferred_dtype = self._validate(data)
  File ".local/lib/python3.7/site-packages/pandas/core/strings/accessor.py", line 218, in _validate
    raise AttributeError("Can only use .str accessor with string values!")
AttributeError: Can only use .str accessor with string values!

sschattgen commented 3 years ago

Hi,

Thank you for your interest and for trying out the software. To answer your first question: you do not need to export the matrix from Seurat if you have access to the cellranger outputs. Either the cellranger mtx files or the h5 matrix file can be used.

Regarding your error, it seems that none of the cell barcodes in your GEX matrix matched those in your out-clones.tsv file (note the "Reducing to the 0 barcodes (out of 5557)" line in your log). One common reason for this is a discrepancy between the barcode suffixes in the GEX matrix and those in the filtered_contig_annotations.csv file. Can you confirm whether this is the case? If the suffixes were changed when you processed the data through Seurat, using the pre-Seurat GEX matrix instead of the exported one may fix the issue.
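A quick way to test for this kind of mismatch before running conga is to count how many barcodes the two inputs share, both exactly and after dropping any trailing suffix. The sketch below is only an illustration and is not part of conga: the helper names (`strip_suffix`, `barcode_overlap`, `read_gex_barcodes`, `read_contig_barcodes`) are hypothetical, the example barcodes are made up, and it assumes the GEX barcodes come from a plain-text `barcodes.tsv` with one barcode per line and that the contig file has a `barcode` column.

```python
import csv
import io
import re


def strip_suffix(barcode):
    """Drop a trailing '-<digits>' or '_<digits>' suffix (e.g. '-1'), if present."""
    return re.sub(r"[-_]\d+$", "", barcode)


def barcode_overlap(gex_barcodes, tcr_barcodes):
    """Return (exact, stripped): shared barcodes as-is and after suffix removal."""
    gex = set(gex_barcodes)
    tcr = set(tcr_barcodes)
    exact = len(gex & tcr)
    stripped = len({strip_suffix(b) for b in gex} & {strip_suffix(b) for b in tcr})
    return exact, stripped


def read_gex_barcodes(handle):
    """Read one barcode per line (assumed layout of a 10x barcodes.tsv)."""
    return [line.strip() for line in handle if line.strip()]


def read_contig_barcodes(handle):
    """Read the 'barcode' column of a contig annotations CSV (assumed column name)."""
    return [row["barcode"] for row in csv.DictReader(handle)]


# Made-up in-memory stand-ins for the real files, so the sketch runs as-is.
gex_file = io.StringIO("AAACCTGAGAAACCAT\nAAACCTGAGAAACCGC\n")
contig_file = io.StringIO(
    "barcode,chain\n"
    "AAACCTGAGAAACCAT-1,TRA\n"
    "AAACCTGAGAAACCAT-1,TRB\n"
    "TTTGTCATCTTTACGT-1,TRB\n"
)

exact, stripped = barcode_overlap(
    read_gex_barcodes(gex_file), read_contig_barcodes(contig_file)
)
print(f"exact matches: {exact}, matches ignoring suffix: {stripped}")
```

If the exact count is zero but the suffix-stripped count is not, the barcodes most likely differ only in their suffixes. If both counts are zero, the two files probably do not come from the same sample.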

s2hui commented 3 years ago

Hello, Thanks for your reply. As it turns out, I wasn't using the correct gex matrix that matched the filtered_contig_annotations.csv file! It works now!