phbradley / conga

Clonotype Neighbor Graph Analysis
MIT License

run_conga.py error = UnicodeEncodeError: 'ascii' codec can't encode character '\xd7' in position 26: ordinal not in range(128) #51

Open shimbrianh opened 1 year ago

shimbrianh commented 1 year ago

I am attempting to run the human PBMC example analysis and am running into an error when I run the script run_conga.py.

I have downloaded the datasets using the "conga_example_datasets_v1" Dropbox folder here.

I ran the script setup_10x_for_conga.py with the appropriate filtered_contig_annotations.csv file. It completed without errors and produced the expected output files.

However, when I run the script run_conga.py, I am greeted with the following error:

Session information updated at 2022-09-23 18:31

reading: ./conga/conga_example_datasets_v1/vdj_v1_hs_pbmc3_5gex_filtered_gene_bc_matrices_h5.h5 of type 10x_h5
Variable names are not unique. To make them unique, call .var_names_make_unique.
./conga/conga/preprocess.py:233: DeprecationWarning: Use is_view instead of isview, isview will be removed in the future.
  if adata.isview: # ran into trouble with AnnData views vs copies
total barcodes: 7231 (7231, 33555)
reading: ./conga/conga_example_datasets_v1/vdj_v1_hs_pbmc3_t_filtered_contig_annotations_tcrdist_clones.tsv
reading: ./conga/conga_example_datasets_v1/vdj_v1_hs_pbmc3_t_filtered_contig_annotations_tcrdist_clones_AB.dist_50_kpcs
Reducing to the 3176 barcodes (out of 7231) with paired TCR sequence data
Traceback (most recent call last):
  File "./conga/scripts/run_conga.py", line 483, in <module>
    print(adata)
UnicodeEncodeError: 'ascii' codec can't encode character '\xd7' in position 26: ordinal not in range(128)

The same error occurs every time I run run_conga.py, including after re-running setup_10x_for_conga.py. It also occurs when using data downloaded individually using the link here.
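For what it's worth, the traceback points at print(adata), and '\xd7' is the multiplication sign (×). As far as I can tell, the AnnData summary begins with "AnnData object with n_obs × n_vars", which would put that character at position 26, so my guess is that stdout in my environment is using an ASCII encoding. The sketch below is only an illustration of that guess: the summary string is typed in by hand (an assumption, not taken from conga), and try_print is a hypothetical helper, not part of conga or anndata.

```python
import io

# Assumed first words of the AnnData summary line; \u00d7 is the same character as '\xd7'.
summary = "AnnData object with n_obs \u00d7 n_vars"


def try_print(text, encoding):
    """Print text to an in-memory stream using the given encoding.

    Returns the error message if encoding fails, otherwise None.
    (Hypothetical helper, for illustration only.)
    """
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError as err:
        return str(err)
    return None


ascii_error = try_print(summary, "ascii")  # mimics a stdout limited to ASCII
utf8_error = try_print(summary, "utf-8")   # mimics a UTF-8 stdout

print("ascii:", ascii_error)
print("utf-8:", utf8_error)
```

If this guess is right, setting the environment variable PYTHONIOENCODING=utf-8 (or using a UTF-8 locale) before running run_conga.py, or calling sys.stdout.reconfigure(encoding="utf-8") on Python 3.7+, might avoid the error, but I have not confirmed that this is the intended fix.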

Below is my command (copied from the example) for running run_conga.py:

python ./conga/scripts/run_conga.py --all --organism human --clones_file ./conga/conga_example_datasets_v1/vdj_v1_hs_pbmc3_t_filtered_contig_annotations_tcrdist_clones.tsv --gex_data ./conga/conga_example_datasets_v1/vdj_v1_hs_pbmc3_5gex_filtered_gene_bc_matrices_h5.h5 --gex_data_type 10x_h5 --outfile_prefix tcr_hs_pbmc

Thank you for any advice!