Open QianhuiXu opened 1 year ago
Hi there, thanks for trying conga, and thanks for the feedback. This error suggests that the list "all_data" is empty, which may be because the preceding loop did not execute. The loop iterates over the folders found by the glob command:
gex_datasets = sorted(glob.glob('*-CD3'))
Could you check and see whether the expected files are present and in the directory where the notebook is running? These would be the *-CD3 folders that have the GEX counts data in them.
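A minimal sketch of that check, in case it is useful. The helper name `find_gex_dirs` and its `data_dir` argument are hypothetical and not part of conga; only the `*-CD3` pattern comes from the notebook. It fails early with the directory that was actually searched, rather than letting an empty list reach the concatenation step.

```python
import glob
import os


def find_gex_dirs(data_dir=".", pattern="*-CD3"):
    """Return the sorted folders under data_dir that match pattern.

    Hypothetical helper (not part of conga): raises right away if nothing
    matches, so an empty list never reaches all_data[0].
    """
    matches = sorted(
        path
        for path in glob.glob(os.path.join(data_dir, pattern))
        if os.path.isdir(path)
    )
    if not matches:
        raise FileNotFoundError(
            f"no folders matching {pattern!r} in {os.path.abspath(data_dir)!r} "
            f"(current working directory: {os.getcwd()!r})"
        )
    return matches
```

Calling `find_gex_dirs('.')` from the notebook would then either return the GEX folders or report which directory was searched.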
Thank you for your help! I solved this error by changing the glob path to the full data directory: gex_datasets = sorted(glob.glob('/home/shpc_100668/conga/GSE144469_RAW/*-CD3')) However, I ran into another issue in the next step. I have put the *-gdTCR_filtered_contig_annotations.csv files in the same directory ('/home/shpc_100668/conga/GSE144469_RAW/').
My command:
gex_datasets = sorted(glob.glob('/home/shpc_100668/conga/GSE144469_RAW/*-CD3'))
diseases = ['C','NC','CT'] # colitis, no-colitis, healthy control
contigs_file = '/home/shpc_100668/conga/GSE144469_RAW/GSE144469_TCR_filtered_contig_annotations_all.csv'
all_contigs = pd.read_csv(contigs_file)
all_data = []
for donor_num, gex_dir in enumerate(gex_datasets):
    donor = gex_dir.split('-')[0]
    donor_contigs = all_contigs[all_contigs.barcode.str.endswith(donor)].copy()
    donor_contigs['barcode'] = donor_contigs.barcode.str.split('-').str.get(0)+'-1'
    donor_contigs_file = f'{donor}_abtcr_filtered_contigs.csv'
    donor_contigs.to_csv(donor_contigs_file)
    donor_clones_file = f'{donor}_abtcr_clones.tsv'
    make_10x_clones_file(
        donor_contigs_file,
        organism = 'human', # using 'human' for TCRab
        clones_file = donor_clones_file,
        stringent = True, # (the default) see Note #1 on clonotype filtering
    )
    adata = conga.preprocess.read_dataset(
        gex_dir, '10x_mtx', donor_clones_file,
        allow_missing_kpca_file=True)
    disease = donor[:-1]
    adata.obs['disease'] = disease
    adata.obs['disease_int'] = diseases.index(disease) # conga batch ids are integers
    adata.obs['donor'] = donor
    adata.obs['donor_int'] = donor_num
    all_data.append( adata )
new_adata = all_data[0].concatenate(all_data[1:])
new_adata.write('merged_gex_abtcr.h5ad')
Error:
AttributeError Traceback (most recent call last)
/tmp/ipykernel_2715303/7264258.py in

~/conga/conga/preprocess.py in read_dataset(gex_data, gex_data_type, clones_file, make_var_names_unique, keep_cells_without_tcrs, kpca_file, allow_missing_kpca_file, gex_only, suffix_for_non_gene_features)
    403
    404     tcrs = [ barcode2tcr[x] for x in adata.obs.index ]
--> 405     store_tcrs_in_adata( adata, tcrs )
    406
    407     return adata

~/conga/conga/preprocess.py in store_tcrs_in_adata(adata, tcrs)
    178
    179     # ensure lower case
--> 180     adata.obs['cdr3a_nucseq'] = adata.obs.cdr3a_nucseq.str.lower()
    181     adata.obs['cdr3b_nucseq'] = adata.obs.cdr3b_nucseq.str.lower()
    182

~/anaconda3/envs/conga4/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5485         ):
   5486             return self[name]
-> 5487         return object.__getattribute__(self, name)
   5488
   5489     def __setattr__(self, name: str, value) -> None:

~/anaconda3/envs/conga4/lib/python3.7/site-packages/pandas/core/accessor.py in __get__(self, obj, cls)
    179             # we're accessing the attribute of the class, i.e., Dataset.geo
    180             return self._accessor
--> 181         accessor_obj = self._accessor(obj)
    182         # Replace the property with the accessor object. Inspired by:
    183         # https://www.pydanny.com/cached-property.html

~/anaconda3/envs/conga4/lib/python3.7/site-packages/pandas/core/strings/accessor.py in __init__(self, data)
    166         from pandas.core.arrays.string_ import StringDtype
    167
--> 168         self._inferred_dtype = self._validate(data)
    169         self._is_categorical = is_categorical_dtype(data.dtype)
    170         self._is_string = isinstance(data.dtype, StringDtype)

~/anaconda3/envs/conga4/lib/python3.7/site-packages/pandas/core/strings/accessor.py in _validate(data)
    223
    224         if inferred_dtype not in allowed_types:
--> 225             raise AttributeError("Can only use .str accessor with string values!")
    226         return inferred_dtype
    227

AttributeError: Can only use .str accessor with string values!
Thank you for your kind help!
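One possible explanation, not confirmed anywhere in this thread: once the glob pattern is an absolute path, `gex_dir.split('-')[0]` keeps the directory prefix, so `donor` no longer looks like a barcode suffix and the `endswith(donor)` filter may select no contigs at all. The sketch below illustrates this on toy data. The folder name `C1-CD3`, the barcode format, and the helper `donor_from_gex_dir` are assumptions made for illustration only.

```python
import os

import pandas as pd


def donor_from_gex_dir(gex_dir):
    """Donor ID taken from the folder name only, ignoring parent directories.

    Hypothetical helper; assumes folders are named like '<donor>-CD3'.
    """
    folder = os.path.basename(os.path.normpath(gex_dir))
    return folder.split("-")[0]


# Toy stand-in for the contigs table; the real barcode format may differ.
all_contigs = pd.DataFrame(
    {"barcode": ["AAAC-1-C1", "AAAG-1-C1", "TTTG-1-CT2"]}
)

# Assumed folder name under the data directory from the thread.
gex_dir = "/home/shpc_100668/conga/GSE144469_RAW/C1-CD3"

path_donor = gex_dir.split("-")[0]       # still carries the directory prefix
name_donor = donor_from_gex_dir(gex_dir)  # folder name up to the first '-'

# Number of contig rows each variant would select.
n_path = int(all_contigs.barcode.str.endswith(path_donor).sum())
n_name = int(all_contigs.barcode.str.endswith(name_donor).sum())

print(repr(path_donor), n_path)
print(repr(name_donor), n_name)
```

If the row count for the path-based variant is zero on the real data as well, the clones file would be built from an empty contigs table, which might leave the `cdr3a_nucseq` column without string values. Printing `len(donor_contigs)` inside the loop would show whether that is what happens.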
Hello,
conga is a wonderful tool!
I ran into an issue while exploring the fancy_conga_pipeline_with_batches_and_gammadelta_tcrs notebook.
My command:
gex_datasets = sorted(glob.glob('*-CD3'))
diseases = ['C','NC','CT'] # colitis, no-colitis, healthy control
contigs_file = '/home/shpc_100668/conga/GSE144469_RAW/GSE144469_TCR_filtered_contig_annotations_all.csv'
all_contigs = pd.read_csv(contigs_file)
all_data = []
for donor_num, gex_dir in enumerate(gex_datasets):
    # The folder name is also the donor ID
new_adata = all_data[0].concatenate(all_data[1:])
new_adata.write('merged_gex_abtcr.h5ad')
Error:
IndexError Traceback (most recent call last)
/tmp/ipykernel_1354605/1967687937.py in
33
34 # concatenate the datasets
---> 35 new_adata = all_data[0].concatenate(all_data[1:])
36 #save the aggregate AnnData object
37 new_adata.write('merged_gex_abtcr.h5ad')
IndexError: list index out of range
I'm really at a loss as to how to proceed, and any guidance would be much appreciated! Thank you for your kind help!