phbradley / conga

Clonotype Neighbor Graph Analysis
MIT License

merge_samples errors #40

Closed AlicePsyche closed 2 years ago

AlicePsyche commented 2 years ago

Hello,

I recently got my own 10x 5' scRNA-seq data and would like to try it with CoNGA. I followed the tutorial and prepared the TCR file by running conga/scripts/setup_10x_for_conga.py on the filtered_contig_annotations.csv file generated by 10x. But I got an error when merging two lanes:

reading: lane1_filtered_feature_bc_matrix.h5 of type 10x_h5
Variable names are not unique. To make them unique, call .var_names_make_unique.
/home/software/conga/conga/preprocess.py:226: DeprecationWarning: Use is_view instead of isview, isview will be removed in the future.
  if adata.isview: # ran into trouble with AnnData views vs copies
(43691, 20639) lane1_filtered_feature_bc_matrix.h5
Traceback (most recent call last):
  File "/home/software/conga/scripts/merge_samples.py", line 83, in
    assert exists(bcmap_file)
AssertionError

Given that the .h5 file comes straight from 10x, I am not sure which variable names are not unique. Do I need to do any filtering before merging the samples?

Here is the code I ran:

python ~/software/conga/scripts/merge_samples.py \
    --samples samples_info.txt \
    --output_clones_file merged_lanes_clones.tsv \
    --output_gex_data merged_lanes_gex.h5ad \
    --organism human \
    --output_distfile merged_lanes_gex_dist

-rw-r--r-- 1 63413542 Feb 13 12:46 lane1_filtered_feature_bc_matrix.h5
-rw-r--r-- 1 58592469 Feb 13 12:47 lane2_filtered_feature_bc_matrix.h5
-rw-r--r-- 1 169 Feb 13 12:58 samples_info.txt
-rw-r--r-- 1 2258946 Feb 13 11:44 vdj_v1_lane1_clones.tsv
-rw-r--r-- 1 444699 Feb 13 11:44 vdj_v1_lane1_clones.tsv.barcode_mapping.tsv
-rw-r--r-- 1 2518481 Feb 13 11:45 vdj_v1_lane2_clones.tsv
-rw-r--r-- 1 475598 Feb 13 11:45 vdj_v1_lane2_clones.tsv.barcode_mapping.tsv

Could you please help me take a look? Thanks a lot in advance!

phbradley commented 2 years ago

Happy to help! Can you post your samples_info.txt file? It looks like the script is not able to find the barcode mapping file, but it sure looks like it is there in the directory listing. Maybe the filename for the clones file in the samples_info.txt file is not formatted correctly?

AlicePsyche commented 2 years ago

Thanks!

Here is my sample info:

clones_file	gex_data	gex_data_type
vdj_v1_lane1_clones.tsv	lane1_filtered_feature_bc_matrix.h5	10x_h5
vdj_v1_lane2_clones.tsv	lane2_filtered_feature_bc_matrix.h5	10x_h5

Tab-separated.

Here is the code I ran to generate the clone files:

python ~/software/conga/scripts/setup_10x_for_conga.py \
    --filtered_contig_annotations_csvfile /home/project/Nextseq_lane2/TCR/filtered_contig_annotations.csv \
    --output_clones_file ./vdj_v1_lane1_clones.tsv \
    --organism human \
    --no_kpca \
    --save_tcrdist_matrices &

phbradley commented 2 years ago

Huh, that's a mystery. The error is definitely a failure to find the barcode mapping file (line 83 of merge_samples.py). And if you look at the code you can see how the name of the barcode mapping file is created: by adding ".barcode_mapping.tsv" to the name of the clones file. Is it possible there's an extra whitespace character in the clones file name? Maybe you could add a print statement before line 83, something like print(bcmap_file), so we can see what's going wrong?
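The naming rule and the suggested print statement can be sketched in a few lines. This is only an illustration of the idea, not the code in merge_samples.py: the helper names `bcmap_path` and `describe` are hypothetical, and the only detail taken from the thread is the ".barcode_mapping.tsv" suffix. Printing with `repr()` rather than a plain `print` makes stray spaces or tabs in the name visible.

```python
from os.path import exists

# Suffix described in the comment above; everything else here is illustrative.
BCMAP_SUFFIX = ".barcode_mapping.tsv"


def bcmap_path(clones_file):
    """Derive the barcode mapping filename from the clones filename."""
    return clones_file + BCMAP_SUFFIX


def describe(clones_file):
    """Return a debug string that quotes the path and reports whether it exists."""
    path = bcmap_path(clones_file)
    # {path!r} uses repr(), so embedded whitespace shows up inside the quotes.
    return f"{path!r} exists={exists(path)}"


if __name__ == "__main__":
    print(describe("vdj_v1_lane1_clones.tsv"))
```

If the clones filename read from samples_info.txt contains an unexpected space or tab, it appears inside the quoted path, which a plain print can hide at the end of a line.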

AlicePsyche commented 2 years ago

Hmm, looks like it took the gex_data file as the clones_file?

(conga_new_env) alice@pe2:~/project/CoNGA$
vdj_v1_lane1_clones.tsv.barcode_mapping.tsv
reading: lane1_filtered_feature_bc_matrix.h5 of type 10x_h5
Variable names are not unique. To make them unique, call .var_names_make_unique.
/home/software/conga/conga/preprocess.py:226: DeprecationWarning: Use is_view instead of isview, isview will be removed in the future.
  if adata.isview: # ran into trouble with AnnData views vs copies
(43691, 20639) lane1_filtered_feature_bc_matrix.h5
vdj_v1_lane2_clones.tsv lane2_filtered_feature_bc_matrix.h5.barcode_mapping.tsv
Traceback (most recent call last):
  File "/home/software/conga/scripts/merge_samples.py", line 84, in
    assert exists(bcmap_file)
AssertionError

phbradley commented 2 years ago

It looks like there's a problem with the samples_info.txt file: missing a tab after the second clones file? See how there is whitespace in the filename that's printed out?
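A check like this can be run on samples_info.txt before calling merge_samples.py. The sketch below is a hypothetical stand-alone helper, not part of CoNGA: it assumes the three column names shown earlier in this thread and the ".barcode_mapping.tsv" suffix, and it flags rows with the wrong number of tab-separated fields or with whitespace inside a field.

```python
from os.path import exists

# Assumptions taken from this thread, not from the CoNGA source:
EXPECTED_COLUMNS = ["clones_file", "gex_data", "gex_data_type"]
BCMAP_SUFFIX = ".barcode_mapping.tsv"


def find_problems(text, check_files=False):
    """Return a list of human-readable problems found in samples-file text."""
    # Keep original line numbers, but ignore blank lines.
    rows = [(num, line) for num, line in enumerate(text.splitlines(), start=1)
            if line.strip()]
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0][1].split("\t")
    if header != EXPECTED_COLUMNS:
        problems.append(f"line {rows[0][0]}: unexpected header {header!r}")

    for num, line in rows[1:]:
        fields = line.split("\t")
        if len(fields) != len(EXPECTED_COLUMNS):
            problems.append(
                f"line {num}: expected {len(EXPECTED_COLUMNS)} tab-separated "
                f"fields, got {len(fields)}: {fields!r}")
        for field in fields:
            # A space inside a field usually means a tab was typed as a space.
            if field != field.strip() or " " in field:
                problems.append(f"line {num}: whitespace in field {field!r}")
        if check_files and not exists(fields[0] + BCMAP_SUFFIX):
            problems.append(
                f"line {num}: missing {fields[0] + BCMAP_SUFFIX!r}")
    return problems


if __name__ == "__main__":
    with open("samples_info.txt") as handle:
        for problem in find_problems(handle.read(), check_files=True):
            print(problem)
```

With `check_files=True` the helper also looks for each clones file's barcode mapping file relative to the current directory, so it should be run from the directory that holds the clones files.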

AlicePsyche commented 2 years ago

Oh, thank you! It seems the problem was introduced when running setup_10x_for_conga.py: I had put an extra ./ in the output path (--output_clones_file ./vdj_v1_lane2_clones.tsv). After I reran that step, merge_samples.py worked well. Thanks a lot!