Closed afcmalone closed 2 years ago
Hi Andrew, thanks for trying CoNGA and thanks in advance for your patience as we get this figured out. It looks like it can't find the clones file. Can you double-check that it exists at the location you are passing in (/home/malone/Andrew/mgi_vdj/Bx68_B1_vdj_t_outs/filtered_contig_annotations_tcrdist_clones.tsv)? Note that there are two steps: running setup_10x_for_conga.py which makes the clones file and then running run_conga.py. Partly for historical reasons and partly so that you can check out the TCR clonotype parsing and confirm that they seem reasonable. The README has examples of this near the top.
Thank you for your quickly reply. I realized this mistake as soon as i messaged you! The pipeline works now, however, i have run into some other issues. My data is very sparse particularly on the TCR clone side of things. Is there a minimum number of clones or sizes of clones needed to get a complete set of results? e.g, I have many different clones but most clones are n=1 or 2. Thanks
Hi Andrew. The size of the clone is irrelevant since each clone is being represented by a single cell from the lineage. Based on what you're describing, I suspect the issue is the conga cluster sizes in your results are below the threshold for generating the full graphical output for the graph-vs-graph analysis. The default is set to 5 but can be specified using --min_cluster_size
argument. Since you said it's a small dataset you could try dropping it to the 2 (the minimum size that can be used) and see if that works. If not, we may need a little more information about the error.
This worked, thank you.
Conga is a wealth of information! - One more question. Is the conga_final.h5ad object in a format that can be used in scanpy like a regular single cell RNA-seq anndata object. e.g. to make GEX violin plots etc?
I would say yes, with the caveat that each clonotype has been reduced to a single representative cell. So for expanded clones there is some loss of information. If you add the option --average_clone_gex to the run_conga.py script then the GEX information for that representative cell will be the average of the GEX information for all the cells in the clone. The _final.h5ad object has both the raw counts data (normalized to sum to 10000 for each cell and then log-transformed) in the adata.raw.X matrix and also the scaled and regressed GEX data for just the highly variable genes in the adata.X matrix. Anyhow, it should be basically OK for scanpy GEX type analyses.
Thanks for your kind words re conga. Don't forget to try the --all option to run_conga.py if you haven't already, to get even more output!
Just echoing Phil here, but the conga_final.h5ad object is compatible with scanpy's plotting and statistical tools, though we haven't exhaustively tried all of them. It's possible to generate and save an object with all the cells prior to reducing it down to single-cell per clonotype for conga analysis so in the end, you would have two objects that were processed through the same pipeline which might be preferable. This is currently not possible using the run_conga script but it can be easily done interactively. Here is an example of how to you could make the two objects through the conga workflow that can be used with scanpy.
This is working great for me now. really powerful tool! i am now trying to include ADT/CITEseq data into the pipeline by adding --include_protein_features
my clone data files are from 10X VDJ T cell V2.
I ran: ../anaconda3/envs/conga_new_env/bin/python conga/scripts/run_conga.py --graph_vs_graph --gex_data ../Andrew/conga_data/hs1_mtx --gex_data_type 10x_mtx --clones_file ../Andrew/conga_data/citeseq_t_contigs_filtered_contig_annotations_tcrdist_clones.tsv --organism human --outfile_prefix ../Andrew/conga_data/tmp_hs_citeonly_conga_clone5 --min_cluster_size 5 --include_protein_features
however i am getting an error:
reading: ../conga_data/hs1_mtx of type 10x_mtx
../python_code/conga_conda/conga/conga/preprocess.py:221: DeprecationWarning: Use is_view instead of isview, isview will be removed in the future.
if adata.isview: # ran into trouble with AnnData views vs copies
total barcodes: 6978 (6978, 23215)
Traceback (most recent call last):
File "conga/scripts/run_conga.py", line 374, in
Also, I would like to run my 10X VDJ_B_cell (Ig) data using this pipeline. You mention this functionality - 2020-09-04: (EXPERIMENTAL) Not sure how much hacking in need to do to the code or whether i would be successful.
Thank you in advance,
Hi Andrew, I think we can get both of these to work; but, you are right-- they are experimental and haven't been tested very much.
First, though, it looks like it can't find the clones file (same error as before), maybe you can double-check the location for the clones file?
For the B cell stuff, you might look first at the example in the README and see if that can be tweaked to work for your case: https://github.com/phbradley/conga#human-melanoma-b-cell-dataset
Note that in that example, we are defining "clonotypes" for B cells using a bcrdist threshold of 50.
really sorry but i pasted the previous error (now resolved)
error with --include_protein_features : Traceback (most recent call last): File "/home/malone/anaconda3/envs/conga_new_env/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2898, in get_loc return self._engine.get_loc(casted_key) File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item KeyError: None
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/malone/python_code/conga_conda/conga/scripts/run_conga.py", line 518, in
Maybe the problem is because I am reading in hs1_mtx files to the conga_run pipeline?
Thanks,
10X matrix files are supported input for the pipeline so this is not the issue. You are getting this error because nothing is assigned as 'Antibody Capture' in the features.tsv file. The typical cellranger output should include this information but I suspect here that you exported the raw count matrix from a Seurat object using the example in the README file. If this was the case, the current example was intended for GEX only and would cause this error. You would need to write out a matrix containing both the gene and antibody labeling counts, and also make sure these are properly labelled in the features.tsv file. Here's an example of how you can do this with the write10xCounts
function in DropletUtils
:
## exporting gene expression and antibody labeling from Seurat to CoNGA
require(Seurat)
require(DropletUtils)
hs1 <- readRDS('./my_seurat_obj.rds')
## concatenate the GEX and antibody labeling count matrices into one matrix
# here, ADT is the antibody labeling assay slot
count_matrix <- rbind(hs1@assays$RNA@counts, hs1@assays$ADT@counts)
# create vector of feature type labels
features <- c(
rep("Gene Expression", nrow(hs1@assays$RNA@counts)),
rep("Antibody Capture", nrow(hs1@assays$ADT@counts))
)
# write out
write10xCounts( count_matrix,
path = './raw/seurat_export/',
gene.id = rownames(count_matrix),
gene.symbol = rownames(count_matrix),
barcodes = colnames(count_matrix),
gene.type = features,
version = "3")
Thanks Stefan, this works nicely. I have been using an integrated seurat object (5 samples) to generate the hs1_mtx as input into the conga_run pipeline. In your updates you added a /merge_samples.py method to run conga on multiple samples. What I did was to manually combine the clones file correcting the barcodes to match the integrated seurat object/hs1_mtx files. Do you think I should use 'hs1@assays$RNA@data' instead of 'hs1@assays$ADT@counts' ? These this matrix is normalized, or would this make no difference once the conga pipeline is run?
Thanks, Andrew
I would stick with the count matrix. It will be normalized and scaled in the pipeline.
I have been doing this. Thanks again for all your answers.
Thanks for being a beta tester! And please do follow up if you run into trouble or have suggestions for improvements/new features.
Hi,
your recent paper was great!
I am failing to get output using run_conga.py.
I get an error: assert exists(clones_file) AssertionError
seems to run ok intially, then: Session information updated at 2021-08-24 14:43
reading: /home/malone/Andrew/conga_data/Bx68/Bx68B1_filtered_feature_bc_matrix.h5 of type 10x_h5 Variable names are not unique. To make them unique, call
adata = conga.preprocess.read_dataset(
File "/home/malone/python_code/conga/conga/conga/preprocess.py", line 296, in read_dataset
assert exists(clones_file)
AssertionError
.var_names_make_unique
. total barcodes: 27073 (27073, 33538) Traceback (most recent call last): File "conga/scripts/run_conga.py", line 367, incommand line is: python conga/scripts/run_conga.py --graph_vs_graph --gex_data /home/malone/Andrew/conga_data/Bx68/Bx68B1_filtered_feature_bc_matrix.h5 --gex_data_type 10x_h5 --clones_file /home/malone/Andrew/mgi_vdj/Bx68_B1_vdj_t_outs/filtered_contig_annotations_tcrdist_clones.tsv --organism human --outfile_prefix tmp_hs_conga
Any ideas why this is occurring??
Thanks