Hi,
Assuming the barcode suffixes match between the clones files and the GEX matrix, you could run each sample individually, since any cells without TCR information would be excluded. But if you'd like to run it all together, I'd recommend using the make_10x_clones_file_batch function (which we need to document better). You can use a metadata CSV file with two columns: "file", containing the paths to each of the filtered_contig_annotations.csv files, and "suffix", containing the barcode suffix that matches that sample to the corresponding cells in your aggregate GEX matrix. This will merge the files into a single clones file that can then be passed to run_conga.py along with your GEX file.
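A minimal sketch of writing that metadata file with Python's standard csv module. The helper name is hypothetical, and the column names follow the description above ("file" plus "suffix"); since the thread later renames the second column to "batch_id", it is a parameter here.

```python
import csv


def write_batch_metadata(rows, path, id_column="suffix"):
    """Write a metadata CSV with a 'file' column and a barcode-id column.

    rows: iterable of (contig_annotations_csv_path, barcode_id) pairs.
    id_column: header of the second column ("suffix" as described above).
    """
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", id_column])
        for contig_csv, barcode_id in rows:
            writer.writerow([contig_csv, barcode_id])
```

For example, `write_batch_metadata([("sampleA/filtered_contig_annotations.csv", "1"), ("sampleB/filtered_contig_annotations.csv", "2")], "metadata.csv")`, where the paths and suffix values are placeholders; check what suffix your aggregate matrix actually uses for each sample before filling in the second column.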
Hi, Thanks for this info.
Re: running all together
The barcodes in my integrated dataset have a prefix (not a suffix), i.e. prefix_barcode-1, so I'm guessing the batch function would not work in this case?
Re: running individually
I've preprocessed my file, so many cells have been removed. If there are cells (barcodes) in the TCR clones file that aren't in the GEX file, would that cause problems? If not, could I run each sample individually but supply the integrated GEX file for each of the clones files when I run merge_samples.py? This assumes that the clone mapping file contains the matching prefixed barcodes (i.e. the barcodes in the clones file match the barcodes in the integrated GEX file).
So the sample.txt file would look like:
clones_file gex_data gex_data_type
tcr_cloneA.tsv integrated_mtx_dir 10x_mtx
tcr_cloneB.tsv integrated_mtx_dir 10x_mtx
tcr_cloneC.tsv integrated_mtx_dir 10x_mtx
...
Hi.
Regarding your second question, I would not recommend using merge_samples.py this way. What I meant previously is that you could run run_conga.py using one clones file and the integrated GEX file, like this:
python run_conga.py --gex_data integrated_mtx_dir --gex_data_type 10x_mtx --clones_file tcr_cloneA.tsv --organism human --outfile_prefix ./outdir/prefix
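If you take the one-sample-at-a-time route, the command above is simply repeated for each clones file. A minimal sketch that builds those commands with the same flags; the helper names and the per-sample output layout (outdir/<clones file stem>/prefix) are assumptions, and it only prints the commands unless `dry_run=False`.

```python
import subprocess
from pathlib import Path


def build_conga_command(clones_file, gex_dir, organism="human", outdir="outdir"):
    """Return the run_conga.py argument list for a single clones file."""
    # One output folder per clones file (assumed layout), e.g. outdir/tcr_cloneA/prefix
    outfile_prefix = Path(outdir) / Path(clones_file).stem / "prefix"
    return [
        "python", "run_conga.py",
        "--gex_data", str(gex_dir),
        "--gex_data_type", "10x_mtx",
        "--clones_file", str(clones_file),
        "--organism", organism,
        "--outfile_prefix", str(outfile_prefix),
    ]


def run_each_sample(clones_files, gex_dir, organism="human", outdir="outdir", dry_run=True):
    """Run (or just print) one run_conga.py command per clones file."""
    commands = [build_conga_command(c, gex_dir, organism, outdir) for c in clones_files]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Make sure the per-sample output folder exists before launching.
            Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return commands
```

For example, `run_each_sample(["tcr_cloneA.tsv", "tcr_cloneB.tsv"], "integrated_mtx_dir")` prints two commands that differ only in `--clones_file` and `--outfile_prefix`.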
I would still recommend aggregating your clones first before merging them with your GEX file. I've modified the make_10x_clones_file_batch function (make sure to pull the latest commit) so that a prefix or suffix can be stripped from or appended to the barcode. It's hard to anticipate every possible configuration and make this all-encompassing, though, so you will need to modify your barcodes in either the GEX matrix or the filtered_contig_annotations file prior to parsing.
Assuming the barcodes in filtered_contig_annotations.csv look like "barcode-1", perhaps the simplest approach is to modify the barcodes in your GEX file by stripping off the "-1" and replacing the "_" between the prefix and barcode with "-", so they look like "prefix-barcode". Then specify the appropriate prefixes in the "batch_id" column (this changed in the new commit) of the metadata file, and use the following to get the TCR barcodes into the same configuration:
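The GEX-side renaming described above can be sketched as a small string transform. This is only an illustration in Python (in this thread the renaming was actually done on a Seurat object in R); the function names are hypothetical, and the sketch assumes the sample prefix is separated from the barcode by the last "_" and that each barcode ends in "-1".

```python
def to_prefix_dash_barcode(gex_barcode, sep="_", gem_suffix="-1"):
    """Turn 'prefix_barcode-1' into 'prefix-barcode'.

    Splits on the LAST `sep` (a nucleotide barcode never contains '_'),
    so prefixes that themselves contain `sep` survive intact.
    """
    prefix, barcode = gex_barcode.rsplit(sep, 1)
    if barcode.endswith(gem_suffix):
        barcode = barcode[: -len(gem_suffix)]
    return f"{prefix}-{barcode}"


def rename_barcodes(barcodes, **kwargs):
    """Apply to_prefix_dash_barcode to every barcode, e.g. the lines of barcodes.tsv."""
    return [to_prefix_dash_barcode(b, **kwargs) for b in barcodes]
```

For example, `to_prefix_dash_barcode("S1_AAACCTGAGAAGGCCT-1")` gives `"S1-AAACCTGAGAAGGCCT"`.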
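# module path assumed from the conga repo layout; adjust if your checkout differs
from conga.tcrdist.make_10x_clones_file import make_10x_clones_file_batch

metadata_file = 'metadata.csv'  # columns: file, batch_id
organism = 'human'
clones_file = 'clones.tsv'      # merged output clones file
# intended effect (per the note above): TCR barcode 'barcode-1' -> 'prefix-barcode'
make_10x_clones_file_batch( metadata_file, organism, clones_file, strip_batch_id_location = 'suffix', add_batch_id_location = 'prefix')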
Hi,
Thanks for the detailed instructions!
I renamed the barcodes in my integrated Seurat object to the format prefix-barcode (stripped off the -1 at the end and replaced the _ between the prefix and barcode with a -). Then I converted the object to mtx format using the write10xCounts method in R.
Then I made a batch.csv file with two columns (format below):
file,batch_id
filtered_contig_annotations.csv,prefix
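A quick sanity check of such a metadata file can catch problems before make_10x_clones_file_batch runs, for instance a header and first data row that ended up on the same line. A minimal sketch; the function name is hypothetical, and the checks (required columns, files that exist, non-empty and unique batch_id values) are assumptions about what a well-formed file needs, not rules taken from conga.

```python
import csv
from pathlib import Path


def check_batch_metadata(path, id_column="batch_id"):
    """Return a list of problems found in a batch metadata CSV (empty list = looks OK)."""
    problems = []
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        header = reader.fieldnames or []
        missing = [col for col in ("file", id_column) if col not in header]
        if missing:
            # A header merged with the first data row usually ends up here.
            return [f"missing column(s) {missing}; header was {header}"]
        seen = set()
        for lineno, row in enumerate(reader, start=2):
            contig_csv = (row.get("file") or "").strip()
            batch_id = (row.get(id_column) or "").strip()
            if not Path(contig_csv).is_file():
                problems.append(f"line {lineno}: file not found: {contig_csv!r}")
            if not batch_id:
                problems.append(f"line {lineno}: empty {id_column}")
            elif batch_id in seen:
                problems.append(f"line {lineno}: duplicate {id_column} {batch_id!r}")
            seen.add(batch_id)
    return problems
```

An empty list means none of these checks fired; it does not guarantee the prefixes match the GEX barcodes.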
Then I ran make_10x_clones_file_batch, but got the following error when running run_conga.py:
Python 3.7.2 (default, Dec 29 2018, 06:19:36) [GCC 7.3.0]
Linux-3.10.0-1160.36.2.el7.x86_64-x86_64-with-centos-7.9.2009-Core
62 logical CPU cores, x86_64
-----
Session information updated at 2021-10-18 17:37
reading: /cluster/projects/finelligroup/scKidneyCancer/out_shirley/conga/out/mtx of type 10x_mtx
total barcodes: 113730 (113730, 34271)
reading: /cluster/projects/finelligroup/scKidneyCancer/out_shirley/conga/out/merged_remedy/merged_remedy_clones.tsv
reading: /cluster/projects/finelligroup/scKidneyCancer/out_shirley/conga/out/merged_remedy/merged_remedy_clones_AB.dist_50_kpcs
Reducing to the 0 barcodes (out of 113730) with paired TCR sequence data
Traceback (most recent call last):
File "/cluster/home/hshirley/conga/scripts/run_conga.py", line 378, in <module>
suffix_for_non_gene_features = args.suffix_for_non_gene_features,
File "/cluster/home/hshirley/conga/conga/preprocess.py", line 393, in read_dataset
store_tcrs_in_adata( adata, tcrs )
File "/cluster/home/hshirley/conga/conga/preprocess.py", line 168, in store_tcrs_in_adata
adata.obs['cdr3a_nucseq'] = adata.obs.cdr3a_nucseq.str.lower()
File "/cluster/home/hshirley/.local/lib/python3.7/site-packages/pandas/core/generic.py", line 5456, in __getattr__
return object.__getattribute__(self, name)
File "/cluster/home/hshirley/.local/lib/python3.7/site-packages/pandas/core/accessor.py", line 180, in __get__
accessor_obj = self._accessor(obj)
File "/cluster/home/hshirley/.local/lib/python3.7/site-packages/pandas/core/strings/accessor.py", line 154, in __init__
self._inferred_dtype = self._validate(data)
File "/cluster/home/hshirley/.local/lib/python3.7/site-packages/pandas/core/strings/accessor.py", line 218, in _validate
raise AttributeError("Can only use .str accessor with string values!")
AttributeError: Can only use .str accessor with string values!
I'm not sure if it is the cause, but when I look inside the resulting aggregated clones mapping file, I see that all the prefix-barcodes have -1-0 appended, so they look like:
prefix-barcode-1-0
This format doesn't match any barcodes in the gex data file as the barcodes were renamed to be prefix_barcode.
Any idea what the issue is? Thanks for your help! @s2hui
Hi @s2hui,
The error is due to the misalignment of barcodes between the clones file and the GEX matrix. Could you share the details of the make_10x_clones_file_batch command you ran, as well as the first 5 or so barcodes from one of the filtered_contig_annotations.csv files and from the integrated GEX matrix prior to the merge?
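When run_conga.py reports "Reducing to the 0 barcodes", a quick overlap check makes the misalignment visible. A minimal sketch, assuming the GEX matrix directory holds a standard 10x barcodes.tsv (or barcodes.tsv.gz) and that you collect the TCR barcodes into a list yourself (for example from the barcode column of filtered_contig_annotations.csv, or from the merged clones output); both function names are hypothetical.

```python
import gzip
from pathlib import Path


def read_10x_barcodes(mtx_dir):
    """Read the first column of barcodes.tsv(.gz) from a 10x-style matrix directory."""
    mtx_dir = Path(mtx_dir)
    for name in ("barcodes.tsv.gz", "barcodes.tsv"):
        path = mtx_dir / name
        if path.exists():
            opener = gzip.open if name.endswith(".gz") else open
            with opener(path, "rt") as fh:
                return [line.split("\t")[0].strip() for line in fh if line.strip()]
    raise FileNotFoundError(f"no barcodes.tsv or barcodes.tsv.gz in {mtx_dir}")


def barcode_overlap_report(tcr_barcodes, gex_barcodes, n_examples=5):
    """Count shared barcodes and show a few examples from each side."""
    tcr = list(dict.fromkeys(tcr_barcodes))  # de-duplicate, keep first-seen order
    gex = list(dict.fromkeys(gex_barcodes))
    shared = set(tcr) & set(gex)
    return {
        "n_tcr": len(tcr),
        "n_gex": len(gex),
        "n_shared": len(shared),
        "tcr_examples": tcr[:n_examples],
        "gex_examples": gex[:n_examples],
    }
```

If `n_shared` is 0, comparing `tcr_examples` with `gex_examples` side by side usually shows the mismatch directly, such as a trailing "-1-0" on one side only.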
Hi, I went over my call to run conga and indeed I wasn't using the correct gex data file! It is working now after I followed all your steps above. I appreciate all your help! @s2hui
Hi there, For some reason I am having trouble seeing the context for this email on GitHub. Is this a post on one of the open issues, or is it an email directly to me? When you say below, "I performed the procedure as above", I can't figure out what that refers to. Maybe include a bit more context, or let me know which issue it is? When I click on issue #28 (from the subject line), I don't see the post. Take care, Phil
From: leeanapeters @.> Sent: Tuesday, August 2, 2022 12:10 PM To: phbradley/conga @.> Cc: Subscribed @.***> Subject: Re: [phbradley/conga] Question about using previously integrated gex data (#28)
Hi, I am having trouble keeping the batch parameters (to compare groups) while also performing the make_10x_clones_file_batch merge.
I performed the procedure as above, but then modified my adata object using a batch_info file (containing barcodes and other metadata, i.e. patient and condition) and set those as the batch keys.
When I attempt to use the clones file generated by make_10x_clones_file_batch along with the GEX data exported from the adata object, I receive the following error, even though the mapping file is in the same directory:
assert exists(kpca_file)
AssertionError
When I use --no_kpca, I receive this error:
total barcodes: 18863 (18863, 18327)
reading: all_clones_w_prefix_added.tsv
WARNING: missing kpca_file: all_clones_w_prefix_added_AB.dist_50_kpcs
WARNING: X_tcr_pca will be empty
Reducing to the 0 barcodes (out of 18863) with paired TCR sequence data
Any help would be appreciated!
Leeana
Hello,
In the example on merging multiple datasets, the individual sample clones files and their companion GEX files are listed in a txt file that is passed to the merge_samples.py script.
What I have are the individual clones files and one integrated GEX file. What would be the appropriate way to go about analyzing my data?
Would it make sense to supply the converted integrated GEX data (i.e. mtx or h5 format) as the companion file for each clones file in the sample.txt file?
Appreciate your insight, @s2hui