AnaHowellsFerreira closed this issue 4 years ago
Hi Ana,
Thanks for reaching out. TCRdist2 is a development version that builds upon concepts in the Dash et al. paper (Phil Bradley), which has its own repository: https://github.com/phbradley/tcr-dist.
TCRdist2 is being designed with functionality to support users who may be generating TCR data from Adaptive and 10x pipelines. You are correct that we don't currently have a 10x Cell Ranger mapper, but it would be a great addition.
You should not need to tweak the tcrdist2 code to write a new mapper. For example, the existing mapper
td.mappers.vdjdb_to_tcrdist2(pd_df = pd_df)
just takes a DataFrame as input and returns a reformatted DataFrame with the following columns:
Index(['cdr3_a_aa', 'cdr3_b_aa', 'cell_type', 'count', 'epitope', 'epitope_aa',
'frequency', 'id', 'j_a_gene', 'j_b_gene', 'mhc_a_a', 'mhc_a_b',
'mhc_b_a', 'mhc_b_b', 'organism', 'subject', 'v_a_gene', 'v_b_gene'],
dtype='object')
A standalone mapper function will work fine if it does the same.
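As a rough illustration of what such a standalone mapper could look like, here is a minimal sketch. It is not part of tcrdist2: the function name `cellranger_to_tcrdist2` is hypothetical, and it assumes the input DataFrame has the columns `barcode`, `chain`, `v_gene`, `j_gene`, `cdr3`, and `umis` (as in a Cell Ranger contig annotation table). Keeping the highest-UMI contig per chain and setting `count` to 1 per cell are also assumptions, and gene names may still need converting to whatever naming tcrdist2 expects.

```python
import pandas as pd

# Target column layout, copied from the Index shown above.
TCRDIST2_COLUMNS = [
    "cdr3_a_aa", "cdr3_b_aa", "cell_type", "count", "epitope", "epitope_aa",
    "frequency", "id", "j_a_gene", "j_b_gene", "mhc_a_a", "mhc_a_b",
    "mhc_b_a", "mhc_b_b", "organism", "subject", "v_a_gene", "v_b_gene",
]


def cellranger_to_tcrdist2(contigs: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical mapper: one output row per cell barcode.

    Assumed input columns: barcode, chain, v_gene, j_gene, cdr3, umis.
    """
    # Keep only alpha- and beta-chain contigs.
    tcr = contigs[contigs["chain"].isin(["TRA", "TRB"])]

    # Assumption: when a cell has several contigs for one chain,
    # keep the one with the most UMIs.
    top = (
        tcr.sort_values("umis", ascending=False, kind="stable")
           .drop_duplicates(subset=["barcode", "chain"])
    )

    def one_chain(chain: str, letter: str) -> pd.DataFrame:
        # Select one chain and rename its columns to the tcrdist2-style names.
        sub = top[top["chain"] == chain].set_index("barcode")
        return sub[["cdr3", "v_gene", "j_gene"]].rename(columns={
            "cdr3": f"cdr3_{letter}_aa",
            "v_gene": f"v_{letter}_gene",
            "j_gene": f"j_{letter}_gene",
        })

    # Outer join so cells with only one recovered chain are kept (other side is NaN).
    paired = one_chain("TRA", "a").join(one_chain("TRB", "b"), how="outer")
    paired = paired.reset_index().rename(columns={"barcode": "id"})

    # Assumption: each barcode is counted as a single cell.
    paired["count"] = 1

    # Columns not derivable from the input (epitope, MHC, ...) are created as NaN.
    return paired.reindex(columns=TCRDIST2_COLUMNS)
```

Calling it on a DataFrame read with `pd.read_csv(...)` should return the same column layout as the existing mapper, which is all a test in a pull request would need to check.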
If you do write a mapper, feel free to submit a pull request with a test based on some real or fake Cell Ranger input data, and we can review it and potentially add it to our current mappers.py module.
Or feel free to be in touch if you have other questions or suggestions.
Best, Kosh
Great! Thank you for all the information. I will definitely work on the mappers for 10x and let you know.
I do have a question. I will be working with tumor-infiltrating lymphocytes, and I was wondering if I need epitope and MHC information. My apologies if this sounds like a basic question; I am new to the TCR world and bioinformatics.
Ana
Some aspects of tcrdist2, such as calculating distances between all TCR sequences and plotting gene usage with a Sankey diagram, do not explicitly require epitope or MHC information. You could leave those fields with null values.
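For instance, leaving those fields null could be done with plain pandas before handing the table to tcrdist2. This is only a sketch: the helper name `add_null_annotation_columns` is hypothetical, and the list of columns treated as optional is an assumption based on the column names quoted earlier in this thread.

```python
import numpy as np
import pandas as pd

# Assumption: these are the epitope/MHC fields that can be left empty.
OPTIONAL_COLUMNS = [
    "epitope", "epitope_aa", "mhc_a_a", "mhc_a_b", "mhc_b_a", "mhc_b_b",
]


def add_null_annotation_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with any missing epitope/MHC columns added as NaN."""
    out = df.copy()
    for col in OPTIONAL_COLUMNS:
        if col not in out.columns:
            out[col] = np.nan  # null placeholder; existing values are left untouched
    return out
```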
Hi Koshlan!
I have more of an inquiry rather than an issue for you. First of all, thank you for publishing and maintaining TCRdist2. I came across Bradley's paper a few months ago and I have been digging deeper into the subject.
I am working on a multimodal single-cell pipeline, and one of the outputs comes from the 10x Genomics Cell Ranger vdj pipeline (a different format than your mappers.py handles, as far as I can tell). Am I missing anything, or is that option not offered? Before I go trying to tweak your code on my local machine, I thought I should do a sanity check.
Thank you very much!
Ana