kmayerb / tcrdist2

Quantifiable predictive features define epitope-specific T cell receptor repertoires

10x genomics Cellranger output #7

Closed AnaHowellsFerreira closed 4 years ago

AnaHowellsFerreira commented 5 years ago

Hi Koshlan!

I have more of an inquiry than an issue for you. First of all, thank you for publishing and maintaining TCRdist2. I came across Bradley's paper a few months ago and I have been digging deeper into the subject.

I am working on a multimodal single cell pipeline, and one of the outputs comes from the 10x Genomics Cellranger vdj pipeline (a different format than your mappers.py handles, as far as I can tell). Am I missing something, or is that option not offered? Before I go tweaking your code on my local machine, I thought I should do a sanity check.

Thank you very much!

Ana

kmayerb commented 5 years ago

Hi Ana,

Thanks for reaching out. TCRdist2 is a development version that builds upon concepts in the Dash et al. (Phil Bradley) paper, which has its own repository: https://github.com/phbradley/tcr-dist.

TCRdist2 is being designed with functionality to support users who may be generating TCR data from Adaptive and 10x pipelines. You are correct that we don't currently have a 10x Cellranger mapper, but it would be a great addition.

You should not need to tweak the tcrdist2 code to write a new mapper, however. For example, the existing mapper

td.mappers.vdjdb_to_tcrdist2(pd_df = pd_df)  

just takes a DataFrame as input and returns a reformatted DataFrame with the following columns:

Index(['cdr3_a_aa', 'cdr3_b_aa', 'cell_type', 'count', 'epitope', 'epitope_aa',
       'frequency', 'id', 'j_a_gene', 'j_b_gene', 'mhc_a_a', 'mhc_a_b',
       'mhc_b_a', 'mhc_b_b', 'organism', 'subject', 'v_a_gene', 'v_b_gene'],
      dtype='object')

A standalone mapper function will work fine if it does the same.
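As a rough illustration of what such a standalone mapper might look like, here is a minimal sketch. It is not part of tcrdist2. The function name `cellranger_to_tcrdist2` is hypothetical, and the input column names (`barcode`, `chain`, `v_gene`, `j_gene`, `cdr3`, `productive`, `umis`) are assumed to match a Cellranger vdj contig annotations CSV; check them against your own file. It keeps productive TRA/TRB contigs, picks the contig with the most UMIs per barcode and chain (a simplifying assumption), and leaves every field it cannot fill as null. Whether the gene names need further reformatting for tcrdist2 is not addressed here.

```python
import pandas as pd

# Column layout listed above for the DataFrame returned by the existing mapper.
TCRDIST2_COLUMNS = [
    'cdr3_a_aa', 'cdr3_b_aa', 'cell_type', 'count', 'epitope', 'epitope_aa',
    'frequency', 'id', 'j_a_gene', 'j_b_gene', 'mhc_a_a', 'mhc_a_b',
    'mhc_b_a', 'mhc_b_b', 'organism', 'subject', 'v_a_gene', 'v_b_gene',
]


def cellranger_to_tcrdist2(pd_df):
    """Hypothetical mapper: one output row per barcode, alpha and beta paired."""
    # Keep productive TRA/TRB contigs; 'productive' may load as bool or as text.
    productive = pd_df['productive'].astype(str).str.lower() == 'true'
    contigs = pd_df[productive & pd_df['chain'].isin(['TRA', 'TRB'])]

    # Assumption: when a barcode has several contigs for one chain,
    # keep the one supported by the most UMIs.
    contigs = (contigs.sort_values('umis', ascending=False)
                      .drop_duplicates(subset=['barcode', 'chain']))

    def one_chain(chain, letter):
        # Select one chain and rename its columns to the tcrdist2 names.
        sub = contigs[contigs['chain'] == chain]
        cols = sub.set_index('barcode')[['cdr3', 'v_gene', 'j_gene']]
        return cols.rename(columns={
            'cdr3': f'cdr3_{letter}_aa',
            'v_gene': f'v_{letter}_gene',
            'j_gene': f'j_{letter}_gene',
        })

    # Outer join keeps barcodes that have only an alpha or only a beta chain.
    paired = one_chain('TRA', 'a').join(one_chain('TRB', 'b'), how='outer')
    paired = paired.rename_axis('id').reset_index()

    # reindex adds the fields this sketch cannot fill (epitope, MHC, ...) as NaN.
    return paired.reindex(columns=TCRDIST2_COLUMNS)
```

Usage would be along the lines of `cellranger_to_tcrdist2(pd.read_csv(path_to_contig_csv))`, where `path_to_contig_csv` is a placeholder for your own file.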

If you do write a mapper, feel free to submit a pull request with a test based on some real or fake Cell Ranger input data, and we can review it and potentially add it to our current mappers.py module.

Or feel free to be in touch if you have other questions or suggestions.

Best, Kosh

AnaHowellsFerreira commented 5 years ago

Great! Thank you for all the information. I will definitely work on the mappers for 10x and let you know.

I do have a question. I will be working with tumor infiltrating lymphocytes, and I was wondering whether I need epitope and MHC information. My apologies if this sounds like a basic question; I am new to the TCR world and bioinformatics.

Ana

kmayerb commented 4 years ago

Some aspects of tcrdist2, such as calculating distances between all TCR sequences and plotting gene usage with a Sankey diagram, do not explicitly require epitope or MHC information. You could leave those fields with null values.
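
For instance, a minimal sketch of leaving those fields null, assuming a pandas DataFrame that already holds the TCR columns listed earlier in this thread. The clone ids and CDR3 strings below are made-up placeholders, not real data, and the `unknown` list is just an illustrative name:

```python
import pandas as pd

# Made-up placeholder rows standing in for TIL clones with no known epitope or MHC.
til = pd.DataFrame({
    'id': ['clone1', 'clone2'],
    'cdr3_a_aa': ['CAAAAF', 'CAGGGF'],
    'cdr3_b_aa': ['CASSSF', 'CASTTF'],
})

# Epitope and MHC fields from the column list above, filled with nulls.
unknown = ['epitope', 'epitope_aa', 'mhc_a_a', 'mhc_a_b', 'mhc_b_a', 'mhc_b_b']
til = til.assign(**{col: None for col in unknown})
```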