scverse / scirpy

A scanpy extension to analyse single-cell TCR and BCR data.
https://scirpy.scverse.org/en/latest/
BSD 3-Clause "New" or "Revised" License

Supported T cell receptor types [REPLACEMENT ISSUE] #10

Closed: grst closed this issue 4 years ago

grst commented 4 years ago

The original issue

Id: 10
Title: Supported T cell receptor types

could not be created. This is a dummy issue that replaces the original one. It contains everything except the original issue description. If the GitLab repository still exists, visit the following link to see the original issue:

TODO

grst commented 4 years ago

In GitLab by @grst on Jan 24, 2020, 10:10

10x files:

We use the file filtered_contig_annotations.csv.

# Human B cell chains
(default) sturm@hochvogel Downloads % cut -f6 -d, vdj_v1_hs_pbmc3_b_filtered_contig_annotations.csv | sort | uniq -c
      1 chain
    929 IGH
    624 IGK
    506 IGL

# Human T cell chains
(default) sturm@hochvogel Downloads % cut -f6 -d, vdj_v1_hs_pbmc3_t_filtered_contig_annotations.csv | sort | uniq -c
      1 chain
     46 Multi
   4907 TRA
   5168 TRB

# Mouse B cell chains
(default) sturm@hochvogel Downloads % cut -f6 -d, vdj_v1_mm_pbmc4_b_filtered_contig_annotations.csv | sort | uniq -c
      1 chain
   5215 IGH
   5475 IGK
   2573 IGL

# Mouse T cell chains
(default) sturm@hochvogel Downloads % cut -f6 -d, vdj_v1_mm_pbmc4_t_filtered_contig_annotations.csv | sort | uniq -c
      1 chain
      7 Multi
    761 TRA
   1301 TRB
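The `cut -f6 -d, ... | sort | uniq -c` pipeline selects the sixth comma-separated field, which in these files is the `chain` column; the `1 chain` line in each tally is just the header row being counted. A minimal Python sketch of the same tally, assuming a 10x-style CSV with a `chain` column, selects the column by name instead. This excludes the header and also copes with quoted fields that contain commas. `count_chain_types` is a hypothetical helper, and the inline CSV is made-up toy data, not taken from the files above.

```python
import csv
import io
from collections import Counter


def count_chain_types(handle):
    """Tally the `chain` column of a 10x contig annotation CSV (header excluded)."""
    reader = csv.DictReader(handle)
    return Counter(row["chain"] for row in reader)


# Toy input for illustration only; real files have more columns and rows.
example = io.StringIO(
    "barcode,contig_id,chain,productive\n"
    "AAAC-1,AAAC-1_contig_1,TRA,True\n"
    "AAAC-1,AAAC-1_contig_2,TRB,True\n"
    "AAAG-1,AAAG-1_contig_1,TRB,True\n"
    "AAAG-1,AAAG-1_contig_2,Multi,False\n"
)
counts = count_chain_types(example)

# Print in the same "count label" layout as `uniq -c`.
for chain, n in sorted(counts.items()):
    print(f"{n:7d} {chain}")
```

For a real file, pass an open handle instead, e.g. `with open("vdj_v1_hs_pbmc3_t_filtered_contig_annotations.csv", newline="") as f: count_chain_types(f)`.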

There are indeed a bunch of barcodes that have more than 4 chains, e.g.

barcode             is_cell  contig_id                    high_confidence  length  chain  v_gene        d_gene  j_gene    c_gene  full_length  productive  cdr3
TTTACTGTCACCAGGC-1  True     TTTACTGTCACCAGGC-1_contig_1  True             648     TRB    TRBV19        None    TRBJ2-1   TRBC2   True         True        CASSISTDWGNEQFF
TTTACTGTCACCAGGC-1  True     TTTACTGTCACCAGGC-1_contig_2  True             511     TRA    TRAV23/DV6    None    TRAJ58    TRAC    True         True        CAASQETSGSRLTF
TTTACTGTCACCAGGC-1  True     TTTACTGTCACCAGGC-1_contig_3  True             521     TRB    TRBV6-5       None    TRBJ2-1   TRBC2   True         True        CASSYRTGSSYNEQFF
TTTACTGTCACCAGGC-1  True     TTTACTGTCACCAGGC-1_contig_4  True             659     TRA    TRAV8-6       None    TRAJ6     TRAC    True         True        CAVNPGGSYIPTF
TTTACTGTCACCAGGC-1  True     TTTACTGTCACCAGGC-1_contig_5  True             427     TRB    TRBV7-8       None    TRBJ2-1   TRBC2   True         False       CQQLRKTSYNEQFF
TTTACTGTCACCAGGC-1  True     TTTACTGTCACCAGGC-1_contig_6  True             463     TRA    TRAV13-1      None    TRAJ4     TRAC    True         False       CSKFLFSGGYNKLIF

But in the cases I checked, only four of them are productive. For now, I think it's fine to use just the productive chains and emit a warning that there might be more.
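A minimal sketch of the approach proposed here: group contigs by barcode, keep only the productive ones, and warn when a cell still has more productive chains than expected. The limit of four comes from this thread; the function name `productive_chains_by_cell` and the `max_chains` parameter are hypothetical, not scirpy's API. It assumes 10x-style `barcode`, `chain` and `productive` columns, with `productive` holding `True`/`False` as in the rows above. The toy input reproduces the chain and productive columns of the example cell.

```python
import csv
import io
import warnings
from collections import defaultdict


def productive_chains_by_cell(handle, max_chains=4):
    """Group productive contigs by barcode; warn if a cell exceeds `max_chains`.

    Hypothetical helper for illustration only.
    """
    cells = defaultdict(list)
    for row in csv.DictReader(handle):
        # Accept "True"/"true"; anything else is treated as non-productive.
        if row["productive"].lower() == "true":
            cells[row["barcode"]].append(row["chain"])
    for barcode, chains in cells.items():
        if len(chains) > max_chains:
            warnings.warn(
                f"Cell {barcode} has {len(chains)} productive chains; "
                f"more than {max_chains} is unexpected."
            )
    return dict(cells)


# The six contigs of the example cell: four productive, two not.
rows = io.StringIO(
    "barcode,chain,productive\n"
    "TTTACTGTCACCAGGC-1,TRB,True\n"
    "TTTACTGTCACCAGGC-1,TRA,True\n"
    "TTTACTGTCACCAGGC-1,TRB,True\n"
    "TTTACTGTCACCAGGC-1,TRA,True\n"
    "TTTACTGTCACCAGGC-1,TRB,False\n"
    "TTTACTGTCACCAGGC-1,TRA,False\n"
)
cells = productive_chains_by_cell(rows)
print(cells)
```

Dropping non-productive contigs first means the example cell stays within the limit and triggers no warning; only cells with more than four productive chains are flagged.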

grst commented 4 years ago

In GitLab by @grst on Jan 24, 2020, 10:55

TraCeR

Files to use:

For now, let's go for TraCeR alpha/beta only.

grst commented 4 years ago

In GitLab by @szabogtamas on Jan 24, 2020, 13:04

Totally agree: even four chains is a lot, and we can safely assume that there shouldn't be more than four productive chains in a cell. Since we only have datasets with alpha/beta chains, I would also leave out gamma/delta for now. We can include them later.
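Restricting the input to alpha/beta, as agreed here, could look like the following sketch. Contigs whose `chain` is not `TRA` or `TRB` (for example the `Multi` entries in the 10x T cell counts, or gamma/delta chains such as `TRG`/`TRD`) are dropped and reported in a single warning rather than discarded silently. `SUPPORTED_CHAINS` and `filter_alpha_beta` are hypothetical names, not part of scirpy.

```python
import warnings
from collections import Counter

# Hypothetical constant: only alpha/beta TCR chains are supported for now.
SUPPORTED_CHAINS = frozenset({"TRA", "TRB"})


def filter_alpha_beta(contigs):
    """Keep contigs whose chain is TRA or TRB; warn about anything dropped.

    `contigs` is an iterable of dicts with at least a "chain" key.
    """
    kept, dropped = [], Counter()
    for contig in contigs:
        if contig["chain"] in SUPPORTED_CHAINS:
            kept.append(contig)
        else:
            dropped[contig["chain"]] += 1
    if dropped:
        warnings.warn(
            "Ignoring unsupported chain types (alpha/beta only): "
            + ", ".join(f"{chain}={n}" for chain, n in sorted(dropped.items()))
        )
    return kept


contigs = [
    {"contig_id": "c1", "chain": "TRA"},
    {"contig_id": "c2", "chain": "TRB"},
    {"contig_id": "c3", "chain": "Multi"},
    {"contig_id": "c4", "chain": "TRG"},
]
kept = filter_alpha_beta(contigs)
print([c["contig_id"] for c in kept])
```

Keeping the supported set in one constant makes it a single place to extend once gamma/delta support is added, and the warning doubles as user-facing documentation of the restriction.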

grst commented 4 years ago

In GitLab by @grst on Feb 14, 2020, 16:18

alpha/beta is fine for now. But this needs to be documented.

grst commented 4 years ago

In GitLab by @grst on Mar 27, 2020, 11:31

assigned to @szabogtamas