Closed Lihua1990 closed 3 years ago
Lihua,
This error is due to the input not being correctly formatted, so there are no distances to compute. V and J gene names must include an allele number so that we can infer CDR1, CDR2, and CDR2.5 (aka PMHC). This is easily fixed in your input df:
df['v_d_gene'] = df['v_d_gene'].apply(lambda x : f"{x}*01")
df['j_d_gene'] = df['j_d_gene'].apply(lambda x : f"{x}*01")
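If some gene names might already carry an allele, or some entries might be missing, a slightly more defensive version of the same idea could look like the sketch below. The helper name `add_default_allele` is hypothetical (it is not part of tcrdist3), and `*01` as the default allele is simply the value used in the two lines above.

```python
import math

def add_default_allele(gene, allele="*01"):
    """Append a default allele to a gene name that lacks one (hypothetical helper)."""
    # Leave None and float NaN untouched instead of producing strings like "nan*01".
    if gene is None or (isinstance(gene, float) and math.isnan(gene)):
        return gene
    gene = str(gene).strip()
    # Names that already carry an allele (e.g. "TRDV2*01") are returned unchanged.
    if "*" in gene:
        return gene
    return f"{gene}{allele}"

print(add_default_allele("TRDV2"))     # TRDV2*01
print(add_default_allele("TRDJ1*01"))  # TRDJ1*01
```

It can be applied the same way as the lambda, e.g. `df['v_d_gene'] = df['v_d_gene'].apply(add_default_allele)`.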
Also, make sure to pass in only the relevant columns, because an NA in any column of the cell df will cause you to lose that row.
tr = TCRrep(cell_df = df[['subject','cdr3_d_aa','v_d_gene','j_d_gene','count']],
            organism = 'human',
            chains = ['delta'],
            db_file = 'alphabeta_gammadelta_db.tsv')
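Before building the TCRrep, it may help to check how many missing values the selected columns contain. This is a minimal pandas sketch on made-up toy data (only the column names come from this thread). It does not call tcrdist3 at all; it just counts NAs.

```python
import pandas as pd

# Toy data, made up for illustration; column names follow the thread.
df = pd.DataFrame({
    "subject":   ["s1", "s1", "s1"],
    "cdr3_d_aa": ["CACDTGGYTDKLIF", None, "CACDILGDTDKLIF"],
    "v_d_gene":  ["TRDV2*01", "TRDV2*01", "TRDV2*01"],
    "j_d_gene":  ["TRDJ1*01", "TRDJ1*01", "TRDJ1*01"],
    "count":     [369, 335, 214],
})

required = ["subject", "cdr3_d_aa", "v_d_gene", "j_d_gene", "count"]

# Number of missing values per required column.
na_counts = df[required].isna().sum()
print(na_counts[na_counts > 0])

# Number of rows with at least one missing value in the required columns.
rows_at_risk = int(df[required].isna().any(axis=1).sum())
print(rows_at_risk)
```

Any row counted in `rows_at_risk` is one you would expect to lose, given the behavior described above.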
Hi,
Thank you so much for the reply. I have another question. You mentioned that an NA value in any column of the cell dataframe will cause that row to be lost. In my dataframe, 10% of the rows have NA in the 'd_d_gene' column; the other 90% have a defined value, such as 'TRDD3', 'TRDD2' or 'TRDD1'. How do you suggest I deal with this 'd_d_gene' column? Should I convert all rows that have a defined value with df['d_d_gene'] = df['d_d_gene'].apply(lambda x : f"{x}*01")? Is there a way to also include the rows that have NA in the 'd_d_gene' column?
Thank you so much!
Best, Lihua
D genes are not used by tcrdist3, so you should not include that column when you initialize the TCRrep instance.
You can specify only the columns you need here:
tr = TCRrep(cell_df = df[['subject','cdr3_d_aa','v_d_gene','j_d_gene','count']],
            organism = 'human',
            chains = ['delta'],
            db_file = 'alphabeta_gammadelta_db.tsv')
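To see why leaving out 'd_d_gene' keeps those rows, here is a small pandas sketch on made-up toy data. It uses `dropna()` as a stand-in for the row loss described earlier in this thread; that is an assumption for illustration, not tcrdist3's actual code.

```python
import pandas as pd

# Toy data, made up for illustration; two of three rows lack a D gene call.
df = pd.DataFrame({
    "subject":   ["s1", "s1", "s1"],
    "count":     [369, 214, 200],
    "v_d_gene":  ["TRDV2*01", "TRDV2*01", "TRDV2*01"],
    "d_d_gene":  [None, "TRDD3", None],
    "j_d_gene":  ["TRDJ1*01", "TRDJ1*01", "TRDJ3*01"],
    "cdr3_d_aa": ["CACDTGGYTDKLIF", "CACDILGDTDKLIF", "CACDTWGSSWDTRQMFF"],
})

# With every column included, rows missing d_d_gene are dropped.
kept_all_columns = len(df.dropna())

# With only the columns passed to TCRrep, the missing D gene no longer matters.
required = ["subject", "cdr3_d_aa", "v_d_gene", "j_d_gene", "count"]
kept_required_only = len(df[required].dropna())

print(kept_all_columns, kept_required_only)
```

So there is no need to add an allele to 'd_d_gene' or to fill in its missing values; selecting the columns is enough.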
OK, clear now, thanks a lot!
Hi, I am using tcrdist3 and I encountered the error message in the title of this issue.
Here's my dataframe (df.head()):

| subject | count | v_d_gene | d_d_gene | j_d_gene | cdr3_d_aa | cdr3_d_nucseq |
| -- | -- | -- | -- | -- | -- | -- |
| CB2**afs2 | 369 | TRDV2 | . | TRDJ1 | CACDTGGYTDKLIF | TGTGCCTGTGACACTGGGGGATACACCGATAAACTCATCTTT |
| CB2**afs2 | 335 | TRDV2 | . | TRDJ1 | CACDTGGYTDKLIF | TGTGCCTGTGACACCGGGGGATACACCGATAAACTCATCTTT |
| CB2**afs2 | 214 | TRDV2 | . | TRDJ3 | CACDWGSSWDTRQMFF | TGTGCCTGTGACTGGGGGAGCTCCTGGGACACCCGACAGATGTTTTTC |
| CB2**afs2 | 214 | TRDV2 | TRDD3 | TRDJ1 | CACDILGDTDKLIF | TGTGCCTGTGACATACTGGGGGACACCGATAAACTCATCTTT |
| CB2**afs2 | 200 | TRDV2 | . | TRDJ3 | CACDTWGSSWDTRQMFF | TGTGCCTGTGACACCTGGGGGAGCTCCTGGGACACCCGACAGATGT... |
and running the following will return an error
ValueError: zero-size array to reduction operation maximum which has no identity.
What might be the problem and what should I check?
Thank you in advance!
Lihua