ValueError: zero-size array to reduction operation maximum which has no identity

Lihua1990 commented 3 years ago

Hi, I am using tcrdist3 and ran into the error in the title of this issue.

Here's my dataframe:

df.head()

Running the following returns an error:

import pandas as pd
from tcrdist.repertoire import TCRrep

tr = TCRrep(cell_df = df, 
            organism = 'human', 
            chains = ['delta'], 
            db_file = 'alphabeta_gammadelta_db.tsv')

ValueError: zero-size array to reduction operation maximum which has no identity.

What might be the problem and what should I check?

Thank you in advance!

Lihua
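For context, the message itself comes from NumPy: it is what `max()` raises when asked to reduce an empty array, because the maximum has no identity value to fall back on. The minimal sketch below reproduces it without tcrdist3 at all (the helper name `max_or_error` is hypothetical). Seeing this error from `TCRrep` therefore suggests that something upstream ended up empty.

```python
import numpy as np

def max_or_error(values):
    """Return the max of `values`, or the ValueError message if it is empty."""
    try:
        return np.asarray(values, dtype=float).max()
    except ValueError as err:
        # NumPy refuses to take the maximum of a zero-size array.
        return str(err)

print(max_or_error([3.0, 1.0, 2.0]))  # 3.0
print(max_or_error([]))  # zero-size array to reduction operation maximum which has no identity
```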

kmayerb commented 3 years ago

Lihua,

This error occurs because the input is not formatted correctly, so there are no distances to compute. V and J gene names must include an allele number so that we can infer CDR1, CDR2, and CDR2.5 (aka PMHC). This is easy to fix in your input df:

df['v_d_gene'] = df['v_d_gene'].apply(lambda x: f"{x}*01")
df['j_d_gene'] = df['j_d_gene'].apply(lambda x: f"{x}*01")
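The lambdas above append `*01` unconditionally, so running them twice, or on names that already carry an allele, would give names like `TRDV2*01*01`. A slightly more defensive sketch follows. It assumes `*01` is an acceptable default allele for your data and that an existing allele is always marked by a `*`; the helper `add_default_allele` and the toy values are hypothetical, not part of tcrdist3.

```python
import pandas as pd

def add_default_allele(gene, allele="01"):
    """Append '*<allele>' to a gene name unless it already has an allele suffix."""
    if pd.isna(gene):
        return gene  # leave missing values alone rather than producing 'nan*01'
    gene = str(gene)
    # Assumption: an allele is present if and only if the name contains '*'.
    return gene if "*" in gene else f"{gene}*{allele}"

# Toy data (made up) mixing names with and without alleles.
toy = pd.DataFrame({
    "v_d_gene": ["TRDV2", "TRDV1*01", None],
    "j_d_gene": ["TRDJ1", "TRDJ1", "TRDJ3"],
})

for col in ["v_d_gene", "j_d_gene"]:
    toy[col] = toy[col].apply(add_default_allele)

print(toy)
```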

Also, make sure to pass in only the relevant columns, since an NA in any column of the cell_df will cause that row to be dropped.

tr = TCRrep(cell_df = df[['subject','cdr3_d_aa','v_d_gene','j_d_gene','count']], 
            organism = 'human', 
            chains = ['delta'], 
            db_file = 'alphabeta_gammadelta_db.tsv')
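Because a missing value in any column of `cell_df` costs you that row, it can help to check the subset before building the `TCRrep`. The rough sketch below counts rows with NAs and gene names lacking an allele suffix. The function name `preflight`, the column lists (copied from the snippet above, for the delta chain only) and the toy data are assumptions; adjust them for other chains.

```python
import pandas as pd

REQUIRED = ["subject", "cdr3_d_aa", "v_d_gene", "j_d_gene", "count"]
GENE_COLS = ["v_d_gene", "j_d_gene"]

def preflight(df, required=REQUIRED, gene_cols=GENE_COLS):
    """Report input problems that could leave nothing to compare."""
    missing_cols = [c for c in required if c not in df.columns]
    if missing_cols:
        return {"missing_columns": missing_cols}
    sub = df[required]
    # Rows with an NA in any selected column (these would be dropped).
    has_na = sub.isna().any(axis=1)
    complete = sub.loc[~has_na]
    # Among complete rows, count gene names with no '*' allele suffix.
    no_allele = {
        col: int((~complete[col].astype(str).str.contains("*", regex=False)).sum())
        for col in gene_cols
    }
    return {
        "rows": len(sub),
        "rows_with_na": int(has_na.sum()),
        "genes_without_allele": no_allele,
    }

# Toy input (made-up values): one NA CDR3 and one V gene without an allele.
toy = pd.DataFrame({
    "subject": ["s1", "s1", "s2"],
    "cdr3_d_aa": ["CALGELGDTDKLIF", None, "CACDTLGDTDKLIF"],
    "v_d_gene": ["TRDV2*01", "TRDV2*01", "TRDV1"],
    "j_d_gene": ["TRDJ1*01", "TRDJ1*01", "TRDJ1*01"],
    "count": [5, 2, 1],
})
print(preflight(toy))
```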

Lihua1990 commented 3 years ago

Hi,

Thank you so much for the reply. I have one more question. You mentioned that an NA value in any column of the dataframe will cause that row to be lost. In my dataframe, 10% of the rows have an NA value in the 'd_d_gene' column, while the other 90% have a defined value, such as 'TRDD3', 'TRDD2' or 'TRDD1'. How do you suggest I handle the 'd_d_gene' column? Should I convert the defined values with df['d_d_gene'] = df['d_d_gene'].apply(lambda x : f"{x}*01")? Is there a way to also include the rows that have an NA value in the 'd_d_gene' column?

Thank you so much!

Best, Lihua

kmayerb commented 3 years ago

D genes are not used by tcrdist3, so you should not include that column when you initialize the TCRrep instance.

You can specify only the columns you need here:

tr = TCRrep(cell_df = df[['subject','cdr3_d_aa','v_d_gene','j_d_gene','count']], 
            organism = 'human', 
            chains = ['delta'], 
            db_file = 'alphabeta_gammadelta_db.tsv',)
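To see why leaving out `d_d_gene` keeps the rows where it is NA, compare how many rows survive pandas `dropna()` with and without that column. This is a sketch that assumes, per the earlier comment, that rows with any NA are dropped; the toy values are made up.

```python
import pandas as pd

# Toy data (made up): one row has no D gene assigned.
df = pd.DataFrame({
    "subject": ["s1", "s1", "s2", "s2"],
    "cdr3_d_aa": ["CALGEF", "CALGDF", "CACDTF", "CALSTF"],
    "v_d_gene": ["TRDV2*01"] * 4,
    "j_d_gene": ["TRDJ1*01"] * 4,
    "count": [3, 1, 2, 1],
    "d_d_gene": ["TRDD3", None, "TRDD2", "TRDD1"],
})

cols = ["subject", "cdr3_d_aa", "v_d_gene", "j_d_gene", "count"]

with_d = len(df.dropna())           # NA in d_d_gene drops that row
without_d = len(df[cols].dropna())  # d_d_gene excluded, so all rows kept
print(with_d, without_d)  # 3 4
```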

Lihua1990 commented 3 years ago

OK, clear now, thanks a lot!

kmayerb / tcrdist3

ValueError: zero-size array to reduction operation maximum which has no identity #57