jianglikun / DeepTTC

DeepTTC: a transformer-based model for predicting cancer drug response

Missing DNA ids #4

Open hub2nature opened 4 months ago

hub2nature commented 4 months ago

I got an error about missing data for 4 DNA IDs. However, the filter_pair function only reads the drug files. Can you help?

reivelrei commented 3 months ago

If you mean this KeyError: "['DATA.908134', 'DATA.908120', 'DATA.908442', 'DATA.1789883'] not in index", the most likely cause is a type mismatch:

In the _filter_pair function, the not_index list consists of strings.

However, if you check the column types of the DataFrame with drug_cell_df.dtypes, you can see that "COSMIC_ID" has the type int64. Because isin then compares integers against strings, nothing matches, and the rows with missing data are not filtered out by the line drug_cell_df = drug_cell_df[~drug_cell_df['COSMIC_ID'].isin(not_index)].

A simple fix: make not_index a list of integers, not_index = [908134, 1789883, 908120, 908442].
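To illustrate the mismatch, here is a minimal sketch on a toy table. The DataFrame contents, the DRUG_ID column, and the extra ID 683665 are made up for the example, and I am assuming the original not_index holds the IDs as plain strings; only the isin line mirrors the one quoted above.

```python
import pandas as pd

# Toy stand-in for the drug/cell table (hypothetical values).
# COSMIC_ID is stored as int64, as reported by drug_cell_df.dtypes.
drug_cell_df = pd.DataFrame({
    "COSMIC_ID": [908134, 908120, 683665, 1789883, 908442],
    "DRUG_ID": [1, 2, 3, 4, 5],
})

# Assumed shape of the original list: the IDs as strings.
not_index_str = ["908134", "1789883", "908120", "908442"]
# The fix: the same IDs as integers.
not_index_int = [int(x) for x in not_index_str]

# Strings never equal the int64 values, so no row is dropped.
unfiltered = drug_cell_df[~drug_cell_df["COSMIC_ID"].isin(not_index_str)]

# With integers the four problematic cell lines are removed.
filtered = drug_cell_df[~drug_cell_df["COSMIC_ID"].isin(not_index_int)]

print(len(unfiltered), len(filtered))
```

Casting with int(x) instead of retyping the list keeps the IDs in one place, but hard-coding the integer list as suggested above works just as well.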

hub2nature commented 2 months ago

Are the drug encoding helper files available online?