hub2nature opened 4 months ago
If you mean the following `KeyError: "['DATA.908134', 'DATA.908120', 'DATA.908442', 'DATA.1789883'] not in index"`, it is most likely caused by the following:
In the `_filter_pair` function, the `not_index` list consists of strings. However, if you check the column types of the DataFrame with `drug_cell_df.dtypes`, you can see that `"COSMIC_ID"` has the type `int64`. As a result, the rows for these missing IDs are never filtered out by the line `drug_cell_df = drug_cell_df[~drug_cell_df['COSMIC_ID'].isin(not_index)]`, because the string values never match the integer IDs.
A simple fix is to make it a list of integers: `not_index = [908134, 1789883, 908120, 908442]`.
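A minimal sketch of the dtype mismatch, using a toy `drug_cell_df` with made-up rows (only the `int64` type of `COSMIC_ID` mirrors the real data). It also shows a hypothetical, more robust variant that keeps only the IDs actually present in the expression table, assuming its columns follow the `DATA.<COSMIC_ID>` pattern seen in the KeyError; `expr_df` and its columns are illustrative, not the repo's actual structure.

```python
import pandas as pd

# Toy stand-in for drug_cell_df (hypothetical rows; COSMIC_ID is int64 as in the real data).
drug_cell_df = pd.DataFrame({
    "DRUG_ID": [1, 1, 2, 2],
    "COSMIC_ID": [908134, 683665, 1789883, 684055],
})
print(drug_cell_df.dtypes["COSMIC_ID"])  # int64

# Original filter: string IDs never equal int64 values, so no rows are removed.
not_index_str = ["908134", "1789883", "908120", "908442"]
unfiltered = drug_cell_df[~drug_cell_df["COSMIC_ID"].isin(not_index_str)]

# Suggested fix: integer IDs, so isin() compares like with like.
not_index = [908134, 1789883, 908120, 908442]
filtered = drug_cell_df[~drug_cell_df["COSMIC_ID"].isin(not_index)]

# Hypothetical alternative: derive the valid IDs from the expression table's
# "DATA.<COSMIC_ID>" column names instead of hard-coding the missing ones.
expr_df = pd.DataFrame(columns=["GENE_SYMBOLS", "DATA.683665", "DATA.684055"])
available = {int(c.split(".", 1)[1]) for c in expr_df.columns if c.startswith("DATA.")}
kept = drug_cell_df[drug_cell_df["COSMIC_ID"].isin(available)]
```

The alternative avoids having to update `not_index` by hand if a different expression file lacks other cell lines, at the cost of relying on the assumed column naming.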
Are the drug encoding helper files available online?
I got an error about missing data for 4 DNA IDs. However, the `filter_pair` function only uses the drug files. Can you help?