yagmuronay closed this issue 3 years ago.
Nice catch @yagmuronay, although I am not sure that your solution will work. It is possible that the underlying data the notebook loads has changed in some way.
```python
import io
import pandas as pd
import requests

# oncokb_out_file, copy_gain_df and copy_loss_df are assumed to be defined earlier in process_data.py
response = requests.get('http://oncokb.org/api/v1/genes')
oncokb_df = pd.read_json(io.StringIO(response.text))
oncokb_df.to_csv(oncokb_out_file, sep='\t')

# Integrate copy number, oncokb gene-type, and mutation status to define status matrix
oncogenes_df = oncokb_df[oncokb_df['oncogene']]
tsg_df = oncokb_df[oncokb_df['tsg']]

# Subset copy gains by oncogenes and copy losses by tumor suppressors (tsg)
# (pandas >= 1.0 raises KeyError here if any hugoSymbol is not a column)
status_gain = copy_gain_df.loc[:, oncogenes_df['hugoSymbol']]
status_loss = copy_loss_df.loc[:, tsg_df['hugoSymbol']]
copy_status = pd.concat([status_gain, status_loss], axis=1)
```
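A minimal sketch of the likely failure mode, using made-up toy data (the sample names, gene symbols and the placeholder `NEWGENE` are hypothetical, not taken from OncoKB or the repository). Since pandas 1.0, `.loc` with a list that contains any label missing from the columns raises `KeyError`; pandas 0.21 to 0.25 only emitted a `FutureWarning` and filled the missing column with NaN. So if the OncoKB gene list now includes a symbol that is not a column of the copy-number matrix, the subsetting step stops working.

```python
import pandas as pd

# Hypothetical copy-gain matrix: rows are samples, columns are gene symbols
copy_gain_df = pd.DataFrame(
    {"KRAS": [1, 0], "MYC": [0, 1]},
    index=["sample_1", "sample_2"],
)

# Hypothetical OncoKB-style gene table with a boolean 'oncogene' flag
oncokb_df = pd.DataFrame(
    {"hugoSymbol": ["KRAS", "MYC", "NEWGENE"], "oncogene": [True, True, True]}
)
oncogenes_df = oncokb_df[oncokb_df["oncogene"]]

try:
    status_gain = copy_gain_df.loc[:, oncogenes_df["hugoSymbol"]]
    error = None
except KeyError as exc:
    # pandas >= 1.0: 'NEWGENE' is not a column, so .loc rejects the whole list
    status_gain = None
    error = exc

print(type(error).__name__)
```

On a current pandas release this prints `KeyError`; on pandas older than 1.0 the same call would instead return a frame with an all-NaN `NEWGENE` column plus a warning.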
What does tsg_df look like?
Dear Dr. Greg (@gwaygenomics),
Thank you so much for your quick reply. The DataFrames tsg_df and oncogenes_df do have a column named "hugoSymbol". I found out later that the output was written as expected; previously, I had looked in the wrong path, which was the one for the raw data. All in all, these were only warnings. Thank you so much for your time!
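To see which gene symbols triggered those warnings, one option is to compare the OncoKB symbols against the copy-number columns directly. This is a minimal sketch; `missing_symbols` is a hypothetical helper and the toy frames below are invented for illustration, not taken from the repository.

```python
import pandas as pd


def missing_symbols(gene_table: pd.DataFrame, matrix: pd.DataFrame) -> list:
    """Return symbols in gene_table['hugoSymbol'] that are not columns of matrix."""
    # Keep the order in which the symbols appear in the gene table
    return [s for s in gene_table["hugoSymbol"] if s not in matrix.columns]


# Hypothetical toy data standing in for copy_loss_df and tsg_df
copy_loss_df = pd.DataFrame({"TP53": [1, 0], "PTEN": [0, 0]})
tsg_df = pd.DataFrame(
    {"hugoSymbol": ["TP53", "PTEN", "RB1"], "tsg": [True, True, True]}
)

print("hugoSymbol" in tsg_df.columns)         # True
print(missing_symbols(tsg_df, copy_loss_df))  # ['RB1']
```

An empty list would mean every symbol is present and `.loc` is safe on any pandas version.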
Dear Dr. Greg,
I was trying to run your process_data.py script from the nbconverted scripts. This resulted in an error, which I believe was caused by the missing column 'hugoSymbol' in tsg_df, and also by the fact that indexing with a list containing missing labels is deprecated. Please find the exact output below. The documentation says:
Could I simply use .reindex() instead of .loc[] in this case? I would be very grateful for your help with this issue. Thank you.

Kind regards,
Yagmur
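On the `.reindex()` question, a minimal sketch on hypothetical toy data (`NEWGENE` is a placeholder symbol). `DataFrame.reindex(columns=...)` reproduces the old pre-1.0 `.loc` behavior: every requested gene is kept, and absent ones become all-NaN columns. Selecting only the symbols that are present drops them instead. Which is appropriate for the status matrix is not settled in this thread; NaN columns propagate into later sums or comparisons unless filled, while dropping genes silently shrinks the gene set.

```python
import pandas as pd

copy_gain_df = pd.DataFrame({"KRAS": [1, 0], "MYC": [0, 1]}, index=["s1", "s2"])
genes = ["KRAS", "MYC", "NEWGENE"]  # hypothetical; 'NEWGENE' is not a column

# Option 1: reindex keeps every requested gene; absent ones are filled with NaN
via_reindex = copy_gain_df.reindex(columns=genes)

# Option 2: keep only genes present in the matrix (order follows `genes`)
present = [g for g in genes if g in copy_gain_df.columns]
via_subset = copy_gain_df.loc[:, present]

print(list(via_reindex.columns))            # ['KRAS', 'MYC', 'NEWGENE']
print(via_reindex["NEWGENE"].isna().all())  # True
print(list(via_subset.columns))             # ['KRAS', 'MYC']
```

If the NaN columns from option 1 should count as "no copy-number event", a `.fillna(0)` afterwards is one possible choice, but that is an assumption about the intended semantics rather than something the thread confirms.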