Open hardingnj opened 3 years ago
By skipping the read_csv function, we lose its detection of NaN values, so columns that should be numeric end up with dtype object.
For example,
    import GEOparse

    geo = GEOparse.get_GEO("GSE112676")
    geo.phenotype_data["characteristics_ch1.3.age_onset"]
gives
    GSM3076582    72.69
    GSM3076584    66.97
    GSM3076586    73.73
    GSM3076588       NA
    GSM3076590       NA
                  ...
    GSM3078502    74.88
    GSM3078503    73.57
    GSM3078505    71.29
    GSM3078507    61.84
    GSM3078510    74.49
    Name: characteristics_ch1.3.age_onset, Length: 741, dtype: object
So the "NA" entries are kept as literal strings rather than being treated as missing values, and the column is therefore not converted to float.
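The difference can be reproduced without downloading anything. This is only a minimal sketch: the IDs and values are copied from the output above, and building the column directly from strings (with an explicit `dtype=object`) is my assumption about what happens when read_csv is skipped.

```python
from io import StringIO

import pandas as pd

# Values copied from the output above, stored as plain strings.
# dtype=object is set explicitly to mimic the reported column.
ages = pd.Series(
    ["72.69", "66.97", "73.73", "NA", "NA"],
    index=["GSM3076582", "GSM3076584", "GSM3076586", "GSM3076588", "GSM3076590"],
    name="characteristics_ch1.3.age_onset",
    dtype=object,
)
print(ages.dtype)  # object: "NA" is just another string here

# The same data parsed by read_csv: "NA" is one of its default
# missing-value markers, so the column is inferred as float.
parsed = pd.read_csv(StringIO(ages.to_csv()), index_col=0).iloc[:, 0]
print(parsed.dtype)
print(parsed.isna().sum())
```

The second half is the same CSV round trip as the workaround, applied to a single column.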
My fix is something like this:
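    from io import StringIO
    import pandas as pd

    # Round-trip the phenotype table through CSV so that read_csv
    # re-infers the dtypes and parses "NA" as NaN.
    out = StringIO()
    pheno.to_csv(out)
    pheno = pd.read_csv(StringIO(out.getvalue()), index_col=0)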
I can put in a quick PR, but this feels a little crude, and I haven't been able to find a more elegant way.
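One possible alternative that avoids the CSV round trip is to convert columns in place with `pd.to_numeric`. This is only a sketch under stated assumptions: `coerce_numeric_columns` and its `na_strings` parameter are hypothetical names, not part of GEOparse, and only "NA" is treated as a missing-value marker by default, which is narrower than read_csv's default list.

```python
import pandas as pd


def coerce_numeric_columns(df, na_strings=("NA",)):
    """Hypothetical helper: make a column numeric only if every
    non-missing value in it parses as a number."""
    out = df.copy()
    for col in out.columns:
        # Blank out the missing-value markers (they become NaN).
        cleaned = out[col].where(~out[col].isin(list(na_strings)))
        # Anything that is not a number becomes NaN as well.
        converted = pd.to_numeric(cleaned, errors="coerce")
        # Accept the conversion only if it introduced no new NaN,
        # i.e. the column held nothing but numbers and NA markers.
        if converted.notna().sum() == cleaned.notna().sum():
            out[col] = converted
    return out


# Usage with made-up data shaped like the phenotype table.
pheno = pd.DataFrame(
    {"age_onset": ["72.69", "NA", "61.84"], "sex": ["male", "female", "NA"]},
    index=["GSM1", "GSM2", "GSM3"],
    dtype=object,
)
fixed = coerce_numeric_columns(pheno)
print(fixed.dtypes)
```

Text columns are left untouched, including their "NA" strings, which differs from the round trip: read_csv would turn those into NaN too.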
Thanks for reporting. Let me think about how to do this. A PR would be good, so that we can test it.