don't reset index on dataframes, for future merging

hassonlab / 247-pickling

Contains code to create pickles from raw/processed data


Open zkokaja opened 1 year ago

zkokaja commented 1 year ago

tfsemb_concat currently calls pd.concat(all_df, ignore_index=True), but we don't want to ignore the index, since we need it for merging later.
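For illustration, a minimal sketch of the difference. The two small frames and their column name are made up; only the pd.concat call mirrors the one mentioned above.

```python
import pandas as pd

# Hypothetical per-chunk frames; the index values stand in for the
# original row labels we want to keep for merging later.
df_a = pd.DataFrame({"word": ["the", "cat"]}, index=[0, 1])
df_b = pd.DataFrame({"word": ["sat", "down"]}, index=[2, 3])
all_df = [df_b, df_a]  # chunks may arrive out of order

# ignore_index=True throws the labels away and renumbers from 0.
reset = pd.concat(all_df, ignore_index=True)

# The default (ignore_index=False) keeps each row's original label.
kept = pd.concat(all_df)

reset_index = reset.index.tolist()
kept_index = kept.index.tolist()
```

With the index reset, the row for "sat" ends up labeled 0 instead of 2, so a later merge on index would pair it with the wrong row.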

hvgazula commented 1 year ago

VeritasJoker commented 1 year ago

I don't think this is resolved. I am regenerating embeddings for 798 now and found that when we save emb_df into pickles, we call to_dict with "records" here: https://github.com/hassonlab/247-pickling/blob/b7a6fcb060ecb8276b5dcb090b97e6f5b2983558/scripts/tfsemb_main.py#L32

This does not seem to save the index, which we need when we concatenate later here
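A small sketch of what seems to be happening, using a made-up frame rather than the real emb_df: to_dict("records") returns one dict per row holding only the column values, so the index labels are not part of the output.

```python
import pandas as pd

# Hypothetical stand-in for emb_df with a non-default index.
emb_df = pd.DataFrame({"word": ["sat", "down"]}, index=[2, 3])

# One dict per row; the index labels 2 and 3 appear nowhere.
records = emb_df.to_dict("records")

# Rebuilding a frame from the records yields a fresh 0..n-1 index.
restored = pd.DataFrame(records)
restored_index = restored.index.tolist()
```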

zkokaja commented 1 year ago

But merging on index works in encoding? Should we just use pd.to_pickle instead?

VeritasJoker commented 1 year ago

Yes, merging on index works in encoding. I think pd.to_pickle should work?
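A quick sketch suggesting pd.to_pickle would keep the index. The frames, column names, and file name here are hypothetical and this is not the repository's actual saving code; it only shows the round trip followed by a merge on index.

```python
import os
import tempfile

import pandas as pd

# Hypothetical embeddings frame whose index labels must survive saving.
emb_df = pd.DataFrame({"embedding": [[0.1, 0.2], [0.3, 0.4]]}, index=[2, 3])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "emb_df.pkl")  # assumed file name
    pd.to_pickle(emb_df, path)              # pickles the whole DataFrame
    loaded = pd.read_pickle(path)

# Hypothetical base frame to merge against, as the encoding step would.
base_df = pd.DataFrame({"word": ["the", "cat", "sat", "down"]}, index=[0, 1, 2, 3])

# Merge on index: rows pair up by label, not by position.
merged = base_df.merge(loaded, left_index=True, right_index=True)
```

Because the whole DataFrame object is pickled, the loaded frame comes back with labels 2 and 3, and the merge attaches the embeddings to "sat" and "down".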

zkokaja commented 1 year ago

Related to #153