hassonlab / 247-pickling

Contains code to create pickles from raw/processed data

don't reset index on dataframes, for future merging #102

Open zkokaja opened 1 year ago

zkokaja commented 1 year ago

see https://github.com/hassonlab/247-encoding/issues/50

In tfsemb_concat, we currently call pd.concat(all_df, ignore_index=True), but we don't want to ignore the index.
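A minimal sketch of the difference, using two made-up frames (`part_a`, `part_b`, and the `token` column are hypothetical and not taken from the repo):

```python
import pandas as pd

# Two hypothetical chunks whose index carries the original row labels.
part_a = pd.DataFrame({"token": ["hello", "world"]}, index=[10, 11])
part_b = pd.DataFrame({"token": ["foo", "bar"]}, index=[12, 13])
all_df = [part_a, part_b]

# ignore_index=True discards the original labels and renumbers from 0.
reset = pd.concat(all_df, ignore_index=True)

# The default (ignore_index=False) keeps the original labels for later merging.
kept = pd.concat(all_df)

print(list(reset.index))  # [0, 1, 2, 3]
print(list(kept.index))   # [10, 11, 12, 13]
```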

hvgazula commented 1 year ago

Replace https://github.com/hassonlab/247-pickling/blob/4588392872b7491a8a7a52cee553968ac025722e/scripts/tfsemb_main.py#L589 with df = pd.DataFrame(index=df.index)
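For illustration, a rough sketch of what that replacement does, assuming `df` has a non-default index (the data and the `score` column are made up, not the actual embedding columns):

```python
import pandas as pd

# Hypothetical input frame with a non-default index.
df = pd.DataFrame({"token": ["a", "b", "c"]}, index=[5, 7, 9])

# Start from an empty frame that shares df's index instead of a fresh 0..n-1 one.
out = pd.DataFrame(index=df.index)

# A plain list is assigned positionally onto that shared index.
out["score"] = [0.1, 0.2, 0.3]

# Because both frames carry the same labels, an index join lines the rows up.
merged = df.join(out)
print(list(merged.index))  # [5, 7, 9]
```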

VeritasJoker commented 1 year ago

So uh I don't think this is resolved. I am regenerating embeddings now for 798 and found out that while we are saving the emb_df into pickles, we are using to_dict with "records" here: https://github.com/hassonlab/247-pickling/blob/b7a6fcb060ecb8276b5dcb090b97e6f5b2983558/scripts/tfsemb_main.py#L32

and here: https://github.com/hassonlab/247-pickling/blob/b7a6fcb060ecb8276b5dcb090b97e6f5b2983558/scripts/tfsemb_main.py#L37

This does not seem to save the index, which we need when we concatenate more here
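A small sketch of the behavior in question, with a made-up `emb_df` (the `token` column and index values are placeholders):

```python
import pandas as pd

# Hypothetical stand-in for emb_df with a meaningful, non-default index.
emb_df = pd.DataFrame({"token": ["hello", "world"]}, index=[10, 11])

# orient="records" yields one dict per row keyed by column name only;
# the index labels are not part of the output.
records = emb_df.to_dict("records")
print(records)  # [{'token': 'hello'}, {'token': 'world'}]

# Rebuilding from the records gives a fresh 0..n-1 index, not [10, 11].
rebuilt = pd.DataFrame(records)
print(list(rebuilt.index))  # [0, 1]
```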

zkokaja commented 1 year ago

But merging on index works in encoding? Should we just use pd.to_pickle instead?

VeritasJoker commented 1 year ago

Yes, merging on index works in encoding. I think pd.to_pickle should work?
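A quick round-trip sketch suggesting the index would survive with pd.to_pickle (the frame contents and the `emb_df.pkl` file name are hypothetical):

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for emb_df with a non-default index.
emb_df = pd.DataFrame({"token": ["hello", "world"]}, index=[10, 11])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "emb_df.pkl")  # hypothetical file name
    # Pickle the DataFrame itself rather than a list of record dicts.
    pd.to_pickle(emb_df, path)
    restored = pd.read_pickle(path)

print(list(restored.index))  # [10, 11]
```

One thing to keep in mind: the pickle would then hold a DataFrame rather than a list of records, so any code that loads these files would presumably need a matching change.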

zkokaja commented 1 year ago

Related to #153