usnistgov / grabble


custom pandas accessor #4

Open conteam opened 1 year ago

conteam commented 1 year ago

Create a Levi pandas accessor.

Input: a Series in COO format, i.e. the data stored as "flags" under a MultiIndex of row and column IDs (does this need to be a Series, or is a DataFrame OK?). This represents a Levi graph whose "nodes" fall into two categories of information (e.g. tokens and documents). Eventually, grabble will support data with more than two categories, and any two can be selected for the Levi graph (e.g. tokens and date, document category and tokens, etc.).
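For concreteness, here is a minimal sketch of what such an input could look like. The document and token labels, the level names, and the `flags` variable are all made up for illustration and are not taken from grabble.

```python
import pandas as pd

# Hypothetical toy data: each pair marks a token that occurs in a document.
pairs = [
    ("doc1", "apple"),
    ("doc1", "banana"),
    ("doc2", "banana"),
    ("doc3", "apple"),
    ("doc3", "cherry"),
]

# The two index levels are the two node categories of the Levi graph.
index = pd.MultiIndex.from_tuples(pairs, names=["document", "token"])

# COO-style Series: one entry per nonzero cell, with the value 1 as the "flag".
flags = pd.Series(1, index=index, name="flag", dtype="int8")

print(flags)
```

Only the nonzero (document, token) combinations are stored, which is what makes this a coordinate (COO) layout.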

Goal: an incidence structure (e.g. something like a doc-term matrix) that can be used for further calculations, is compatible with networkx, and does not use pandas's built-in sparse accessor.
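One possible way to get there, sketched under assumptions: build a SciPy sparse matrix directly from the MultiIndex codes (so pandas's sparse accessor is never involved) and add the same pairs to a networkx graph as bipartite edges. The toy data and variable names are illustrative only, and this is not necessarily how grabble will implement it.

```python
import networkx as nx
import pandas as pd
from scipy import sparse

# Hypothetical COO-style input: flags under a (document, token) MultiIndex.
index = pd.MultiIndex.from_tuples(
    [("doc1", "apple"), ("doc1", "banana"), ("doc2", "banana")],
    names=["document", "token"],
)
flags = pd.Series(1, index=index, name="flag")

# Drop unused level values so the integer codes map to contiguous rows/columns.
idx = flags.index.remove_unused_levels()
rows, cols = idx.codes[0], idx.codes[1]
shape = (len(idx.levels[0]), len(idx.levels[1]))

# Incidence matrix (documents x tokens) built straight from the codes.
incidence = sparse.coo_matrix((flags.to_numpy(), (rows, cols)), shape=shape)

# Bipartite graph with one node set per index level.
# Assumes document and token labels never collide; otherwise nodes would merge.
graph = nx.Graph()
graph.add_nodes_from(idx.levels[0], bipartite=0)
graph.add_nodes_from(idx.levels[1], bipartite=1)
graph.add_edges_from(idx)  # iterating a MultiIndex yields (document, token) tuples

print(incidence.toarray())
print(sorted(graph.edges()))
```

Row and column order follows the sorted level values in `idx.levels`, so those would need to be kept alongside the matrix to map positions back to labels.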

Tasks:

conteam commented 1 year ago

@tbsexton Just want to check that this is the correct "input" format for the accessor (a multi-index pandas Series or DataFrame)? Then from here we would use df.levi.foo to get the data out in various matrix formats?

[Image: Screenshot 2023-02-21 at 2.13.48 PM]
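As a rough sketch of the `levi` accessor idea, pandas lets you register a custom namespace on a Series with `pd.api.extensions.register_series_accessor`. The class name `LeviAccessor` and the methods `to_dense` and `edges` below are hypothetical placeholders for the `foo` in `df.levi.foo`, not grabble's actual API.

```python
import pandas as pd


@pd.api.extensions.register_series_accessor("levi")
class LeviAccessor:
    """Hypothetical accessor; method names are illustrative only."""

    def __init__(self, series: pd.Series) -> None:
        # Validate the assumed input: a Series with a two-level MultiIndex.
        index = series.index
        if not isinstance(index, pd.MultiIndex) or index.nlevels != 2:
            raise AttributeError("levi accessor needs a two-level MultiIndex")
        self._series = series

    def to_dense(self) -> pd.DataFrame:
        """Row-by-column incidence table, with 0 where no flag is stored."""
        return self._series.unstack(fill_value=0)

    def edges(self) -> list:
        """(row ID, column ID) pairs, one per stored flag."""
        return list(self._series.index)


flags = pd.Series(
    1,
    index=pd.MultiIndex.from_tuples(
        [("doc1", "apple"), ("doc1", "banana"), ("doc2", "banana")],
        names=["document", "token"],
    ),
)

table = flags.levi.to_dense()
print(table)
```

If a DataFrame input turns out to be preferable, `pd.api.extensions.register_dataframe_accessor` works the same way, so the choice between Series and DataFrame does not block the accessor approach itself.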