Nest function NA issue - Githubissues

AlexTheWizardL commented 5 months ago

The nest function doesn't treat NA as a distinct grouping value and drops the rows that contain it.

DF example

date,account,counter_account,currency,amount,base_currency_amount,vat_code,text,document
2024-05-27,1020,1000,CHF,2.0,,<values><de>MwSt. 2.6%</de><en>VAT 2.6%</en><fr>TVA 2.6%</fr><it>IVA 2.6%</it></values>,Single 1,
2024-05-27,1020,1000,CHF,20.0,,<values><de>MwSt. 2.6%</de><en>VAT 2.6%</en><fr>TVA 2.6%</fr><it>IVA 2.6%</it></values>,single2,
2024-05-28,1000,,CHF,-5.0,,Test_VAT_code,,
,1020,,,5.0,,Test_VAT_code,,
,1000,,,-5.0,,Test_VAT_code,,
,1020,,,5.0,,Test_VAT_code,,
,1020,,,-10.0,,Test_VAT_code,collective333,
,1000,,,10.0,,Test_VAT_code,collective4444,

Code usage

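# Read the example ledger and bring it into the standard format
target_df = pd.read_csv('tests/ledger.csv', skipinitialspace=True)
target_df = StandaloneLedger.standardize_ledger(target_df).reset_index(drop=True)
# Nest every column except 'id' and 'date' into a 'txn' column
target = nest(target_df, columns=[col for col in target_df.columns if col not in ['id', 'date']], key='txn')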

It drops all rows whose date is NA, but it should identify those rows and create a nested DataFrame for them.

lasuk commented 5 months ago

Thanks for reporting.

Here's the same example for direct copy-pasting into the Python console:

from io import StringIO
import pandas as pd
from cashctrl_ledger import nest
csv = """
    date,account,counter_account,currency,amount,base_currency_amount,vat_code,text,document
    2024-05-27,1020,1000,CHF,2.0,,VAT 2.6%,Single 1,
    2024-05-27,1020,1000,CHF,20.0,,VAT 2.6%,single2,
    2024-05-28,1000,,CHF,-5.0,,Test_VAT_code,,
    ,1020,,,5.0,,Test_VAT_code,,
    ,1000,,,-5.0,,Test_VAT_code,,
    ,1020,,,5.0,,Test_VAT_code,,
    ,1020,,,-10.0,,Test_VAT_code,collective333,
    ,1000,,,10.0,,Test_VAT_code,collective4444,
    """
df = pd.read_csv(StringIO(csv), skipinitialspace=True)
nested = nest(df, columns=[col for col in df.columns if col != 'date'], key='txn')

print(nested)
>>> ##          date                                                txn
>>> ## 0  2024-05-27     account  counter_account currency  amount  ...
>>> ## 1  2024-05-28     account  counter_account currency  amount  ...

lasuk commented 5 months ago

Dropping NA grouping values is the default behaviour of df.groupby(), which does most of the actual work inside nest. We need to change to df.groupby(..., dropna=False).

See https://stackoverflow.com/questions/18429491/pandas-groupby-columns-with-nan-missing-values
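
The difference can be seen with plain pandas, without going through nest. This is a minimal sketch, assuming pandas 1.1 or later (where the dropna argument to groupby was introduced) and using a made-up three-column frame rather than the ledger above:

```python
import numpy as np
import pandas as pd

# Illustrative data only: two rows have no date, as in the reported ledger.
df = pd.DataFrame({
    "date": ["2024-05-27", "2024-05-27", "2024-05-28", np.nan, np.nan],
    "account": [1020, 1020, 1000, 1020, 1000],
    "amount": [2.0, 20.0, -5.0, 5.0, -5.0],
})

# Default behaviour: rows whose grouping key is NA are silently dropped.
default_sizes = df.groupby("date").size()

# With dropna=False, NA becomes a group of its own and no rows are lost.
keep_na_sizes = df.groupby("date", dropna=False).size()

print(len(default_sizes), default_sizes.sum())  # 2 groups covering 3 rows
print(len(keep_na_sizes), keep_na_sizes.sum())  # 3 groups covering 5 rows
```

If nest groups on the non-nested columns in the same way, passing dropna=False there should keep the rows with a missing date together in one additional nested DataFrame.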

macxred / cashctrl_ledger

Nest function NA issue #25
