rcpch / national-paediatric-diabetes-audit

A django application to audit the care of children and young people with diabetes in England and Wales.

create csv sex/gender randomisation not working as intended #365

Open eatyourpeas opened 1 week ago

eatyourpeas commented 1 week ago

The csv creation function is great and generates large dummy CSVs very quickly. However, sex does not seem to be randomly allocated across the SEX_TYPE choices. This is complicated by the fact that sex is also an optional parameter to the function.
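For illustration, here is a minimal sketch of what uniform allocation with an optional override might look like. It is not the project's actual code: `pick_sex` is a hypothetical helper, and the `SEX_TYPE` codes and labels below are assumptions inferred from the values that appear in the generated CSV. One pitfall worth ruling out is a truthiness check on the optional argument, since `0` is a valid code but is falsy in Python.

```python
import random
from collections import Counter

# Assumption: codes inferred from the CSV output; labels are illustrative only.
SEX_TYPE = (
    (0, "Not known"),
    (1, "Male"),
    (2, "Female"),
    (9, "Not specified"),
)


def pick_sex(sex=None, rng=random):
    """Return `sex` if supplied, otherwise draw uniformly from SEX_TYPE codes.

    Hypothetical helper, not the function used by create_csv.
    """
    valid_codes = [code for code, _label in SEX_TYPE]
    # Compare against None explicitly: `sex or rng.choice(...)` would
    # silently replace a requested code of 0 with a random one.
    if sex is None:
        return rng.choice(valid_codes)
    if sex not in valid_codes:
        raise ValueError(f"sex must be one of {valid_codes}, got {sex!r}")
    return sex


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the check is repeatable
    print(Counter(pick_sex(rng=rng) for _ in range(4000)))
```

Passing a seeded `random.Random` instance keeps the draw reproducible when checking the distribution.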

anchit-chandran commented 4 days ago

@eatyourpeas do you know how I can recreate this? Tried using:

python manage.py create_csv \
        --pts=30 \
        --visits="CDCD DHPC ACDC CDCD" \
        --hb_target=T \
        --age_range=11_15 \
        --build \
    && python manage.py create_csv \
        --pts=30 \
        --visits="CDCCD DDCC CACC" \
        --hb_target=A \
        --age_range=16_19 \
        --build \
    && python manage.py create_csv \
        --pts=30 \
        --visits="CDC ACDC CDCD" \
        --hb_target=T \
        --age_range=0_4 \
        --build \
    && python manage.py create_csv \
        --coalesce

And then the distribution of the Stated gender column is:

df[['Stated gender']].value_counts()

Stated gender
2                431
0                347
9                326
1                296
Name: count, dtype: int64
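One caveat when reading these counts: each row in the CSV is a visit, so a patient with more visits contributes more rows, and the row-level distribution can look skewed even if sex is allocated uniformly per patient. A rough sketch of counting each patient once and computing a Pearson chi-square statistic against equal expected counts is below. The `"NHS Number"` column name is an assumption about the CSV layout, and both helpers are hypothetical, not part of the project.

```python
from collections import Counter


def sex_counts_per_patient(rows, id_col="NHS Number", sex_col="Stated gender"):
    """Count each patient once, however many visit rows they have.

    `rows` is an iterable of dicts (e.g. from csv.DictReader).
    The id_col default is an assumed column name.
    """
    first_seen = {}
    for row in rows:
        first_seen.setdefault(row[id_col], row[sex_col])
    return Counter(first_seen.values())


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts."""
    total = sum(counts.values())
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts.values())


# Row-level counts quoted above. Rows are visits, not independent patients,
# so this figure is indicative only.
row_counts = {2: 431, 0: 347, 9: 326, 1: 296}
stat = chi_square_uniform(row_counts)
print(f"chi-square = {stat:.2f}")  # chi-square = 28.75
```

For reference, the 5% critical value with 3 degrees of freedom is about 7.81, so the row-level counts do look non-uniform, but the per-patient counts are the ones that would actually test the randomisation.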