jceresearch/pydit
Library of data wrangling functions that an internal auditor typically needs (for my own use and learning; if you wish to use it or collaborate, please get in touch, or use it at your own peril).
https://pypi.org/project/pydit-jceresearch/
MIT License
2 stars · 0 forks
Issues
#61 Refactor (by jceresearch, closed 1 week ago, 0 comments)
#60 Upgrade to Python 3.13 (by jceresearch, opened 2 weeks ago, 0 comments)
#59 47 check blanks should return summaries by default (by jceresearch, closed 5 months ago, 0 comments)
#58 Check sequence has an error with date (object) columns (by jceresearch, closed 10 months ago, 1 comment)
#57 Fuzzy merge using one or more columns in tandem plus a hardcode (by jceresearch, opened 1 year ago, 2 comments)
#56 Add silent mode for various functions like cleanup column (by jceresearch, closed 1 year ago, 0 comments)
#55 Return the original DataFrame when there are no duplicates (by jceresearch, closed 1 year ago, 1 comment)
#54 Duplicates.py needs to add a log entry about non-NaN duplicates (by jceresearch, closed 1 year ago, 0 comments)
#53 Warning when sorting duplicates in duplicates.py (by jceresearch, closed 1 year ago, 0 comments)
#52 Duplicates (by jceresearch, closed 1 year ago, 0 comments)
#51 Keyword search should also generate a log entry with the count of all items found, not just for specific columns (by jceresearch, opened 2 years ago, 0 comments)
#50 Business hours calculator: consider making the default the end of the current year, or +365 days, so we can do future/estimated calculations (by jceresearch, opened 2 years ago, 0 comments)
#49 Improve the documentation of the business hours calculator (by jceresearch, opened 2 years ago, 0 comments)
#48 Logging info/debug should go to stdout, not stderr, to avoid the red colour in Jupyter (by jceresearch, closed 2 years ago, 1 comment)
#47 Check blanks should return summaries by default (by jceresearch, closed 5 months ago, 0 comments)
#46 groupby_text_concatenate: the key returned is text even if we supplied a numeric one; it needs to preserve the original type (by jceresearch, closed 2 years ago, 1 comment)
#45 groupby_text_concatenate: needs an option to return unique values (by jceresearch, closed 2 years ago, 1 comment)
#44 check_duplicates(): expand the info returned in the logging (by jceresearch, closed 2 years ago, 1 comment)
#43 check_duplicates(): documentation doesn't include the "also return non-duplicates" option (by jceresearch, closed 2 years ago, 1 comment)
#42 check_duplicates(): when also returning non-duplicates with indicator=True, it should show which rows had a duplicate that was dropped (by jceresearch, closed 5 months ago, 1 comment)
#41 check_duplicates(): the indicator column is sometimes duplicates and sometimes duplicates_keep (by jceresearch, closed 2 years ago, 1 comment)
#40 Add a feature to add frequency counts based on another column or columns (by jceresearch, closed 2 years ago, 1 comment)
#39 Add merge forcing suffixes (as per a Stack Overflow solution) (by jceresearch, closed 2 years ago, 1 comment)
#38 Also save to pickle; rename the parameter to save_to_pickle so that it does not start with bool_ (by jceresearch, closed 2 years ago, 0 comments)
#37 Save xlsx needs to log what it is doing before it starts saving, as saving can take a long time (by jceresearch, closed 2 years ago, 1 comment)
#36 fillna_smart() needs more logging to show what it did (by jceresearch, closed 2 years ago, 1 comment)
#35 Start logging: should also output an initial log entry (by jceresearch, closed 2 years ago, 1 comment)
#34 check_blanks(): refactor to improve the performance of the summary (by jceresearch, closed 2 years ago, 0 comments)
#33 coalesce_columns(): check exactly how the overwrite works when operation is None, and document it in the help (by jceresearch, closed 2 years ago, 1 comment)
#32 check_sequence(): refactor for better input validation and error handling, and simpler flow control (by jceresearch, closed 2 years ago, 0 comments)
#31 add_percentile(): research why we use a different formula here when grouping vs. for the full population below (by jceresearch, opened 2 years ago, 0 comments)
#30 check_duplicates(): pending test for keep=False or keep="last" (test of keeping the last occurrence only) (by jceresearch, closed 2 years ago, 1 comment)
#29 Loading a DataFrame from a pickle does not print its shape, which could be useful (by jceresearch, closed 2 years ago, 1 comment)
#28 coalesce_columns(): find an elegant way of stripping the trailing space when using the concatenate option (by jceresearch, closed 2 years ago, 1 comment)
#27 Add the ability to provide the specific columns to fill in fillna_smart() (by jceresearch, closed 2 years ago, 0 comments)
#26 check_duplicates(): TBC if we want to strip blanks as an option before looking for dupes (by jceresearch, opened 2 years ago, 0 comments)
#25 add_counts: add counts on itself as a quick check of duplicates, TBC if useful (by jceresearch, closed 2 years ago, 0 comments)
#24 coalesce_columns(): add options for summing values, concatenating strings, or maybe max or other operations (by jceresearch, opened 2 years ago, 0 comments)
#23 Add_counts_in_each_row: add more checks for when a DataFrame is not provided or there are no records (by jceresearch, opened 2 years ago, 0 comments)
#22 Add_counts_in_each_row: add an option to create a new column instead of overwriting the existing one (by jceresearch, opened 2 years ago, 0 comments)
#21 Research whether returning None in singleton __init__ is the right approach; got some errors in IPython (by jceresearch, closed 2 years ago, 1 comment)
#20 Implement some truncation in the clean_columns_name function if the field is too long (TBC how long) (by jceresearch, closed 2 years ago, 0 comments)
#19 Make the save to Excel work with a public method (by jceresearch, closed 2 years ago, 0 comments)
#18 The blanks total column is not working (by jceresearch, closed 2 years ago, 0 comments)
#17 Add support for Series in the duplicate check (by jceresearch, closed 2 years ago, 1 comment)
#16 Develop tests and check that the fullrng.issubset(unique) approach is correct (by jceresearch, closed 2 years ago, 0 comments)
#15 Look into pretty outputs for logging lists/tuples (by jceresearch, closed 2 years ago, 1 comment)
#14 Check that the folders exist, and ensure this check is done every time it is updated (by jceresearch, closed 2 years ago, 1 comment)
#13 Add Benford and similar tests to the suite (by jceresearch, closed 2 years ago, 1 comment)
#12 Add a consideration for non-working days to the sequence check (by jceresearch, closed 2 years ago, 1 comment)