create a heuristics module to hold the code that tries to detect the type of a column (euro, range...)
create a logging module
extract the functions common to the "column" and "level" scripts
Checklist
[x] PR has an interpretable title with a prefix ([ENH], [BUG], [DOC], [INFRA], [MAINT])
[x] PR links to GitHub issue with mention Closes #XXXX
[x] Tests pass
[x] Code is properly formatted
For new features:
[x] Tests have been added
For reviewers
The main thing to look at is the content of outputs/bulk_annotation_levels.tsv to see if this is something we can work with or if more / less filtering should be done.
The current output may still include the levels of columns that are obviously continuous but could not be flagged as such.
Note that the constant NB_LEVELS can be used as a first way to adjust the amount of filtering:
NB_LEVELS = 10 --> 5500 lines
NB_LEVELS = 100 --> 7300 lines
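To make the effect of NB_LEVELS concrete, here is a minimal sketch of this kind of filtering. It is not the code from list_participants_tsv_levels.py: the function name `list_levels`, the choice to treat empty strings and "n/a" as missing values, and the example data are all assumptions made for illustration.

```python
import csv
import io

NB_LEVELS = 10  # same name as the constant above; the value is only an example


def list_levels(tsv_text, nb_levels=NB_LEVELS):
    """Return {column: sorted levels} for columns with at most nb_levels unique values.

    Hypothetical helper: columns with more unique values than nb_levels are
    assumed to be continuous (or identifiers) and are dropped.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    values = {name: set() for name in reader.fieldnames or []}
    for row in reader:
        for name, value in row.items():
            # Skip malformed extra fields and missing values (assumed markers).
            if name is None or value in (None, "", "n/a"):
                continue
            values[name].add(value)
    return {
        name: sorted(levels)
        for name, levels in values.items()
        if len(levels) <= nb_levels
    }


# Made-up example data, not taken from the repository.
example = "participant_id\tsex\tage\nsub-01\tM\t23\nsub-02\tF\t31\nsub-03\tF\t45\n"
levels = list_levels(example, nb_levels=2)
```

With a threshold of 2, only the `sex` column survives in this toy example. Raising the threshold lets more columns through, which is consistent with the line counts above growing with NB_LEVELS.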
Read the top docstring of the script list_participants_tsv_levels.py to make sure the approach sounds good.
Check the content of tests/data/participants.tsv, which is used to define the behavior of the heuristics, to make sure that the content of each column matches what "we" mean by the formats "euro", "range"...
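To help discuss what each format should mean, here is a rough sketch of how a format heuristic could look. It does not reproduce the heuristics module: the function name `guess_format` and both patterns are assumptions. In particular, "range" is assumed to mean values like "18-25" and "euro" is assumed to mean numbers written with a comma as decimal separator. The real definitions are the ones encoded in tests/data/participants.tsv.

```python
import re

# Assumed patterns, for illustration only.
RANGE_PATTERN = re.compile(r"^\s*\d+(\.\d+)?\s*-\s*\d+(\.\d+)?\s*$")  # e.g. "18-25"
EURO_PATTERN = re.compile(r"^\s*-?\d+,\d+\s*$")  # e.g. "12,5"


def guess_format(values):
    """Return "range", "euro" or "unknown" for a list of string values.

    Hypothetical helper: a column gets a format only if every non-missing
    value matches the corresponding pattern.
    """
    non_empty = [v for v in values if v not in ("", "n/a")]
    if not non_empty:
        return "unknown"
    if all(RANGE_PATTERN.match(v) for v in non_empty):
        return "range"
    if all(EURO_PATTERN.match(v) for v in non_empty):
        return "euro"
    return "unknown"
```

Requiring every non-missing value to match is a deliberately strict choice for this sketch. A real heuristic might tolerate a small fraction of mismatches.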
Closes #8