neurobagel / bulk_annotations

Retroactively annotate a large number of BIDS datasets at once

MIT License

[ENH] generate summary file for OpenNeuro pheno with label levels #9

Closed by Remi-Gau 1 year ago

Remi-Gau commented 1 year ago

Closes #8

Changes proposed in this pull request:

refactoring
- create a heuristics module to hold the code that tries to detect the type of a column ("euro", "range", ...)
- create logging module
- extract functions common for "column" and "level" scripts
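
To illustrate the kind of check the heuristics module is meant to hold, here is a minimal sketch. It is not the code from this PR: the function names, the regular expressions, and the reading of "euro" as numbers written with a decimal comma and "range" as values like `20-30` are all assumptions made for the example.

```python
import re

# Hypothetical patterns (assumptions, not the PR's actual rules).
# "euro": numbers written with a decimal comma, e.g. "1,5".
EURO_PATTERN = re.compile(r"^-?\d+,\d+$")
# "range": two non-negative numbers separated by a hyphen, e.g. "20-30".
RANGE_PATTERN = re.compile(r"^\d+(\.\d+)?\s*-\s*\d+(\.\d+)?$")

# Values treated as missing and ignored by the checks (assumed convention).
MISSING = {"", "n/a", "nan"}


def _all_match(values, pattern):
    """Return True if every non-missing value matches the pattern."""
    present = [v.strip() for v in values if v.strip().lower() not in MISSING]
    return bool(present) and all(pattern.match(v) for v in present)


def is_euro_format(values):
    """Guess whether a column holds decimal-comma numbers."""
    return _all_match(values, EURO_PATTERN)


def is_range_format(values):
    """Guess whether a column holds numeric ranges."""
    return _all_match(values, RANGE_PATTERN)


def guess_column_type(values):
    """Return a coarse label for a column, or "unknown" if no rule fires."""
    if is_euro_format(values):
        return "euro"
    if is_range_format(values):
        return "range"
    return "unknown"


print(guess_column_type(["1,5", "2,75", "n/a"]))
print(guess_column_type(["20-30", "31 - 40"]))
print(guess_column_type(["M", "F"]))
```

Requiring every non-missing value to match keeps the sketch conservative: a column with a single stray value falls back to "unknown" rather than being mislabeled.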

Checklist

[x] PR has an interpretable title with a prefix ([ENH], [BUG], [DOC], [INFRA], [MAINT])
[x] PR links to GitHub issue with mention Closes #XXXX
[x] Tests pass
[x] Code is properly formatted

For new features:

[x] Tests have been added

For reviewers

The main thing to look at is the content of outputs/bulk_annotation_levels.tsv to see if this is something we can work with or if more / less filtering should be done.
- the current output may still include the levels of columns that are obviously continuous but could not be flagged as such.
- Note that the constant NB_LEVELS can be used as a first way to adjust how much gets filtered:
  - NB_LEVELS = 10 --> 5500 lines
  - NB_LEVELS = 100 --> 7300 lines
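
As a rough sketch of how a threshold like NB_LEVELS could act as a filter, assuming it is the maximum number of distinct values a column may have for its levels to be listed (this meaning, the function name `list_levels`, and the missing-value markers are assumptions, not taken from the script):

```python
import csv
import io

# Assumed meaning: columns with more distinct values than this are skipped.
NB_LEVELS = 10


def list_levels(tsv_text, nb_levels=NB_LEVELS):
    """Map each column to its sorted distinct values, dropping columns
    that have more than `nb_levels` distinct values."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    levels = {}
    for row in reader:
        for column, value in row.items():
            # Skip malformed rows (extra or missing fields) and missing values.
            if column is None or value is None or value in ("", "n/a"):
                continue
            levels.setdefault(column, set()).add(value)
    return {
        column: sorted(values)
        for column, values in levels.items()
        if len(values) <= nb_levels
    }


example = (
    "participant_id\tsex\tage\n"
    "sub-01\tM\t21\n"
    "sub-02\tF\t34\n"
    "sub-03\tM\t55\n"
)
print(list_levels(example, nb_levels=2))
```

With a low threshold only the clearly categorical column survives; raising it lets more columns through, which is consistent with the larger output reported for NB_LEVELS = 100.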
Read the top docstring of the script list_participants_tsv_levels.py to make sure the approach sounds good.
Check the content of tests/data/participants.tsv, which is used to define the behavior of the heuristics, to make sure that the content of each column matches what "we" mean by the formats "euro", "range"...

Remi-Gau commented 1 year ago

ok made the changes and things still run

merging