Other input arguments (format, dtype, remap_ids, start_col, and num_line_skip)
Part of this testing effort should be to add validators to the input arguments to the general_parsers ensure no unreasonable values are passed into it: e.g a dataset split of (.8, .8), or a format ("sxrd"), etc.
Describe alternatives you've considered
The alternative is to leave it untested. No thanks.
Is your feature request related to a problem? Please describe. The converter for delimited files does not have a set of tests associated with it.
Describe the solution you'd like We should add tests for each function in csv_converter.py which cover reasonable inputs and possible failure modes.
For example, for the general_parser function https://github.com/marius-team/marius/blob/main/tools/csv_converter.py#L118 we should test:
Part of this testing effort should be to add validators to the input arguments to the general_parsers ensure no unreasonable values are passed into it: e.g a dataset split of (.8, .8), or a format ("sxrd"), etc.
Describe alternatives you've considered The alternative is to leave it untested. No thanks.
Additional context While testing we should note the ways we can improve and simplify the design of the preprocessing code and create a list of changes we will want to make in a future pull request. For example, https://github.com/marius-team/marius/blob/main/tools/csv_converter.py#L213, https://github.com/marius-team/marius/blob/main/tools/csv_converter.py#L238, and https://github.com/marius-team/marius/blob/main/tools/csv_converter.py#L252 should be put into a function and called.