marius-team / marius

Large scale graph learning on a single machine.
https://marius-project.org
Apache License 2.0
160 stars 45 forks source link

Tests and validators for csv_converter.py #6

Closed JasonMoho closed 3 years ago

JasonMoho commented 3 years ago

Is your feature request related to a problem? Please describe. The converter for delimited files does not have a set of tests associated with it.

Describe the solution you'd like We should add tests for each function in csv_converter.py which cover reasonable inputs and possible failure modes.

For example, for the general_parser function https://github.com/marius-team/marius/blob/main/tools/csv_converter.py#L118 we should test:

Part of this testing effort should be to add validators to the input arguments to the general_parsers ensure no unreasonable values are passed into it: e.g a dataset split of (.8, .8), or a format ("sxrd"), etc.

Describe alternatives you've considered The alternative is to leave it untested. No thanks.

Additional context While testing we should note the ways we can improve and simplify the design of the preprocessing code and create a list of changes we will want to make in a future pull request. For example, https://github.com/marius-team/marius/blob/main/tools/csv_converter.py#L213, https://github.com/marius-team/marius/blob/main/tools/csv_converter.py#L238, and https://github.com/marius-team/marius/blob/main/tools/csv_converter.py#L252 should be put into a function and called.

JasonMoho commented 3 years ago

An initial test for this file is written here https://github.com/marius-team/marius/blob/main/test/python/preprocessing/test_csv_preprocessor.py. You can use it as an example