marius-team / marius

Large scale graph learning on a single machine.
https://marius-project.org
Apache License 2.0
160 stars 45 forks source link

The test "test_remap_ids_true" fails unnecssariliy #44

Closed JasonMoho closed 3 years ago

JasonMoho commented 3 years ago

Describe the bug The test test/python/preprocessing/test_csv_preprocessor.py::TestGeneralParser::test_remap_ids_true occasionally fails with an assertion error when it should pass.

2021-06-05T17:17:06.1877640Z =================================== FAILURES ===================================
2021-06-05T17:17:06.1878720Z ____________________ TestGeneralParser.test_remap_ids_true _____________________
2021-06-05T17:17:06.1879520Z 
2021-06-05T17:17:06.1880940Z self = <test.python.preprocessing.test_csv_preprocessor.TestGeneralParser testMethod=test_remap_ids_true>
2021-06-05T17:17:06.1882130Z 
2021-06-05T17:17:06.1882890Z     def test_remap_ids_true(self):
2021-06-05T17:17:06.1883710Z         """
2021-06-05T17:17:06.1885570Z         Check if processed data has non-sequential ids if remap_ids is set
2021-06-05T17:17:06.1886600Z             to True
2021-06-05T17:17:06.1887270Z         """
2021-06-05T17:17:06.1888190Z         general_parser([str(Path(input_dir) / Path(train_file)),
2021-06-05T17:17:06.1889190Z                         str(Path(input_dir) / Path(valid_file)),
2021-06-05T17:17:06.1890150Z                         str(Path(input_dir) / Path(test_file))],
2021-06-05T17:17:06.1891080Z                        ["srd"], [output_dir], remap_ids=True)
2021-06-05T17:17:06.1891820Z     
2021-06-05T17:17:06.1892810Z         internal_node_ids = np.fromfile(str(Path(output_dir)) /
2021-06-05T17:17:06.1893870Z                                         Path("node_mapping.bin"), dtype=int)
2021-06-05T17:17:06.1895720Z         internal_rel_ids = np.fromfile(str(Path(output_dir)) /
2021-06-05T17:17:06.1896890Z                                        Path("rel_mapping.bin"), dtype=int)
2021-06-05T17:17:06.1897690Z     
2021-06-05T17:17:06.1898440Z         delta_list = []
2021-06-05T17:17:06.1899820Z         for i in range(len(internal_node_ids) - 1):
2021-06-05T17:17:06.1901380Z             delta_list.append(internal_node_ids[i+1] - internal_node_ids[i])
2021-06-05T17:17:06.1902890Z         delta_list_1 = [i - 1 for i in delta_list]
2021-06-05T17:17:06.1903910Z         delta_list_2 = [i + 1 for i in delta_list]
2021-06-05T17:17:06.1904900Z         self.assertNotEqual(sum(delta_list_1), 0)
2021-06-05T17:17:06.1905920Z         self.assertNotEqual(sum(delta_list_2), 0)
2021-06-05T17:17:06.1906960Z         self.assertNotEqual(sum(delta_list), 0)
2021-06-05T17:17:06.1907760Z     
2021-06-05T17:17:06.1908510Z         delta_list = []
2021-06-05T17:17:06.1909830Z         for i in range(len(internal_rel_ids) - 1):
2021-06-05T17:17:06.1911480Z             delta_list.append(internal_rel_ids[i+1] - internal_rel_ids[i])
2021-06-05T17:17:06.1913010Z         delta_list_1 = [i - 1 for i in delta_list]
2021-06-05T17:17:06.1913950Z         delta_list_2 = [i + 1 for i in delta_list]
2021-06-05T17:17:06.1914920Z >       self.assertNotEqual(sum(delta_list_1), 0)
2021-06-05T17:17:06.1915860Z E       AssertionError: 0 == 0
2021-06-05T17:17:06.1931790Z 
2021-06-05T17:17:06.1932910Z test/python/preprocessing/test_csv_preprocessor.py:326: AssertionError

To Reproduce Running the python test suite can cause this error to occur: https://github.com/marius-team/marius/pull/43/checks?check_run_id=2753751019

Expected behavior The test should make sure the the ids have been remapped without failure.

Environment All environments.

Additional context The issue seems to be stemming from delta_list_1 and delta_list_2. My guess is that if the IDs have been remapped to a specific set of values, that will cause this method of checking to fail.

Is there a different/easier way to check that the ids have been remapped? I think the old id values and the new id values can be just compared directly and asserted to be different to pass the test.