kakao / buffalo

TOROS Buffalo: A fast and scalable production-ready open source project for recommender systems
Apache License 2.0

Fix W2V stream data newline error. #82

Closed: yupyub closed this 7 months ago.

yupyub commented 7 months ago

When the W2V model is used, a memory error occurs if the input stream data contains empty lines, that is, lines with no characters.

Cause

While the stream data is being read, a line that contains only a newline character still increments the num_nnz variable by 1:

data_size = len(data)  # 0 for an empty line
_vali_size = min(vali_n, len(data) - 1)  # min(vali_n, -1) = -1 for an empty line
num_nnz += (data_size - _vali_size)  # 0 - (-1) = +1 for an empty line
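To make the arithmetic concrete, here is a minimal Python sketch that replays the three lines above over a list of input lines. The function name count_nnz and the tokenization with str.split() are assumptions for illustration, not buffalo's actual code; vali_n is assumed to be non-negative.

```python
def count_nnz(lines, vali_n):
    """Hypothetical replay of the num_nnz accumulation (no empty-line check).

    Assumes each line is tokenized with str.split(); the real buffalo
    code may tokenize differently.
    """
    num_nnz = 0
    for line in lines:
        data = line.split()  # an empty line yields []
        data_size = len(data)
        _vali_size = min(vali_n, len(data) - 1)  # -1 when data is empty
        num_nnz += data_size - _vali_size  # adds 1 for an empty line
    return num_nnz
```

With vali_n = 1, count_nnz(["a b c\n", "d e\n"], 1) returns 3, but inserting a single empty line, count_nnz(["a b c\n", "\n", "d e\n"], 1), returns 4 even though no new tokens were added.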

Later, num_nnz is used as total_lines in the _sort_and_compressed_binarization() function. The values stored in the path file are passed to the records vector, and this vector is read based on total_lines. If num_nnz is inflated by empty lines, total_lines exceeds the size of the records vector, so the loop reads past its end. Reading these unexpected values triggers a segmentation fault or other program malfunction.
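The mismatch can be illustrated with a small Python sketch. The function read_records is hypothetical and only stands in for the loop that reads total_lines entries; it assumes records behaves like a C++ std::vector, whose operator[] does no bounds checking, so the explicit check below marks the point where the real code would silently read out of bounds instead of stopping.

```python
def read_records(records, total_lines):
    """Hypothetical stand-in for reading total_lines entries from records."""
    out = []
    for i in range(total_lines):
        if i >= len(records):
            # A C++ std::vector::operator[] access would not stop here;
            # it would read unrelated memory (undefined behavior).
            raise IndexError(
                f"record {i} is out of bounds (size {len(records)})"
            )
        out.append(records[i])
    return out
```

With three stored records and a total_lines of 3, read_records returns all three; with the inflated total_lines of 4 from a single empty line, the fourth read falls outside the vector.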

Changes

Empty input lines are now skipped with a continue statement. A typo found during debugging has also been fixed.
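A minimal sketch of the guard described above, again assuming str.split() tokenization and a non-negative vali_n; the function name count_nnz_fixed is hypothetical and the actual patch may place the check differently.

```python
def count_nnz_fixed(lines, vali_n):
    """Hypothetical num_nnz accumulation with empty lines skipped."""
    num_nnz = 0
    for line in lines:
        data = line.split()
        if not data:
            continue  # empty line: contributes nothing to num_nnz
        data_size = len(data)
        _vali_size = min(vali_n, len(data) - 1)
        num_nnz += data_size - _vali_size
    return num_nnz
```

Checking for an empty token list rather than comparing the raw line to "\n" also skips lines that contain only whitespace or a "\r\n" ending, which would otherwise produce the same off-by-one.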