Open · usereight8 opened this issue 2 months ago

Both gnnwr and gtnnwr work fine for a DataFrame of up to 10,000 rows, but at 100,000 rows or more, either Visual Studio Code crashes or a memory allocation error occurs when executing the init_dataset method. I have tried converting the column types from float64 to float16, but they are probably converted back to float64 during the train/test split. Is there a way, either through a change in the code or by providing chunks of data to init_dataset sequentially, to address this problem?
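For reference, a quick check suggests the split itself may not be the culprit: scikit-learn's train_test_split slices the DataFrame by rows, which preserves column dtypes. A minimal test (the column names are made up for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Small test: does train_test_split upcast float16 columns?
# (column names here are hypothetical, for illustration only)
df = pd.DataFrame(
    np.random.rand(1_000, 4).astype(np.float16),
    columns=["x1", "x2", "lon", "lat"],
)
train, test = train_test_split(df, test_size=0.2, random_state=42)
print(train.dtypes.unique())  # [dtype('float16')] -- dtypes survive the split
```

So if an upcast to float64 happens, it is presumably inside init_dataset itself, when the arrays are rebuilt, rather than in the split.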
Yes, this is a challenge for the current gnnwr model. We are actively seeking a solution to this problem. We believe the structure of the neural network may need to be adjusted so that it no longer computes a distance matrix containing all the samples, which consumes an enormous amount of memory.
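The arithmetic makes the scaling problem clear: a dense n × n float64 distance matrix needs n² × 8 bytes, which is manageable at 10,000 rows but not at 100,000. Below is a rough sketch (not part of the gnnwr API) of computing such a matrix block by block into a float32 memory-mapped file, which bounds peak RAM at one block; note that even on disk, the full 100,000 × 100,000 float32 matrix would still occupy roughly 40 GB, which is why we think the network structure itself needs to change.

```python
import numpy as np

# Why 10,000 rows fit but 100,000 do not: a dense n x n float64
# distance matrix needs n * n * 8 bytes.
for n in (10_000, 100_000):
    print(f"n={n:,}: {n * n * 8 / 1e9:.1f} GB")
# n=10,000: 0.8 GB    -> fits in typical RAM
# n=100,000: 80.0 GB  -> hence the allocation error

# Rough workaround sketch (not gnnwr's API): build the matrix block
# by block into a float32 memory map, so peak RAM is one block.
def chunked_distance_matrix(coords, path="dist.dat", block=2_000):
    n = len(coords)
    out = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, n))
    for start in range(0, n, block):
        stop = min(start + block, n)
        # pairwise Euclidean distances between this block and all points
        diff = coords[start:stop, None, :] - coords[None, :, :]
        out[start:stop] = np.sqrt((diff ** 2).sum(axis=-1))
    out.flush()
    return out

coords = np.random.rand(5_000, 2)  # e.g. lon/lat pairs
dist = chunked_distance_matrix(coords)
print(dist.shape, dist.dtype)  # (5000, 5000) float32
```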