kathrinse / be_great

A novel approach for synthesizing tabular data using pretrained large language models
MIT License

Suggestion: Improve sampling speed #33

Closed: sebffischer closed this issue 10 months ago

sebffischer commented 1 year ago

First of all, thanks for making this library available. When sampling a large number of rows, sample() slows down as the number of requested samples grows. I believe this is caused by this line:

https://github.com/kathrinse/be_great/blob/251eb17aa64d7fb7bf42d120a349736f889c6cde/be_great/great_utils.py#L109

I think sampling could be sped up by storing all the dataframes in a list and concatenating them once at the end.
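A minimal pandas sketch of the suggested change follows. It assumes the referenced line appends each newly generated batch to an accumulating DataFrame inside the sampling loop. `generate_batch`, `sample_slow` and `sample_fast` are hypothetical names, not be_great's API, and `generate_batch` only stands in for one round of text generation decoded into a DataFrame.

```python
import pandas as pd


def generate_batch(batch_size: int, start: int) -> pd.DataFrame:
    """Hypothetical stand-in for one round of LLM generation + decoding."""
    ids = range(start, start + batch_size)
    return pd.DataFrame({"id": list(ids), "value": [i * 0.5 for i in ids]})


def sample_slow(n_samples: int, batch_size: int) -> pd.DataFrame:
    """Grows the result inside the loop (the pattern the issue points at)."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    df = None
    n = 0
    while n < n_samples:
        batch = generate_batch(batch_size, n)
        # Each pd.concat copies every row collected so far, so the total
        # copying grows quadratically with the number of batches.
        df = batch if df is None else pd.concat([df, batch], ignore_index=True)
        n += len(batch)
    return df.head(n_samples)


def sample_fast(n_samples: int, batch_size: int) -> pd.DataFrame:
    """Collects batches in a list and concatenates once at the end."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    batches = []
    n = 0
    while n < n_samples:
        batch = generate_batch(batch_size, n)
        batches.append(batch)  # list append: no DataFrame copying here
        n += len(batch)
    # A single concatenation copies each row once; trim any overshoot.
    return pd.concat(batches, ignore_index=True).head(n_samples)
```

Both functions return the same frame. Only the amount of copying differs: the list-based version does linear work in the total number of rows, while repeated concatenation re-copies the growing result on every iteration.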

unnir commented 1 year ago

Thank you for your interest in our work and the contribution, @sebffischer!

I believe this issue is similar to https://github.com/kathrinse/be_great/issues/23