Closed efstathios-chatzikyriakidis closed 1 year ago
Hello @efstathios-chatzikyriakidis, thank you for sharing this issue. I have made the changes and also published a release with that patch. Please try installing the new version.
In the future, I recommend that you submit a pull request. This is entirely your solution, and I want people like you who find issues and implement solutions to be part of the official contributors! 🤗
Thank you @avsolatorio!
I appreciate your effort and the work you did for resolving this. Yes, in the future for any issue if I have a resolution I will send a PR.
Regarding the commit changes I see that also FrozenSeq2SeqTrainer
is replaced with Seq2SeqTrainer
. Is this something that could cause any problem or what was the intuition behind it, does it optimize anything?
Thanks!
Regarding the commit changes I see that also
FrozenSeq2SeqTrainer
is replaced withSeq2SeqTrainer
. Is this something that could cause any problem or what was the intuition behind it, does it optimize anything?
No, this should work the same. I needed to implement the FrozenSeq2SeqTrainer
before because the Seq2SeqTrainer
on the transformers library had a bug. I reported it here: https://github.com/huggingface/transformers/issues/21182.
It was already fixed, so the FrozenSeq2SeqTrainer
is no longer needed.
Hi @avsolatorio!
I have a super urgent bug ticket that I need to solve asap and I have solve it here. I wonder if you could just push it and create a new PyPI release.
It is related to the library in the
def _processes_sample()
method.Bug:
I have trained a relational model and some times the data are not so many to learn the relationship and the sampling of the relational model generates an empty
synth_sample
and as a consequence the following lineif synth_sample[col].iloc[0].startswith(col):
fails with an exception because it assumes that at least one line at iloc 0 will always exist: https://github.com/worldbank/REaLTabFormer/blob/main/src/realtabformer/rtf_sampler.py#L449The function
def _processes_sample()
is used also when sampling a tabular model and probably this can happen also in this tabular model sampling. Regardless the case (tabular/relational) 2-3 lines below the bug line (L449) the function already (and wisely) has some logic to check if the sampled datasynth_df
is empty and if that's the case it throws theSampleEmptyError
exception.Resolution:
So, I think we can add also a similar check above the line https://github.com/worldbank/REaLTabFormer/blob/main/src/realtabformer/rtf_sampler.py#L443 checking
synth_sample
dataframe instead, something like:Then in my application I can catch the specific exception:
What do you think?
Can you do the fix and build a new PyPI package?
Thanks!