worldbank / REaLTabFormer

A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.
https://worldbank.github.io/REaLTabFormer/
MIT License
200 stars 23 forks

See "IndexError: index out of range in self" when related_num parameter is specified in child model sampler #46

Closed liu305 closed 1 year ago

liu305 commented 1 year ago

I observed that with default parameters, the generated child table has far fewer rows than the child training data, even with the same number of parent table rows. So I added a count column to the parent table for the parent model to learn, and then passed this column name as related_num to the child model sampler, expecting the generated child table to have the specified number of rows for each parent record. However, child sampling ran for a while and then torch threw an "index out of range" exception. I checked that the count column contains only integer values >= 1.

Also, if I set related_num to a large number instead of a column name, I see the same error.
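For readers who want to reproduce the setup, the count column described above could be built roughly as follows. This is a minimal sketch in plain Python, not code from the issue: the key name `parent_id`, the column name `n_children`, and the helper `add_child_counts` are all hypothetical, and real tables would more likely be pandas DataFrames.

```python
from collections import Counter


def add_child_counts(parent_rows, child_rows, key="parent_id", count_col="n_children"):
    """Return copies of the parent rows with a per-parent child row count attached.

    `key` and `count_col` are hypothetical names chosen for this sketch.
    """
    # Count how many child rows reference each parent key.
    counts = Counter(row[key] for row in child_rows)
    # Parents without any child rows get a count of 0.
    return [{**parent, count_col: counts.get(parent[key], 0)} for parent in parent_rows]


# Toy data standing in for the real parent and child training tables.
parents = [{"parent_id": 1, "region": "A"}, {"parent_id": 2, "region": "B"}]
children = [
    {"parent_id": 1, "amount": 10},
    {"parent_id": 1, "amount": 7},
    {"parent_id": 2, "amount": 3},
]

parents_with_counts = add_child_counts(parents, children)
```

The resulting column is what would then be named in related_num, as described in the comment above.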

liu305 commented 1 year ago

Traceback (most recent call last):
  File "/project/sampling.py", line 28, in <module>
    child_samples = child_model.sample(
  File "/miniconda3/user-envs/liu/realtabformer-env/lib/python3.9/site-packages/realtabformer/realtabformer.py", line 1284, in sample
    synth_df = relational_sampler.sample_relational(
    for _samples in self._sample_input_batch(
  File "/miniconda3/user-envs/liu/realtabformer-env/lib/python3.9/site-packages/realtabformer/rtf_sampler.py", line 1012, in _sample_input_batch
    _samples = self._generate(
  File "/miniconda3/user-envs/liu/realtabformer-env/lib/python3.9/site-packages/realtabformer/rtf_sampler.py", line 247, in _generate
    _samples = self.model.generate(**generate_kwargs)
  File "/miniconda3/user-envs/liu/realtabformer-env/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/miniconda3/user-envs/liu/realtabformer-env/lib/python3.9/site-packages/transformers/generation/utils.py", line 1588, in generate
    return self.sample(
  File "/miniconda3/user-envs/liu/realtabformer-env/lib/python3.9/site-packages/transformers/generation/utils.py", line 2642, in sample
    outputs = self(
  File "/miniconda3/user-envs/liu/realtabformer-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/miniconda3/user-envs/liu/realtabformer-env/lib/python3.9/site-packages/transformers/models/encoder_decoder/modeling_encoder_decoder.py", line 625, in forward
    decoder_outputs = self.decoder(
  File "/miniconda3/user-envs/liu/realtabformer-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/miniconda3/user-envs/liu/realtabformer-env/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1076, in forward
    transformer_outputs = self.transformer(
  File "/miniconda3/user-envs/liu/realtabformer-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/miniconda3/user-envs/liu/realtabformer-env/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 844, in forward
    position_embeds = self.wpe(position_ids)
  File "/miniconda3/user-envs/liu/realtabformer-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/miniconda3/user-envs/liu/realtabformer-env/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/miniconda3/user-envs/liu/realtabformer-env/lib/python3.9/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
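The last frames of the traceback show the error being raised while the GPT-2 decoder looks up position embeddings (`self.wpe(position_ids)`). That means at least one position id fell outside the embedding table. One possible reading, which is an assumption and not something confirmed in this thread, is that asking for more child rows per parent makes the generated sequence longer than the number of positions the decoder supports. The sketch below illustrates that failure mode with a pure-Python stand-in. `PositionTable`, `fits`, and all of the numbers are hypothetical and are not part of REaLTabFormer, transformers, or torch.

```python
class PositionTable:
    """Pure-Python stand-in for a fixed-size position embedding lookup (hypothetical)."""

    def __init__(self, n_positions, dim):
        self.n_positions = n_positions
        # One vector per supported position; the values do not matter here.
        self.rows = [[0.0] * dim for _ in range(n_positions)]

    def lookup(self, position_ids):
        # Reject any position that has no row in the table.
        for pos in position_ids:
            if not 0 <= pos < self.n_positions:
                raise IndexError(
                    f"position {pos} is outside the table of {self.n_positions} rows"
                )
        return [self.rows[pos] for pos in position_ids]


def fits(n_child_rows, tokens_per_row, n_positions, special_tokens=2):
    """Rough check: does a sequence of this many child rows fit in the table?

    Assumes every child row costs the same number of tokens, plus a fixed
    number of special tokens. Both are simplifications made for this sketch.
    """
    return n_child_rows * tokens_per_row + special_tokens <= n_positions


table = PositionTable(n_positions=16, dim=4)

# A sequence that stays within the table works.
ok = table.lookup(list(range(16)))

# One token past the end fails, analogous to the IndexError in the traceback.
try:
    table.lookup(list(range(17)))
    overflowed = False
except IndexError:
    overflowed = True
```

Under these assumptions, a check like `fits(...)` before sampling would flag a count value that is too large for the decoder, though the real token cost per child row depends on the trained model and its vocabulary.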