Closed: ArsalanAli915 closed this issue 1 year ago.
Hmm, this seems very odd, as an object returned by iter() should have next(). Maybe try running it without any dataloader multiprocessing to see if that changes things: in the config, set data_loader->num_workers=0.
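For example, the relevant part of the config JSON would look roughly like this (a sketch showing only the num_workers key; leave the rest of your data_loader block as it is):

    "data_loader": {
        "num_workers": 0
    }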
I will try this and see if it works. Thanks for helping.
I tried setting the number of workers to 0. Then it gives a similar error: the single-process dataloader iterator object has no attribute 'next'. Is it necessary to fine-tune on GPUs? Maybe something has been updated; please check.
What are you fine-tuning? Is it a custom data object? Could you share the config JSON?
config file
I found this on Stack Overflow:

    dataiter = iter(dataloader)
    data = dataiter.next()

You need to use the following instead, and it works perfectly:

    dataiter = iter(dataloader)
    data = next(dataiter)
Huh, that must be a PyTorch update. I'm glad you found the solution.
I have prepared my own dataset, and I am using the MyDataset class to fine-tune on it, as mentioned in example_data.
Yeah, please let me know when you update, so that I can move forward with training. I would also like to ask some questions: does this model work well on handwritten medical prescriptions? I have read in the paper that the results are not satisfactory. I am doing research, so please let me know so that I can move ahead. Thanks.
I've fixed the .next() issue.
You'd really have to test it on your data to know. It was primarily trained with IAM handwriting data, which is pretty clean, so it may struggle if the handwriting is cramped. But if you have enough data it may learn the new handwriting style.
I will try it and let you know.
Thanks for the update. It works fine now. However, I have found a new problem when evaluating the model. I have attached a screenshot of the error. Please help me with this.
I don't see a screenshot.
ModuleNotFoundError                       Traceback (most recent call last)

python qa_eval.py -c saved/dessurt_mnist/model_best.pth -g 0

----> 3 evaluate(checkpoint,None, 0 if gpu else None, run=True)

4 frames
/content/dessurt/qa_eval.py in main(resume, data_set_name, gpu, config, addToConfig, test, verbose, run, smaller_set, eval_full, ner_do_before)
    417
    418     print('getting data ready')
--> 419     data_loader, valid_data_loader = getDataLoader(data_config,'train' if not test else 'test')
    420
    421     if test:

/content/dessurt/data_loader/data_loaders.py in getDataLoader(config, split, rank, world_size)
     34
     35     if data_set_name=='MultipleDataset':
---> 36         from data_sets import multiple_dataset
     37         config['data_loader']['super_computer']=config['super_computer']
     38         config['validation']['super_computer']=config['super_computer']

/content/dessurt/data_sets/multiple_dataset.py in <module>
/content/dessurt/data_sets/synth_form_dataset.py in <module>
/content/dessurt/data_sets/gen_daemon.py in <module>

ModuleNotFoundError: No module named 'synthetic_text_gen'

NOTE: If your import is failing due to a missing package, you can manually install dependencies using either !pip or !apt.
The import error is fixed by installing https://github.com/herobd/synthetic_text_gen. However, you shouldn't be evaluating with MultipleDataset (unless you're doing something unusual). It should be MyDataset, right?
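In Colab that could be done with something like the line below (assuming the repository installs with pip straight from git; otherwise clone it and follow its README):

    !pip install git+https://github.com/herobd/synthetic_text_gen.git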
Yes, that is correct. Thanks for commenting. I solved that problem, though another problem I am facing is that I have 2334 x 1646 images, and when I put in that size I get an error, whereas it was working fine before when I put 96, 96, which is the size you mentioned in the MNIST dataset demo. Please let me know how I can solve this. This is the error I received.
RuntimeError                              Traceback (most recent call last)

python train.py -c configs/cf_dessurt_mnist.json -r saved/dessurt_mnist/checkpoint-latest.pth

----> 9 train(None,config,checkpoint)

3 frames
/content/dessurt/train.py in main(rank, config, resume, world_size)
     83         config['supercomputer'] = '{}{}'.format(config['name'],rank)
     84
---> 85     model = eval(config['arch'])(config['model'])
     86
     87     if config.get('PRINT_MODEL',False):

/content/dessurt/model/dessurt.py in __init__(self, config)
    175             use_auto_here = use_auto[len(self.swin_layers)]
    176             self.swin_layers.append( nn.ModuleList( [
--> 177                 SwinTransformerBlock(dim=d_im,
    178                     input_resolution=cur_resolution,
    179                     num_heads=swin_nhead[level],

/content/dessurt/model/swin_transformer.py in __init__(self, dim, input_resolution, num_heads, window_size, shift_size, mlp_ratio, qkv_bias, qk_scale, drop, attn_drop, drop_path, act_layer, norm_layer, sees_docq)
    257                     cnt += 1
    258
--> 259         mask_windows = window_partition(img_mask, self.window_size)  # nW, window_size, window_size, 1
    260         mask_windows = mask_windows.view(-1, self.window_size * self.window_size)
    261         attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2)

/content/dessurt/model/swin_transformer.py in window_partition(x, window_size)
     42     """
     43     B, H, W, C = x.shape
---> 44     x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
     45     windows = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)
     46     return windows

RuntimeError: shape '[1, 17, 12, 24, 12, 1]' is invalid for input of size 59450
Sorry, this note is buried in the README: the Swin implementation requires (image size / 8) % window_size == 0 (the 8 comes from the 4x downsample from the CNN and the 2x downsample from the Swin downsample).
You should just need to adjust your image height and width so that both are multiples of 8 and, divided by 8, they are multiples of your window_size (12?).
I've added that bit somewhere more obvious in the README so hopefully it doesn't trip up someone else.
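As a quick illustration (a sketch of the arithmetic above, not code from the repo): with a window_size of 12, every valid dimension is a multiple of 8 * 12 = 96, so each side can simply be rounded up to the nearest multiple of 96.

    # Round an image dimension up to the nearest value satisfying
    # (dim / 8) % window_size == 0, i.e. the nearest multiple of 8 * window_size.
    import math

    def nearest_valid_dim(dim, window_size=12, downsample=8):
        step = downsample * window_size   # 96 for window_size 12
        return math.ceil(dim / step) * step

    # For the 2334 x 1646 images mentioned above:
    print(nearest_valid_dim(2334), nearest_valid_dim(1646))   # 2400 1728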
Thanks, I'll take a look.
I have tried to adjust the size. The size is now [1640, 2320]; both width and height are multiples of 8. Yet I get the same error.
What is the swin window size in the config?
I have resolved the size error. The following error now occurs since I have increased the input image size. I have also tried reducing the batch size to 1, but I still get this error on Colab.
OutOfMemoryError                          Traceback (most recent call last)

python train.py -c configs/cf_dessurt_mnist.json -r saved/dessurt_mnist/checkpoint-latest.pth

----> 9 train(None,config,checkpoint)

12 frames
/content/dessurt/train.py in main(rank, config, resume, world_size)
    136     print("Begin training")
    137     #warnings.filterwarnings("error")
--> 138     trainer.train()
    139
    140

/content/dessurt/base/base_trainer.py in train(self)
    352
    353             if result is None:
--> 354                 result = self._train_iteration(self.iteration)
    355                 #if self.retry_count>1:
    356                 #    print('Failed all {} times!'.format(self.retry_count))

/content/dessurt/trainer/qa_trainer.py in _train_iteration(self, iteration)
    153             losses, run_log, out = self.run(thisInstance)
    154         else:
--> 155             losses, run_log, out = self.run(thisInstance)
    156         #t#self.opt_history['full run'].append(timeit.default_timer()-tic)#t#
    157

/content/dessurt/trainer/qa_trainer.py in run(self, instance, get, forward_only, valid, run)
    349             pred_a, target_a, string_a, pred_mask, pred_logits = self.model(image,questions,answers,get_logits=True)
    350         else:
--> 351             pred_a, target_a, string_a, pred_mask = self.model(image,questions,answers)
    352
    353

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1192         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194             return forward_call(*input, **kwargs)
   1195         # Do not call functions when jit is used
   1196         full_backward_hooks, non_full_backward_hooks = [], []

/content/dessurt/model/dessurt.py in forward(self, image, questions, answers, RUN, get_tokens, distill, get_logits)
    419
    420         #textual token update
--> 421         qa_tokens = autoregr_layer(
    422                 qa_tokens,
    423                 torch.cat((proj_im_tokens,qa_tokens),dim=1),

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
-> 1194             return forward_call(*input, **kwargs)

/content/dessurt/model/dessurt.py in forward(self, query_tokens, all_tokens, full_mask, all_padding_mask)
    770         """
    771
--> 772         response = self.self_attn(query_tokens, all_tokens, all_tokens,
    773                 mask=full_mask,
    774                 key_padding_mask=all_padding_mask,

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
-> 1194             return forward_call(*input, **kwargs)

/content/dessurt/model/attention.py in forward(self, query, key, value, mask, key_padding_mask)
    120         # 1) Do all the linear projections in batch from d_model => h x d_k
    121         query, key, value = \
--> 122             [l(x).view(nbatches, -1, self.h, self.d_k).transpose(1, 2)
    123              for l, x in zip(self.linears, (query, key, value))]
    124

/content/dessurt/model/attention.py in <listcomp>

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
-> 1194             return forward_call(*input, **kwargs)

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py in forward(self, input)
    112
    113     def forward(self, input: Tensor) -> Tensor:
--> 114         return F.linear(input, self.weight, self.bias)
    115
    116     def extra_repr(self) -> str:

OutOfMemoryError: CUDA out of memory. Tried to allocate 184.00 MiB (GPU 0; 14.76 GiB total capacity; 13.17 GiB already allocated; 45.75 MiB free; 13.47 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Sorry, that's just a hardware limitation. You should be able to use a smaller image size though, as the model was trained with 1152 x 768 images.
Is there any solution if the image size is greater than 1152 x 768?
You either need to reduce the image size or get a bigger GPU.
Could you please tell me what metric you were using for the handwriting dataset?
Also, how is Dessurt better than Donut, since Donut is an end-to-end model?
I was using word error rate and character error rate. Dessurt and Donut are both end-to-end models. Dessurt was better than the initial Donut release on arXiv, but the published Donut is better than Dessurt.
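For anyone reproducing those numbers: character error rate (CER) and word error rate (WER) are just the Levenshtein edit distance normalized by the reference length. A minimal, self-contained sketch (my own illustration, not the evaluation code from the dessurt repo):

    # Minimal CER/WER sketch: edit distance normalized by reference length.
    def edit_distance(ref, hyp):
        # classic dynamic-programming Levenshtein distance using a rolling row
        prev_row = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, 1):
            cur_row = [i]
            for j, h in enumerate(hyp, 1):
                cur_row.append(min(
                    prev_row[j] + 1,              # deletion
                    cur_row[j - 1] + 1,           # insertion
                    prev_row[j - 1] + (r != h)))  # substitution / match
            prev_row = cur_row
        return prev_row[-1]

    def cer(ref, hyp):
        # character-level: strings are treated as character sequences
        return edit_distance(ref, hyp) / max(1, len(ref))

    def wer(ref, hyp):
        ref_words, hyp_words = ref.split(), hyp.split()
        return edit_distance(ref_words, hyp_words) / max(1, len(ref_words))

    print(round(cer("prescription", "presciption"), 3))          # 0.083 (one dropped character)
    print(wer("take two pills daily", "take too pills daily"))   # 0.25 (one substituted word)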
AttributeError                            Traceback (most recent call last)

/content/dessurt/train.py in main(rank, config, resume, world_size)
    136     print("Begin training")
    137     #warnings.filterwarnings("error")
--> 138     trainer.train()
    139
    140

/content/dessurt/base/base_trainer.py in train(self)
    352
    353             if result is None:
--> 354                 result = self._train_iteration(self.iteration)
    355                 #if self.retry_count>1:
    356                 #    print('Failed all {} times!'.format(self.retry_count))

/content/dessurt/trainer/qa_trainer.py in _train_iteration(self, iteration)
    130         batch_idx = (iteration-1) % len(self.data_loader)
    131         try:
--> 132             thisInstance = self.data_loader_iter.next()
    133         except StopIteration:
    134             self.data_loader_iter = iter(self.data_loader)

AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute 'next'

I am facing the above error when fine-tuning the Dessurt model.
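For reference, the version-proof way to write that call is with the next() builtin, along the lines of the fix discussed above (a sketch against the lines shown in the traceback, not necessarily the exact change committed to the repo):

    # trainer/qa_trainer.py, around the lines shown above (sketch of the fix, not the exact repo diff)
    try:
        thisInstance = next(self.data_loader_iter)       # next(it) works on both older and newer PyTorch iterators
    except StopIteration:
        self.data_loader_iter = iter(self.data_loader)   # restart the iterator for a new epoch
        thisInstance = next(self.data_loader_iter)       # assumed retry; the original except-branch is not fully shown above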