kobrafarshidi opened this issue 1 year ago
Hi @kobrafarshidi, all the answers below relate to the notebook LaTr TextVQA Training with WandB.

LaTr_PreTraining is the notebook for pre-training, and LaTr TextVQA Training with WandB is the code for fine-tuning (if I understood your question correctly). And yes, it took around 6-8 hours on GPU (I don't remember it exactly); however, I have written the code such that you can try it on TPU with a single line of code change. I guess Colab would be a real headache, since it doesn't allow background execution unless you have a Pro version.

Regards, Akarsh
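For reference, the "single line of code change" for TPU presumably refers to the Trainer's hardware arguments; a minimal sketch, assuming the Lightning 1.x API used in these notebooks:

```python
import pytorch_lightning as pl

# GPU run, as in the notebook (Lightning 1.x argument names assumed)
trainer = pl.Trainer(max_epochs=5, gpus=1)

# TPU run: the "single line" change is swapping the hardware argument
trainer = pl.Trainer(max_epochs=5, tpu_cores=8)
```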
Hi Mr Akarsh,

Thank you so much for taking the trouble to answer all my questions. I know you are busy, and I appreciate you taking the time to respond quickly. Yes, all of them are related to that notebook, LaTr TextVQA Training with WandB.

1) Your code is great, and that is reasonable. You added the reference code LaTr_PreTraining, but for me it is vital to know how to connect pre-training and fine-tuning. I have no experience with connecting a pre-train script to a fine-tune script, or with using the data from pre-training in the fine-tuning script, because in Colab you can run only one .ipynb at a time and they are two separate files, even though they are related. Then again, it seems Mr Furkan added a dataset of Amazon OCR... 2) I totally understand.

3,4) Thank you so much. I will definitely follow all these instructions. Many thanks for your checking.

With gratitude, Farshidi
Hi there,

In Latr_for_finetuning, if you have the weights saved, it will initialize from them; otherwise it will train from scratch. So, I guess this would be handy. And if there is any additional dataset, I have tried to write the function create_features (here) in such a way that, if OCRs are available, you can pass them as an argument and the rest will be handled by the function.

Regards,
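For reference, a minimal sketch of the "initialize from saved weights, else train from scratch" pattern described above; the checkpoint path and the stand-in model are hypothetical:

```python
import os
import torch
import torch.nn as nn

CKPT_PATH = "latr_pretrained.pth"  # hypothetical path to saved pre-training weights

model = nn.Linear(512, 512)  # stand-in for the LaTr fine-tuning model

if os.path.exists(CKPT_PATH):
    # Weights saved: initialize the model from the pre-training checkpoint
    state = torch.load(CKPT_PATH, map_location="cpu")
    model.load_state_dict(state, strict=False)  # strict=False tolerates head mismatches
else:
    # No checkpoint found: the model simply trains from scratch
    print("No pre-trained weights found; training from scratch")
```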
Hi again Mr Akarsh,

Thank you so much. I have received all your guidance and will follow all of it.

With gratitude
Hi Mr @uakarsh,
A couple of weeks ago I asked you how to use pre-training in the training_with_wandb file. I studied your guidance and followed it step by step, but when I try to apply pre-training in a new block of the training_with_wandb file, I run into some ambiguities and questions.
1) In the following lines of code from the training_with_wandb file, you match ocr_json_df['image_id'] against json_df['image_id']. image_id exists in the training_with_wandb file, but it isn't in the Latr_pretrain file. How do I create it, so that OCRs and images can be matched by finding ocr id == image id?
```python
curr_img = self.json_df.iloc[idx]['image_id']
ocr_token = self.ocr_json_df[self.ocr_json_df['image_id'] == curr_img]['ocr_info'].values.tolist()[0]
```
2) In the training_with_wandb file, the bounding-box extraction below uses some mathematical functions, for example rotation. Is it important to use them in the pre-train block?
```python
for entry in ocr_token:
    xmin, ymin, w, h, angle = (entry['bounding_box']['top_left_x'],
                               entry['bounding_box']['top_left_y'],
                               entry['bounding_box']['width'],
                               entry['bounding_box']['height'],
                               entry['bounding_box']['rotation'])
    xmin, ymin, w, h = resize_align_bbox([xmin, ymin, w, h], 1, 1, width, height)
    x_centre = xmin + (w / 2)
    y_centre = ymin + (h / 2)
    xmin, ymin = rotate([x_centre, y_centre], [xmin, ymin], angle)
    xmax = xmin + w
    ymax = ymin + h
    curr_bbox = [xmin, ymin, xmax, ymax]
    boxes.append(curr_bbox)
    words.append(entry['word'])
```
3) In the pre-train file, finding masked_boxes, masked_tokenized_words, and tokenized_words is not the last step: next we have LaTr_for_pretraining, then pre_training_model, and finally only extracted_feat_from_t5 (that is, we no longer have img, boxes, tokenized_words, and idx as in the wandb file). So how do we get them back after pre_training_model, to use them in the next step in fine-tuning?
4) In the article, the authors mention that in the pre-training step we extract features from the PDF dataset by pre-training with T5, and then we use those features in fine-tuning, the features being boxes and word tokens. Did I understand correctly? (The figure in the article seems to show that fine-tuning does not reuse the features, and that all the pictures are used again in fine-tuning.) It is vital for me to know this, and I will be very grateful for your help and for the time spent on my questions.
Hi there,

1) You can take the entire TextVQA dataset code from fine-tuning, put it in pre-training, and then just modify the __getitem__ code to introduce the line _, masked_boxes, masked_tokenized_words = apply_mask_on_token_bbox(boxes, tokenized_words) after tokenized_words = torch.as_tensor(tokenized_words, dtype=torch.int32).

2) Regarding spline and rotation: I introduced the rotation and spline of the bounding box, but actually they were of no use, since rotation and spline deal with 3D coordinate space. Hence, unless the rotation and spline are specified to be in 2D space, it is okay to remove those parts.

3) Only the __getitem__ definition would change. If you want to extract the masked indices, you can refer to the function apply_mask_on_token_bbox and return the indices of the masked values; I mean, simply add the variable temp into a list and return the list.

Hope this helps. Regards,
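For reference, a minimal sketch of the modification described in 1): reuse the fine-tuning dataset code and add the masking line in __getitem__ right after the tensor conversion. The samples/mask_fn plumbing here is hypothetical; apply_mask_on_token_bbox is the repo's function.

```python
import torch
from torch.utils.data import Dataset

class PretrainDataset(Dataset):
    """Sketch: fine-tuning dataset code moved into pre-training, with masking added."""

    def __init__(self, samples, mask_fn):
        self.samples = samples  # list of (img, boxes, tokenized_words) triples
        self.mask_fn = mask_fn  # e.g. apply_mask_on_token_bbox from the repo

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        img, boxes, tokenized_words = self.samples[idx]
        tokenized_words = torch.as_tensor(tokenized_words, dtype=torch.int32)
        # The one extra line for pre-training, right after the tensor conversion:
        _, masked_boxes, masked_tokenized_words = self.mask_fn(boxes, tokenized_words)
        return img, masked_boxes, masked_tokenized_words
```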
Hi Mr Akarsh, thank you so much for answering my questions. Some of them I understood, but unfortunately some ambiguities have not been resolved for me yet. For the first question, you suggest taking the entire TextVQA dataset code from fine-tuning and putting it in pre-training, but based on the article we do not want to use the TextVQA dataset in pre-training: we should use the IDL dataset in pre-training and then connect pre-training and fine-tuning. When I try to use IDL, I do not know how to find matching IDs. That is my question: how do I do that?
For the third question, do you mean that boxes, tokenized_words, and idx are equivalent to masked_boxes, masked_tokenized_words, and temp?
Gratefully,
Hi,
For the first question, I don't think you need to find anything: you can download the IDL dataset, then refer to the pre-training code to extract the OCR for the whole dataset and then mask it. Actually, if you are focusing on the pre-training part, I think matching the fine-tuning and pre-training code would create confusion.
For the third question, what I was trying to say is: once you have extracted the features (i.e. from the create_features method), you can then use apply_mask_on_token_bbox on the extracted features for masking and performing pre-training.
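For reference, the order of operations described above, as a sketch; create_features and apply_mask_on_token_bbox are the repo's functions, but their exact signatures are assumed:

```python
# 1) extract the features for one sample (arguments assumed)
img, boxes, tokenized_words = create_features(image_path, tokenizer)

# 2) mask the extracted features for the pre-training objective
_, masked_boxes, masked_tokenized_words = apply_mask_on_token_bbox(boxes, tokenized_words)
```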
Hi,
I am so thankful that you are giving your valuable time to my questions.
You are right that matching the fine-tuning and pre-training code would create confusion, but it is the main point of the LaTr research: if we want to verify or even improve this research, we should use both of them together. I have already mentioned that this is vital for me to know. If your IDL dataset is simple, what do you think about using the IDL Amazon OCR (the same link that Mr Furkan gave in another issue)? I checked it for use in pre-training, but it is still really confusing for me how to match the image id and the OCR id in order to run the line ocr_json_df[self.ocr_json_df['image_id'] == json_df.iloc[idx]['image_id']]['ocr_info'], and I don't know what ocr_info is in this dataset (I attach a picture of this JSON for you; of course, I may be completely wrong).
Hi,
I am not able to open the link. But I think the essence would be to write a function that can read the bounding boxes and words for a given PDF, and then pass them to the create_features function for extracting the bounding boxes. Maybe this helps.

Regards, Akarsh
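For reference, a sketch of such a reader function; the JSON keys are taken from the snippets earlier in this thread (ocr_info, bounding_box) and may need adjusting to the actual file layout:

```python
import json

def read_ocr_for_document(ocr_json_path):
    """Return (words, boxes) for one document's OCR JSON.

    Key names follow the snippets earlier in this thread and are assumptions
    about the file layout; the output can be passed to create_features.
    """
    with open(ocr_json_path) as f:
        ocr = json.load(f)
    words, boxes = [], []
    for entry in ocr['ocr_info']:
        words.append(entry['word'])
        bb = entry['bounding_box']
        boxes.append([bb['top_left_x'], bb['top_left_y'],
                      bb['top_left_x'] + bb['width'],
                      bb['top_left_y'] + bb['height']])
    return words, boxes
```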
Hi Mr @uakarsh, thank you so much for taking the time to respond to me. Actually, my problem is the part about matching the image_id of the images with the image_id of the OCR tokens. I would like to know your opinion about it. Best regards
I think in that case you may have to find a way to create a CSV file in which there is an image entry and the corresponding OCR path for that image id. But I guess this is not the case with your dataset. So, is it possible for you to read the file (that you mentioned in your previous reply to this thread) and access the TextDetections part for the corresponding ID? If so, I think the problem is solved.
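For reference, a sketch of building such a CSV, assuming a hypothetical layout with one OCR JSON per image named <image_id>.json:

```python
import csv
import glob
import os

# Hypothetical layout: one OCR JSON per image, named <image_id>.json
with open('image_to_ocr.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['image_id', 'ocr_path'])
    for path in glob.glob('ocr/*.json'):
        image_id = os.path.splitext(os.path.basename(path))[0]
        writer.writerow([image_id, path])
```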
In the TextDetections part I see only an id; if you mean that id, I think it is the index of a detected text within one image. For example, in one image we have 4 id numbers and in another image we have 10 id numbers, so this id is not per image (I attach an example). We want one image-id for each image. On the other hand, we do not have those PDFs; we have only the JSON file. What do you think about it?
I am really not sure how to proceed unless I get a few samples. But what I can understand is that you need to do something so that the image_id and its corresponding OCR can be extracted from the dataset.
Hi Mr @uakarsh, I want to run my project in a Jupyter notebook with a GPU, because Colab crashed, so I changed my setup. But when I run on a GTX 1080 GPU with 8 GB of memory, I get an out-of-memory error. Someone advised me to change the versions of PyTorch, CUDA, and Python. May I ask you what the best versions of PyTorch, Python, and CUDA are for running it?
Hi,
I guess versions won't have a role to play, but here are a few things that could help.
While constructing the PyTorch Lightning Trainer object, you can do the following:
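(The original snippet did not survive this thread; the following is a sketch of the kind of memory-saving settings the follow-up message refers to — auto batch size, precision, and so on — using Lightning 1.x argument names.)

```python
import pytorch_lightning as pl

trainer = pl.Trainer(
    gpus=1,
    precision=16,                    # half precision roughly halves activation memory
    auto_scale_batch_size='power',   # search for the largest batch size that fits
    accumulate_grad_batches=4,       # simulate a large batch without the memory cost
)
```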
and many other tricks. You can find all of these on the PyTorch Lightning Trainer page; this was the reason why I use PyTorch Lightning.
Hope it helps.
Hi, that's right. I will test them. Thank you so much for your guidance. Best regards
Hi Mr @uakarsh
As you suggested, I applied these settings (auto batch size, precision, ...), but in the end I still got an out-of-memory error. I think the number of parameters is very large and my GPU has only 8 GB of RAM, so I decided to run the program on a system with 2 GPUs. Now, if I want to run this program on 2 GPUs, what changes should I make to the LaTr TextVQA Training with WandB 💥 code?
Based on what I have studied, it should be enough to set gpus=2 in the pl.Trainer parameters, but with these settings I encountered an error.
I would be grateful if you could show me a solution.
Many thanks.
What was the error when you ran on 2 GPUs?
Hi,
The error is:
To use CUDA with multiprocessing, you must use the 'spawn' start method
When I encountered this error, I changed accelerator=ddp_spawn, but in the source code this parameter is invalid, so I changed strategy=ddp_spawn, and then I encountered an error saying that this strategy is incompatible with this source...
I think maybe this link would be helpful: https://pytorch-lightning.readthedocs.io/en/1.4.0/advanced/multi_gpu.html
Maybe the ddp_spawn strategy is not applicable to the resources that you are using.
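For reference, the multi-GPU spellings discussed here, as a sketch; Lightning changed the argument name between versions, and plain DDP (one process per GPU) sidesteps the CUDA-with-fork issue that spawn works around, so it is usually the first thing to try:

```python
import pytorch_lightning as pl

# Plain DDP launches one process per GPU and avoids the CUDA multiprocessing
# 'spawn' error mentioned above.
trainer = pl.Trainer(gpus=2, accelerator='ddp')  # Lightning <= 1.4
# trainer = pl.Trainer(gpus=2, strategy='ddp')   # Lightning >= 1.5
```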
Hi, thank you so much for your response. I have read this guide; the author pays attention to three points: 1) Init tensors using type_as and register_buffer, 2) Make models pickleable, 3) Select GPU devices, Distributed Data Parallel.
1,2) I really don't know how to do these. If I'm not mistaken, the tensors in the source code are img, questions, answers, and tokenizers. I ran the following and got an error; I think the mistake is mine:
```python
a = torch.Tensor(3, 384, 500)
img = a.type_as(img)
b = torch.Tensor(256, 6)
boxes = b.type_as(boxes)
c = torch.Tensor(256)
tokenized_words = c.type_as(tokenized_words)
d = torch.Tensor(512)
question = d.type_as(question)
e = torch.Tensor(512)
answer = e.type_as(answer)
f = torch.Tensor(0)
idx = f.type_as(torch.as_tensor(idx))
```
3) For this, I tested different kinds of settings; most of them were incompatible, but with one of them the run hung in trainer.fit and only showed local_rank: 0 ..., and I don't know the reason. I attach a screenshot of it.
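For reference, a sketch of the type_as pattern the Lightning guide describes: it is meant for new tensors created inside the module, so they follow the device of a tensor the trainer has already moved; batch inputs from the DataLoader do not need it.

```python
import torch
import torch.nn as nn

class TinyModule(nn.Module):
    def forward(self, img):
        # type_as is for *new* tensors created inside the module, so they
        # follow the device/dtype of a tensor already moved by the trainer.
        pos = torch.arange(img.shape[-1]).type_as(img)  # lands on img's device
        return img + pos
```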
Hi. Thanks for the great code. First of all, I am so sorry if my questions are very simple and basic. While going through your code, I encountered an error and some questions. I'd appreciate it if you could help me with them.