Open jshtok opened 1 year ago
Hello, I am also working on this! I'm curious how we can get the images and the corresponding Q-A pairs. Do you have any experience with that?
Well, I just took the DUE repo as a reference and brute-force fixed the dataloader to fetch the relevant page.
Thanks for your reply! So, as I understand it, you first generate the memmaps. After that, you use the memmaps to build the dataloader and save the results (image-QA pairs) to a file in a format such as JSON?
Yes, I used the DUE-baselines repo to generate the memmaps, then did the finetuning, and then the --evaluate option in the train config saves the predictions. Please note that you need to convert these predictions to another format (back in the DUE-baselines repo) in order to run them against the GT annotations in the DUE-evaluator repo. Then, finally, you get the performance numbers.
Thank you very much!!! I want to convert the memmaps into png files because I want to use the images as inputs. Which preprocessing repo did you use? The benchmarker in the UDOP repo, or the original one in DUEBenchmark/baselines?
Hello, and thank you very much for contributing the code. While running your code, I noticed that only the first page of a PDF file is loaded (as an image). Indeed, in your class PregeneratedDatasetBase, the add_images() routine contains the line
im = convert_from_path(im_path)[0]
In the original DUE code, by contrast, the _get_page_img() routine uses the page_no field to fetch the relevant page. Can you please explain this discrepancy? Thank you!
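
For anyone hitting the same problem, here is a minimal sketch of how the requested page could be rendered instead of always taking index [0]. It is not the code from either repo. The helper name load_page_image, the zero_based flag, and the injectable convert argument are hypothetical, and whether page_no is 0-based or 1-based in the annotations is an assumption to check against your data. The first_page and last_page arguments are taken from pdf2image's convert_from_path, which counts pages from 1.

```python
from typing import Callable, List, Optional


def load_page_image(pdf_path: str, page_no: int, zero_based: bool = True,
                    convert: Optional[Callable[..., List]] = None):
    """Render only the requested page of a PDF and return it as an image.

    `zero_based` and `convert` are hypothetical knobs added for this sketch.
    """
    if convert is None:
        # Imported lazily so the helper can be exercised without poppler installed.
        from pdf2image import convert_from_path
        convert = convert_from_path

    # pdf2image numbers pages from 1; shift if the annotation is 0-based (assumption).
    page = page_no + 1 if zero_based else page_no
    if page < 1:
        raise ValueError(f"invalid page number: {page_no}")

    # first_page == last_page renders a single page instead of the whole document.
    images = convert(pdf_path, first_page=page, last_page=page)
    if not images:
        raise IndexError(f"{pdf_path} has no page {page_no}")
    return images[0]
```

Inside add_images(), the line above would then become something like im = load_page_image(im_path, page_no), provided the page number is available at that point in the dataset class. Rendering a single page also avoids converting the whole document for every sample.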