microsoft / i-Code

MIT License

Image loading in dataloader code #108

Open jshtok opened 1 year ago

jshtok commented 1 year ago

Hello, and thank you very much for contributing the code. While running it, I noticed that only the first page of each PDF file is loaded (as an image). Indeed, in your class PregeneratedDatasetBase, the add_images() routine contains the line im = convert_from_path(im_path)[0], whereas in the original DUE code the _get_page_img() routine uses the page_no field to fetch the relevant page.

Could you please explain this discrepancy? Thank you!

Coobiw commented 1 year ago

Hello, I am also working on this! I'm curious how we can get the images and the corresponding Q-A pairs. Do you have any experience with that?

jshtok commented 1 year ago

Well, I just took the DUE repo as a reference and brute-force fixed the dataloader to fetch the relevant page.
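For anyone following along, a minimal sketch of what such a fix might look like is given below. It is not the actual patch: the helper name load_page_image and its convert parameter are hypothetical, and it assumes that page_no is 0-based (check this against your data). It relies on the first_page and last_page arguments of pdf2image's convert_from_path, which are 1-based, so that only the requested page is rendered.

```python
from typing import Callable, List, Optional


def load_page_image(pdf_path: str, page_no: int,
                    convert: Optional[Callable[..., List]] = None):
    """Render only page `page_no` of a PDF instead of always taking page 0.

    Hypothetical helper. Assumes `page_no` is 0-based; adjust if your
    annotations number pages from 1.
    """
    if page_no < 0:
        raise ValueError(f"page_no must be non-negative, got {page_no}")
    if convert is None:
        # Imported lazily; pdf2image needs poppler installed on the system.
        from pdf2image import convert_from_path
        convert = convert_from_path
    # pdf2image numbers pages from 1, so shift the assumed 0-based index.
    pages = convert(pdf_path, first_page=page_no + 1, last_page=page_no + 1)
    if not pages:
        raise IndexError(f"{pdf_path} has no page with index {page_no}")
    return pages[0]
```

Inside add_images(), the line im = convert_from_path(im_path)[0] would then become something like im = load_page_image(im_path, page_no), where page_no comes from wherever your dataset stores it.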


Coobiw commented 1 year ago

Thanks for your reply! So, as I understand it, you first generate the memmaps, then use them to build the dataloader, and finally save the results (image-QA pairs) to a file in a format such as JSON?

jshtok commented 1 year ago

Yes, I used the DUE-baselines repo to generate the memmaps and then did the finetuning; after that, the --evaluate option in the train config saves the predictions. Please note that you need to convert these predictions to another format (back in DUE-baselines) in order to score them against the GT annotations with the DUE-evaluator repo. Only then do you get the performance numbers.


Coobiw commented 1 year ago

Thank you very much! I want to convert the memmaps into PNG files because I want to use the images as inputs. Which repo did you use for the processing: the benchmarker in the UDOP repo or the original one in DUEBenchmark/baselines?