microsoft / OmniParser

A simple screen parsing tool towards pure vision based GUI agent
Creative Commons Attribution 4.0 International
4.91k stars 375 forks

Dataset availability #32

Open nmstoker opened 4 weeks ago

nmstoker commented 4 weeks ago

Thank you for this impressive work, it's really interesting.

I had a look at the paper, blog post and here, but I cannot see any indication of where the dataset is published - is it available?

I was interested in checking the dataset because I'm seeing several cases where "OK" buttons get classified as icons with the description "a button to close or cancel an action", which seems semantically incorrect: OK is about proceeding, yet this description suggests not proceeding.

It would be great to explore examples in the dataset and potentially add more to try to reduce this outcome, but obviously one needs access before that's feasible 🙂
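In case it helps with collecting examples, here is a minimal sketch of how such mismatches could be flagged in parsed output. It assumes each parsed element is a dict with a `text` field (the visible label) and a `caption` field (the generated description); those field names, the word lists and the `find_suspect_captions` helper are hypothetical and not taken from OmniParser itself.

```python
import re

# Assumed record shape: {"text": visible label, "caption": generated description}.
# These field names are hypothetical; OmniParser's real output may differ.
NEGATIVE = re.compile(r"\b(close|cancel|dismiss|exit)\b", re.IGNORECASE)
AFFIRMATIVE = {"ok", "yes", "accept", "continue", "confirm"}


def find_suspect_captions(elements):
    """Return elements whose label is affirmative but whose caption implies cancelling."""
    suspects = []
    for el in elements:
        text = el.get("text", "").strip().lower()
        caption = el.get("caption", "")
        if text in AFFIRMATIVE and NEGATIVE.search(caption):
            suspects.append(el)
    return suspects


# Illustrative input only, not real model output.
elements = [
    {"text": "OK", "caption": "a button to close or cancel an action"},
    {"text": "Cancel", "caption": "a button to close or cancel an action"},
    {"text": "OK", "caption": "a button to confirm a choice"},
]
suspects = find_suspect_captions(elements)
print(suspects)
```

Only the first element is flagged here: its label is affirmative while its caption talks about closing or cancelling. A list like this could make it easier to report concrete cases once the dataset can be inspected.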

If it's not available now but you plan to put it out, it would be great if you could share a rough timeframe (e.g. just a few days, weeks or longer).

Many thanks!

abrichr commented 3 weeks ago

@nmstoker the dataset weights are automatically downloaded in download.py in https://github.com/microsoft/OmniParser/pull/52. This downloads the weights from https://huggingface.co/microsoft/OmniParser.

Edit: correction

aliencaocao commented 3 weeks ago

This isn't the dataset, just the weights.

nmstoker commented 2 weeks ago

@yadong-lu - do you have any details regarding dataset availability?

nmstoker commented 1 week ago

Hi @yadong-lu - I saw you commented a few days back on this matter here

It's great that options are being explored and I appreciate this likely needs time to work through internal processes.

Do you have a rough idea how long it might reasonably take? E.g. a few more weeks, or is it more like two or three months?

Would be good to keep up the momentum whilst there's plenty of attention on this exciting research, but I totally get how large companies can be 🙂

Meshwa428 commented 1 week ago

Yeah, large companies often delay releasing things that contain proprietary data.

They might be cleaning it up for the release. The wait is good, but the delay is bad 😞