stanfordnlp / phrasenode

Mapping natural language commands to web elements

xid creation in dataset #9

Status: open. aburns4 opened this issue 3 years ago.

aburns4 commented 3 years ago

Hi,

I was wondering if you could explain how you obtained the 'xid' values for the annotations in the dataset. Did you perform a breadth-first or depth-first search and number the elements in the DOM according to the traversal order? Were there any other criteria for counting elements, such as whether or not they were visible?

Thank you!

ppasupat commented 3 years ago

Hello. The xids were assigned in the order in which the opening tags appear in the document, which is equivalent to a pre-order depth-first search. All tags, including invisible ones, get an xid.
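To see why open-tag order coincides with depth-first search, here is a small stdlib-only sketch (not the dataset's code). It records tags in the order their opening tags are parsed, builds the element tree at the same time, and checks that a pre-order depth-first traversal of that tree visits the same tags in the same order. The `VOID` set, the `Node`/`TreeBuilder` helpers, and the sample HTML are illustrative assumptions.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag in HTML (illustrative, not exhaustive).
VOID = {"br", "img", "input", "hr", "meta", "link"}


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds an element tree and records tags in the order their open tags appear."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]
        self.open_order = []

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.open_order.append(tag)
        if tag not in VOID:  # void elements have no children, so don't descend
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this name (simple recovery only).
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def preorder(node, out):
    """Depth-first, pre-order: emit a node, then recurse into its children left to right."""
    for child in node.children:
        out.append(child.tag)
        preorder(child, out)
    return out


html = "<div><p>Hi <b>there</b></p><ul><li>a</li><li>b<br></li></ul></div><span>x</span>"
builder = TreeBuilder()
builder.feed(html)
dfs_order = preorder(builder.root, [])
print(builder.open_order)
print(dfs_order == builder.open_order)
```

Text such as "Hi " or "a" goes to `handle_data`, which this sketch does not override, so only tags are numbered in either ordering.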

The dataset was processed with BeautifulSoup, using the following:

i = 0                         # xid counter; a starting value of 0 is assumed
for x in soup.body(True):     # Select all tag nodes under <body>, in document order
    x['data-xid'] = i
    i += 1
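The snippet above can be run end to end on a toy page as a sanity check. This is a minimal sketch, assuming the `html.parser` backend and a counter starting at 0 (neither is confirmed for the actual dataset). Calling a tag, as in `soup.body(True)`, is BeautifulSoup shorthand for `find_all(True)`, which matches every tag but none of the text strings, so text never receives a `data-xid`.

```python
from bs4 import BeautifulSoup

# Toy page for illustration only; not taken from the dataset.
html = "<html><body><div><p>Hi <b>there</b></p><a href='#'>link</a></div></body></html>"
soup = BeautifulSoup(html, "html.parser")  # parser choice is an assumption

i = 0  # starting value assumed
for x in soup.body(True):  # same as soup.body.find_all(True): every Tag, document order
    x["data-xid"] = i
    i += 1

# (tag name, xid) pairs in document order.
assigned = [(t.name, t["data-xid"]) for t in soup.body(True)]
print(assigned)

# Text lives in NavigableString objects, which find_all(True) does not return,
# so these strings have no xid of their own.
texts = list(soup.body.strings)
print(texts)
```

Note that `<body>` itself is not numbered, because `find_all` searches only the descendants of the tag it is called on.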

The demo Chrome extension also does something similar in the injectXids function.

aburns4 commented 3 years ago

Hi, okay. What about text elements that do not have children? They don't seem to have xids in the data files.

Thank you again.