mozilla / fathom

A framework for extracting meaning from web pages
http://mozilla.github.io/fathom/
Mozilla Public License 2.0

Reduce or eliminate manual data entry in sample labeling #215

Open: biancadanforth opened this issue 5 years ago

biancadanforth commented 5 years ago

As mentioned in https://github.com/mozilla/fathom/issues/141, labeling pages currently requires labeling elements in the developer tools with FathomFox, a Fathom extension. This involves manually entering the value of the data-fathom attribute for each element to be labeled. With possibly upwards of 150 pages to label (https://github.com/mozilla/fathom/issues/139) and multiple elements to label per sample, typos are very likely.

Can we reduce or eliminate manual data entry in page labeling?

One idea is to have the user enter each type value only once (e.g. "image", "title", "price", ...), select the element in the developer tools' Inspector panel, and click a "Tag as ${type}" button that automatically applies the data-fathom="${type}" attribute to the opening tag.
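A minimal sketch of the core of that idea, assuming the labeler declares the allowed type values once up front. The names `makeTagger` and `Taggable` are hypothetical and not part of FathomFox; `Taggable` stands in for a DOM `Element`, which has the same `setAttribute`/`getAttribute` methods. Because every tagging action picks from the declared list, a typo can only slip in once, at declaration time, rather than on every element.

```typescript
// Minimal stand-in for a DOM Element; a real Element has both methods.
interface Taggable {
  setAttribute(name: string, value: string): void;
  getAttribute(name: string): string | null;
}

// Hypothetical helper: the labeler enters each type once, and every later
// tagging action chooses from that list instead of retyping the value.
function makeTagger(types: readonly string[]) {
  // Trim whitespace, drop empty entries, and remove duplicates.
  const allowed = types
    .map((t) => t.trim())
    .filter((t, i, all) => t !== "" && all.indexOf(t) === i);
  return {
    // One label per declared type, e.g. for "Tag as ${type}" buttons.
    buttonLabels(): string[] {
      return allowed.map((type) => `Tag as ${type}`);
    },
    // Apply data-fathom="${type}", rejecting any type not declared up front.
    tag(element: Taggable, type: string): void {
      if (allowed.indexOf(type) === -1) {
        throw new Error(`Unknown type "${type}"; declared: ${allowed.join(", ")}`);
      }
      element.setAttribute("data-fathom", type);
    },
  };
}
```

In the extension, `element` would be the node selected in the Inspector; how to get hold of that node is the open question in the rest of this thread.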

Building on that idea, danielhertenstein suggested putting the button in a right-click context menu. The browser.menus WebExtension API may make that possible.
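A sketch of the context-menu variant, assuming Firefox's `browser.menus` API: one "Fathom" parent item with a "Tag as ${type}" child per declared type. In Firefox the click data carries a `targetElementId`, which a content script can pass to `browser.menus.getTargetElement()` to get the right-clicked element; that is whatever element was under the pointer, not necessarily the one the rubric asks for. `MenusLike`, `registerTagMenu`, and the `fathom-tag:` id prefix are hypothetical; `MenusLike` models only the calls used here, so in an extension you would pass `browser.menus`.

```typescript
// Only the parts of the WebExtension menus API this sketch relies on.
interface MenuCreateProperties {
  id: string;
  title: string;
  contexts: string[];
  parentId?: string;
}
interface OnClickData {
  menuItemId: string | number;
  targetElementId?: number; // Firefox: identifies the right-clicked element
}
interface MenusLike {
  create(props: MenuCreateProperties): string | number;
  onClicked: { addListener(cb: (info: OnClickData) => void): void };
}

const ROOT_ID = "fathom-root";
const ITEM_PREFIX = "fathom-tag:"; // hypothetical menu item id scheme

// Add a "Fathom" submenu with one "Tag as ${type}" item per declared type,
// and report clicks on those items to onTag.
function registerTagMenu(
  menus: MenusLike,
  types: readonly string[],
  onTag: (type: string, targetElementId: number | undefined) => void
): void {
  menus.create({ id: ROOT_ID, title: "Fathom", contexts: ["all"] });
  for (const type of types) {
    menus.create({
      id: ITEM_PREFIX + type,
      parentId: ROOT_ID,
      title: `Tag as ${type}`,
      contexts: ["all"],
    });
  }
  menus.onClicked.addListener((info) => {
    const id = String(info.menuItemId);
    // Ignore clicks on menu items that belong to other features.
    if (id.indexOf(ITEM_PREFIX) === 0) {
      onTag(id.slice(ITEM_PREFIX.length), info.targetElementId);
    }
  });
}
```

Since a background script cannot touch the page DOM, `onTag` would typically forward the type and `targetElementId` to the tab's content script (for example with `browser.tabs.sendMessage`), which would then look up the element and set the attribute.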

This issue is related to mozilla/fathom#204.

erikrose commented 5 years ago

The challenge with the context-menu approach is that you may not get the intended target element. If the rubric says "the innermost element containing all the important stuff", you may unwittingly right-click an empty containing div or similar. The as-simple-as-possible-but-not-simpler UI is probably https://github.com/mozilla/fathom-fox/issues/14.

biancadanforth commented 5 years ago

Well, I suggested selecting the element in the Inspector panel, not on the page itself. I just don't think the browser.menus API extends to nodes in the Inspector panel. If we can add UI to the Inspector panel, however, we could put such a button there instead.
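If the UI lives in the Inspector instead, one possible route (a sketch, not what FathomFox does) is a devtools page whose button runs a snippet in the inspected page with `browser.devtools.inspectedWindow.eval()`, where the `$0` helper refers to the node currently selected in the Inspector. The pane itself could be added with `browser.devtools.panels.elements.createSidebarPane()`. `InspectedWindowLike`, `tagSelectedExpression`, and `tagSelected` are hypothetical names; `InspectedWindowLike` models only the part of the real API used here, so in an extension you would pass `browser.devtools.inspectedWindow`.

```typescript
// The slice of browser.devtools.inspectedWindow used here. The real eval()
// resolves to [result, exceptionInfo]; only the fields read below are typed.
type EvalResult = [unknown, { isException?: boolean; value?: unknown } | undefined];
interface InspectedWindowLike {
  eval(expression: string): Promise<EvalResult>;
}

// Snippet to run in the inspected page. $0 is the Inspector's selected node;
// JSON.stringify quotes the type so quotes or backslashes in it stay inert.
function tagSelectedExpression(type: string): string {
  return `$0 ? ($0.setAttribute("data-fathom", ${JSON.stringify(type)}), true) : false`;
}

// Hypothetical handler for a "Tag as ${type}" button in an Inspector pane.
// Resolves to true if a node was tagged, false if nothing was selected.
function tagSelected(win: InspectedWindowLike, type: string): Promise<boolean> {
  return win.eval(tagSelectedExpression(type)).then(([result, exceptionInfo]) => {
    if (exceptionInfo && exceptionInfo.isException) {
      throw new Error(`eval in the inspected page failed: ${String(exceptionInfo.value)}`);
    }
    return result === true;
  });
}
```

Building the snippet as a string is the trade-off of this approach: the devtools page never holds a reference to the page's DOM node, so the attribute has to be set by code evaluated inside the page.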

erikrose commented 5 years ago

Yes, that's what I mean.