keeganstreet / element-finder

Find in Files with CSS selectors
http://keegan.st/2012/06/03/find-in-files-with-css-selectors/

Plain text output format #5

Open ghost opened 11 years ago

ghost commented 11 years ago

Overview

It would be great to use Element Finder to select the content of an HTML element based on its CSS path.

Example

For example, consider the following HTML document:

<html>
<body>
  <div class="header">
  <h1>Header</h1>
  </div>
  <div class="content">
    <table>
      <tbody>
      <tr><td class="data">Tabular Content 1</td></tr>
      <tr><td class="data">Tabular Content 2</td></tr>
      </tbody>
    </table>
  </div>
  <div class="footer">
  <p>Footer</p>
  </div>
</body>
</html>

I'd like to execute the following:

elfinder -s "td.data" -t page.html

This would write the following to standard output:

Tabular Content 1
Tabular Content 2
keeganstreet commented 11 years ago

Hey @thangalin, I'm unsure of the business case for this feature. What sort of situation would you need it in?

Maybe a more flexible solution would be for me to make it easier for you to include Element Finder as a Node module inside another JavaScript file. Your custom file could then define its own output format.
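
Something along these lines, for example. Note this is just a sketch: the findMatches function, its options and the shape of the match objects are made up here for illustration, not the module's current API:

// Hypothetical embedding of Element Finder as a Node module.
// findMatches() and the fields of its results are assumptions for this example.
var elementFinder = require('element-finder');

elementFinder.findMatches({
  directory: '.',
  extension: ['.html'],
  selector: 'td.data'
}, function (err, matches) {
  if (err) {
    console.error(err);
    process.exit(1);
  }
  // Custom output format: print only the text content of each match.
  matches.forEach(function (match) {
    console.log(match.textContent);
  });
});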

ghost commented 11 years ago

Mostly for screen scraping. I've since found the W3C tools that allow me to accomplish this task: http://www.w3.org/Tools/HTML-XML-utils/

For example:

wget -q -O - http://website.com/ | hxnormalize -l 240 -x 2>/dev/null | hxselect -s '\n' -c "label.black" | sort | uniq > content.txt

Contrasted with:

wget -q -O - http://website.com/ | elfinder -s "label.black" | sort | uniq > content.txt

I could then easily import the elements into a database. But this probably isn't what you intended for the tool. Plus, a solution already exists, and I can easily wrap the hxnormalize and hxselect tools in a shell script to get:

wget -q -O - http://website.com/ | cssgrep "label.black" | sort | uniq > content.txt

Your tool came close, but it just isn't usable with other Unix tools, which limits its usefulness for generic scraping and parsing of web pages within a shell (e.g. bash).

keeganstreet commented 11 years ago

Hey Dave,

I quite like the idea of making the output of Element Finder easier to pipe into other command line tools.

I will do some research into the standard practices for input/output of Unix tools and think about how they would best apply to Element Finder. I think it would make sense for Element Finder to have a similar interface to grep, because they are both searching through files for matches to a pattern: grep with a regular expression and Element Finder with a CSS selector.

I actually haven't considered using Element Finder for scraping before. It's primarily designed for use during web development, when you want to check which, if any, of your files contain a match for a CSS selector. But I can see that scraping is a natural extension of that.

Cheers

ghost commented 11 years ago

Hi, Keegan.

If you read from standard input and write to standard output, the tool can be piped together with any other Unix tool. Any errors (or logging) should be written to standard error.
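
For example, a minimal Node skeleton of that convention could look like the one below. The extractMatches function here is just a placeholder standing in for the real selector matching; the point is only the wiring of stdin, stdout and stderr:

// Read the whole document from standard input.
var chunks = [];
process.stdin.setEncoding('utf8');
process.stdin.resume();

process.stdin.on('data', function (chunk) {
  chunks.push(chunk);
});

process.stdin.on('end', function () {
  var html = chunks.join('');
  try {
    var results = extractMatches(html, process.argv[2]);
    // Matches go to standard output, one per line, so they can be piped
    // into sort, uniq, grep and friends.
    results.forEach(function (text) {
      process.stdout.write(text + '\n');
    });
  } catch (err) {
    // Errors and logging go to standard error so they don't pollute the pipe.
    process.stderr.write(String(err) + '\n');
    process.exit(1);
  }
});

// Placeholder for the real CSS selector matching; it just echoes the input
// so the I/O wiring can be demonstrated end to end.
function extractMatches(html, selector) {
  return [html.trim()];
}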

arigoldx commented 5 years ago

👍 on the idea of piping to other commands.

For example, vi `elfinder -s .some-class` would be reaaaaaall handy :)