onlinf / google-refine

Automatically exported from code.google.com/p/google-refine

Implement simple HTML table/list import #555

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Google Spreadsheets has a nice, simple importHTML(url, table|list, instance#) 
function which will import the Nth instance of a table or list found on an HTML 
page.  Something like this would be great for Refine's Create Project phase 
(and perhaps as a convenience function to work with parseHtml() as well).

We can currently get close by copying the text version of an HTML table to the 
clipboard and pasting it in (at least in most cases), but I think it would be 
good to be able to import it directly.
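
To make the requested behavior concrete, here is a minimal Python sketch of an importHTML-style lookup. It is only an illustration, not Refine code: the names `import_html` and `_Extractor` are hypothetical, it takes HTML text rather than a URL, it assumes reasonably well-formed markup with explicit closing tags, and it only indexes top-level tables or lists (text in nested ones is folded into the enclosing cell or item). The index is 1-based, as in the Google Spreadsheets function.

```python
from html.parser import HTMLParser


class _Extractor(HTMLParser):
    """Collect top-level tables (as rows of cells) or lists (as items)."""

    def __init__(self, kind):
        super().__init__()
        if kind not in ("table", "list"):
            raise ValueError("kind must be 'table' or 'list'")
        self.kind = kind
        self.containers = ("table",) if kind == "table" else ("ul", "ol")
        self.cells = ("td", "th") if kind == "table" else ("li",)
        self.found = []     # one entry per top-level table or list
        self.depth = 0      # nesting depth of container tags
        self.buffer = None  # text pieces of the cell/item being read

    def handle_starttag(self, tag, attrs):
        if tag in self.containers:
            self.depth += 1
            if self.depth == 1:
                self.found.append([])
        elif self.depth == 1 and self.kind == "table" and tag == "tr":
            self.found[-1].append([])  # start a new row
        elif self.depth == 1 and tag in self.cells:
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in self.containers and self.depth > 0:
            self.depth -= 1
        elif self.depth == 1 and tag in self.cells and self.buffer is not None:
            text = "".join(self.buffer).strip()
            self.buffer = None
            if self.kind == "table":
                if not self.found[-1]:  # cell seen without a <tr>
                    self.found[-1].append([])
                self.found[-1][-1].append(text)
            else:
                self.found[-1].append(text)

    def handle_data(self, data):
        if self.buffer is not None:
            self.buffer.append(data)


def import_html(html, kind, index):
    """Return the index-th (1-based) table or list found in the HTML text."""
    parser = _Extractor(kind)
    parser.feed(html)
    parser.close()
    if not 1 <= index <= len(parser.found):
        raise IndexError("no %s number %d in page" % (kind, index))
    return parser.found[index - 1]
```

For example, `import_html(page, "table", 2)` would return the second table on the page as a list of rows, which maps naturally onto rows and columns in a new project.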

Original issue reported on code.google.com by tfmorris on 20 Mar 2012 at 7:44

GoogleCodeExporter commented 9 years ago
+1, but I would also like to see this come with a new GREL function that does 
the split/join automatically to create record rows, and not just an importer 
function.  The attached example project shows the kind of manual labor that 
has to happen currently because we do not have a GREL function like:

createRecords(array, OptionalUserDefinedDelimiterStringToPerformTheSplitAndJoinThatCreatesRecords "_splitme_", BooleanToKeepOrDiscardDelimiterString)
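
One possible reading of that signature, sketched in Python rather than GREL. Everything here is an assumption for illustration: the name `create_records`, the `key` argument, and the exact meaning of the keep/discard flag are not defined anywhere in this issue. The sketch mimics the manual workflow (join the array with a delimiter, then split it into rows) and follows Refine's record convention that rows with a blank key belong to the record above.

```python
def create_records(key, values, delimiter="_splitme_", keep_delimiter=False):
    """Turn an array into record rows: (key, value) first, (None, value) after.

    Hypothetical helper: join the values with the delimiter, then split on
    it again, the way the manual join + "split multi-valued cells" steps do.
    """
    joined = delimiter.join(str(v) for v in values)
    parts = joined.split(delimiter)
    rows = []
    for i, part in enumerate(parts):
        # Assumed meaning of the flag: keep the delimiter attached to every
        # piece except the last one.
        if keep_delimiter and i < len(parts) - 1:
            part = part + delimiter
        rows.append((key if i == 0 else None, part))
    return rows
```

With the defaults, `create_records("page1", ["a", "b", "c"])` gives one row per array element, with only the first row carrying the key.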

Original comment by thadguidry on 5 Apr 2012 at 4:16

GoogleCodeExporter commented 9 years ago
Example Project attached

Original comment by thadguidry on 5 Apr 2012 at 4:17

GoogleCodeExporter commented 9 years ago

Original comment by tfmorris on 9 Jul 2012 at 2:53

GoogleCodeExporter commented 9 years ago
+1.  The =importHTML() formula in Google Spreadsheets is a great intro to simple 
screen scraping for many users, and the functionality would complement Google 
Refine's import well. Once the URL is loaded, it might make sense to offer an 
HTML option with a further refinement to select (at least initially) the Table 
or List type along with the number of the table or list in the page, maybe even 
autodetecting how many tables or lists are available in the page.
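
The autodetection idea could be as simple as counting the relevant tags before showing the options. A rough Python sketch, under stated assumptions: the name `count_html_structures` is hypothetical, the input is HTML text that has already been fetched, and every `<table>`, `<ul>`, and `<ol>` start tag is counted, including nested ones.

```python
from html.parser import HTMLParser


class _Counter(HTMLParser):
    """Count table and list start tags in a page."""

    def __init__(self):
        super().__init__()
        self.counts = {"table": 0, "list": 0}

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.counts["table"] += 1
        elif tag in ("ul", "ol"):
            self.counts["list"] += 1


def count_html_structures(html):
    """Return how many tables and lists appear in the HTML text."""
    counter = _Counter()
    counter.feed(html)
    counter.close()
    return counter.counts
```

The import dialog could then offer "Table 1..N" or "List 1..M" based on these counts instead of asking the user to guess a number.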

Original comment by tony.hi...@gmail.com on 9 Jul 2012 at 3:14