lintool / warcbase

Warcbase is an open-source platform for managing analyzing web archives
http://warcbase.org/
161 stars 47 forks source link

API revisions: getBodyContent, getRawBodyContent #177

Closed lintool closed 8 years ago

lintool commented 8 years ago

@aliceranzhou @ianmilligan1 @jrwiebe I'm looking at the API... so, currently we have:

A have a few issues with this...

If we're going to refactor the API, we should do it sooner rather than later before everyone gets set in their ways...

Here's what I propose: replace the above with

So we would get rid of getRawBodyContent and everyone would have to say RemoveHTML(r.getContentString) - which is a bit longer but far more accurately descriptive.

Thoughts?

aliceranzhou commented 8 years ago

Sounds good. This is much more accurately descriptive. I'll refactor?

lintool commented 8 years ago

Let's wait for @ianmilligan1 and @jrwiebe to chime in... but yes, I'll let you do this one.

jrwiebe commented 8 years ago

I approve.

ianmilligan1 commented 8 years ago

This looks great to me - the more descriptive we can make our commands the better. Please go ahead @aliceranzhou!

aliceranzhou commented 8 years ago

:) will do!

On Wed, Nov 25, 2015 at 8:12 AM Ian Milligan notifications@github.com wrote:

This looks great to me - the more descriptive we can make our commands the better. Please go ahead @aliceranzhou https://github.com/aliceranzhou!

— Reply to this email directly or view it on GitHub https://github.com/lintool/warcbase/issues/177#issuecomment-159603110.

aliceranzhou commented 8 years ago

Please review? https://github.com/lintool/warcbase/tree/refactor-api

On Wed, Nov 25, 2015 at 1:05 PM Alice Zhou alice.zhou@gmail.com wrote:

:) will do!

On Wed, Nov 25, 2015 at 8:12 AM Ian Milligan notifications@github.com wrote:

This looks great to me - the more descriptive we can make our commands the better. Please go ahead @aliceranzhou https://github.com/aliceranzhou!

— Reply to this email directly or view it on GitHub https://github.com/lintool/warcbase/issues/177#issuecomment-159603110.

lintool commented 8 years ago

Can you send it in the form of a pull request so I can see the diffs easily?

aliceranzhou commented 8 years ago

yes: https://github.com/lintool/warcbase/pull/179

lintool commented 8 years ago

One minor comment. Otherwise, ship it!

As a test for getContentRaw, do you want to add a new EncodeBase64 UDF and try the trick mentioned in #178 ?

aliceranzhou commented 8 years ago

Thanks!

Sure, will do!

On Wed, Nov 25, 2015 at 5:29 PM Jimmy Lin notifications@github.com wrote:

One minor comment. Otherwise, ship it!

As a test for getContentRaw, do you want to add a new EncodeBase64 UDF and try the trick mentioned in #178 https://github.com/lintool/warcbase/issues/178 ?

— Reply to this email directly or view it on GitHub https://github.com/lintool/warcbase/issues/177#issuecomment-159746008.

aliceranzhou commented 8 years ago

I've verified that the generated base64 strings decode to valid images. However, Spark Notebook doesn't seem to want to display them.

Also, an interesting discovery: the HTML injection trick only works if the HTML is within an element that will be displayed as a table. A single string will not be displayed as HTML.

base64
lintool commented 8 years ago

Odd. Browser issue? If you pop open chrome js console and examine the DOM element, does it show anything helpful? Maybe some other js is getting in the way?

I assume if you copy the same exact HTML in http://jsfiddle.net/ the image shows up?

lintool commented 8 years ago

Closing. I believe this was merged as part of 8e8f072aca5fd604974f7a1b938e137cd284466b