kartikprabhu / mf2py

mf2 parser in python (this is an experimental fork)
Other
3 stars 2 forks source link

Explicit constructor arguments for Parser #52

Closed kylewm closed 9 years ago

kylewm commented 9 years ago

cc @tommorris

kartikprabhu commented 9 years ago

@kylewm why use "features" as a name and not just call it "html_parser" or some such to be explicit? "features" seems ambiguous

kylewm commented 9 years ago

I believe it's the name of the parameter to the BeautifulSoup constructor

On Wed, Dec 24, 2014, at 02:11 PM, Kartik Prabhu wrote:

@kylewm why use "features" as a name and not just call it "html_parser" or some such to be explicit? "features" seems ambiguous


Reply to this email directly or view it on GitHub: https://github.com/kartikprabhu/mf2py/pull/52#issuecomment-68076351

kylewm commented 9 years ago

One idea I had was to still accept **kwargs, and for any key with a prefix "bs4_", we'd strip off the prefix and pass it through to BeautifulSoup, so you'd call mf2py.Parser(doc, url, bs4_features="lxml"). What do you think?

kartikprabhu commented 9 years ago

@kylewm that ties mf2py to bs4 a particular parser. I feel we should avoid that. I am still only a little worried about "features". If bs4 decides to add more features it seems the user should know the details of bs4 feature set to use mf2py. That seems weird to me. People should be able to use mf2py without knowing the details of bs4 or any other parser we might switch to later.

kylewm commented 9 years ago

@kylewm that ties mf2py to bs4 a particular parser.

I can't quite parse this sentence. It ties mf2py to bs4? or to a particular HTML parser (html5lib, lxml)? Could you give me an idea of what you'd like to do instead? (or if you'd rather just leave it alone, that is fine too)

It makes sense to me to have some sort of pass-thru if you want to specify parameters to the underlying BeautifulSoup instance (which is totally optional). And from my perspective, having the mf2py constructor be agnostic of what those parameters actually are is a plus!

This is what I had in mind in case it was unclear:

bs4_kwargs = { k[4:], v for k, v in kwargs if k.startswith('bs4_') }
doc = BeautifulSoup(content, **bs4_kwargs);
kartikprabhu commented 9 years ago

@kylewm I mean't that if we switch from using bs4 to some other thingie (which has happened once when I forked the original mf2py from @tommorris ) then any argument of the kind bs4_* will be changed (morally). Then existing implementation will also have to rewrite code to take into account this change.

Rather, we should make mf2py as a black-box and people should not need to know that mf2py is using bs4 internally. Maybe I am over-thinking this.

kylewm commented 9 years ago

Maybe I am over-thinking this.

No I think that makes sense, and is reasonable. Opening up implementation details to the user definitely bears thorough consideration.

Closing this PR for now; I'll try again if I think of a better way to specify a parser.