mfherbst / down-frab-videos

Download videos and lecture attachments from CCC events
GNU General Public License v3.0
11 stars 3 forks source link

Transparent fallback for BeautifulSoup lxml -> html5lib -> html.parser #9

Closed pseyfert closed 6 years ago

pseyfert commented 6 years ago

ran into problems getting down-frab-videos running. turned out the easiest fix is using html5lib as parser.

What do you think?

mfherbst commented 6 years ago

Update: This comment is obsolete. Read on to the next ;)

Also I find it really funny you are getting this error. Because I never did on any of my systems. Any clue why?

Would it maybe make sense to have html5lib as the default and not have lxml at all? Would that solve your problem?

mfherbst commented 6 years ago

Ok. I had a look at the BeatifulSoup docs. From that I would suggest we have an automatic fallback mechanism lxml, html5lib, html.parser. If lxml cannot be found, I think we should issue a warning to suggest installation of the lxml python package.

What do you think?

mfherbst commented 6 years ago

Maybe what I suggested above is a bit over the top after all. Does installing lxml solve your problem ... because if it does, than I would just bail out and issue an appropriate message, instructing to install the package.

pseyfert commented 6 years ago

It feels to me like a very hacky way to solve this.

Of cause it's hacky, I wanted to get the download started ;)

I mean I do not see the point, why this should be exposed to the config at all.

Well, that's why I was asking for feedback. My thinking was that testing and falling back once per video instead of once per installation doesn't seem elegant. But either way is fine with me.

If lxml cannot be found, I think we should issue a warning to suggest installation of the lxml python package.

That's the part I haven't understood so far about my failure, lxml is installed, not sure why BeautifulSoup doesn't manage to pick it up. (I'm undecided how dig I want to dig in understanding it.)

mfherbst commented 6 years ago

Of cause it's hacky, I wanted to get the download started ;)

I understand. Sorry for the inconvenience ;).

My thinking was that testing and falling back once per video instead of once per installation doesn't seem elegant.

It does not happen once per video. BeautifulSoup is only needed at the initialisation stage for parsing the media files and the media formats, so I think having a built-in fallback, which is walked through each time is ok. And well ... due to the oddities and irregularities in the way the VOC does the language codes and so on the code is not really in the cleanest stage either.

That's the part I haven't understood so far about my failure, lxml is installed, not sure why BeautifulSoup doesn't manage to pick it up.

Strange. Well, I guess the automatic fall back works for both of us and we do not have to dig too much ;).