zmwangx / caterpillar

Hardened HLS merger
MIT License
51 stars 9 forks source link

Support for variant streams, invocation from external code, progress hooks, verbosity #6

Closed Microeinstein closed 4 years ago

Microeinstein commented 4 years ago

Everything is explained in the commits message; after merge, can this version be tagged to 1.0.2 or something please? I made a bot to download my university lessons, and this edits would be very useful to replace youtube-dl and to download them in parallel!

Thank you šŸ™šŸ»

zmwangx commented 4 years ago

Thanks for the PR. The changes are too substantial for an unannounced PR, so I can't promise when I'll have time to do the review. At a glance I can get behind some changes but not all, and everything being in a single commit makes reviewing and accepting some changes extremely hard. It looks like you squashed many commits? Please separate them if possible (otherwise I might find the task dreadful enough and just do a clean room reimplementation...). Also, all code should be formatted with black. Some of the changes seem to do nothing but break the formatting (e.g., changes to argparse code), it would be nice to format with black and drop those distractions.

If the requested changes (to make things easier for me) are too time consuming for you then unfortunately it might end up taking a long time for me to get back to you... (I'm certainly not saying you're obligated to change anything. In case this sounds passive aggressive, that's not my intention, I'm just setting reasonable expectations.) Fortunately it's not hard to use your own fork in any setting (reference just in case you don't know how), so whatever you're trying to do shouldn't rely on my finding the time to review and merge the PR.

zmwangx commented 4 years ago

Some of the changes seem to do nothing but break the formatting

Oh, a little bit more on that: please don't make unnecessary changes like switching order of short and long options; this is my project so I'll switch back to my preference anyway, it'll just waste time for both of us.

The one unnecessary change I can get behind: I usually prefer qualified identifiers than bare imported identifiers (more readable without understanding context), meaning pathlib.Path not Path, but it seems I'm really using it a lot, and it's been a long time since pathlib was introduced in py34, so sure, let's just import Path alone.

Microeinstein commented 4 years ago

I can't promise when I'll have time to do the review

Don't worry

It looks like you squashed many commits? Please separate them if possible

I didn't - I know, I should have... If I try to separate the commit now, I'll probably break something šŸ˜…

Fortunately it's not hard to use your own fork in any setting

Oh, nice to know!

so whatever you're trying to do shouldn't rely on my finding the time to review and merge the PR.

Yeah, you're absolutely right; I asked a pull request mainly because I thought it was the only way to distribute the modified code with pip to final users; I don't have this urge anymore due to your suggestion, so merge only if you wish šŸ‘šŸ»


Btw I reverted some changes in particular as you asked, also the whole commit can be summarized in:

zmwangx commented 4 years ago

Thanks for making an effort.

I didn't

Ah that's what I dreaded. I can't merge all the changes in one commit even if I want to accept all of them verbatim... That would be terrible for blaming and bisecting in the future.

However, your effort isn't in vain, at least it's been made easier for me to understand the changes you want to see, so I think I'll treat them as feature requests instead and make requested changes separately.

Some thoughts:

Split main() into make_arguments() launch() handle_arguments()

Sure, could refactor out the parser. But I definitely won't use dunder unless it's absolutely necessary somehow (it never is).

Add ffmpeg_loglevel

It's nice to be able to suppress logging, but it's not that simple. Critical functionality actually relies on log level being at least info: https://github.com/zmwangx/caterpillar/blob/f7c47e6f3b86b72b7ad461e07863566da5a762af/src/caterpillar/merge.py#L43-L74 so introducing an option like this that only affects one invocation (the final merge pass) but not all others is deceptive and confusing. I'd much rather select the log level automatically based on the existing verbosity level (controlled by repeatable -q and -v, as is standard), in fact I'm already suppressing logs based on that (see https://github.com/zmwangx/caterpillar/blob/f7c47e6f3b86b72b7ad461e07863566da5a762af/src/caterpillar/merge.py#L75-L79). So I'll do that if I can't think of any reason not to.

progress_hooks

I can add that, but your reporting is too elaborate for my taste. I'll implement a simpler reporting scheme instead, other logic should be in the caller as they see fit.

Add invoke() to launch with arguments from external code

Sure, I can add an API like that.

Add download_m3u8_playlist_or_variant: use the first variant stream found, if a variant streams playlist has been found

That's a useful addition. The nondescriptive remote0.m3u8 and remote1.m3u8 are a bit weird though, so I'll likely have to massage the implementation a bit.


To summarize, I'll probably implement these changes separately, but it's hard to give you committer credit when I'm not just fixing up minor things in your commits and signing them off. This makes me feel bad since I want to give you credit. So in the future, please consider scoping your commits as well as PRs (a series of PRs for unrelated things would make it a lot easier to accept or reject some).

I'll probably be able to find some time for this either today or within the next week, then I'll cut a release.

zmwangx commented 4 years ago

I asked a pull request mainly because I thought it was the only way to distribute the modified code with pip to final users;

Btw, just to be clear, I wasn't discouraging PRs, trying to upstream changes shows respect for an OSS project and I'm grateful for that (of course as I said scoped PRs would be a lot better execution-wise). Just saying you don't have to wait unnecessarily since I can't really promise anything (the other day I finally pushed a fix to an issue I promised to fix precisely a year ago in another project so you get the idea...)

Microeinstein commented 4 years ago

That would be terrible for blaming and bisecting in the future.

I'll learn how to use git properly šŸ˜…

But I definitely won't use dunder unless it's absolutely necessary somehow

I thought using double underscores to hide those function from external code - those functions should not be called directly.

Critical functionality actually relies on log level being at least info [...] in fact I'm already suppressing logs based on that

Oh..! I didn't notice that code; during some tests I passed 'quiet': 24 but I still got ffmpeg output, I probably missed something.

New idea then: add an argument to explicitly disable ffmpeg output.

I can add that, but your reporting is too elaborate for my taste.

Uh, ok; the final result I wanted to reach is this, but without rewriting the hook code for the progress bar.

The nondescriptive remote0.m3u8 and remote1.m3u8 are a bit weird though

I couldn't find better names, following this logic:

  1. download playlist from given url (remote0.m3u8)
  2. is this a variant stream playlist?
    1. select first variant stream and download it (remote1.m3u8)
    2. put remote1.m3u8 as final playlist to consider
  3. download stream from final playlist

I'll probably implement these changes separately, but it's hard to give you committer credit

No problem šŸ‘šŸ» - feel free to do what you think best suits your needs.

just to be clear, I wasn't discouraging PRs

Yeah, I had no doubts on that; I mean, I probably would have said - to the final users of my project - to download from my fork and I would have asked for a pull request anyway.

zmwangx commented 4 years ago

I have implemented the feature requests I deem appropriate.

I think those are all the substantial requests. Let me know if you have questions or need anything else. I'll cut a release if there are no additional requests.


Unrelated Python clear-up:

I thought using double underscores to hide those function from external code - those functions should not be called directly.

No, you were thinking about name mangling, which only applies to methods, not module functions and variables, and even then only makes access difficult, not impossible. There's a very narrow use case that you'll know when you really, really need it (I was exaggerating a bit when I said "it never is", but the point is if you just want to mark something private, you shouldn't do this). https://docs.python.org/3/tutorial/classes.html#private-variables

If you just want to "hide" module functions from help and import *, then a single underscore is enough.

A simple example you can try:

mod.py:

__hidden_var = "nope, not hidden"

def __hidden_func():
    print("nope, not hidden")

class Foo:
    def __hidden_method(self):
        print("nope, still here")

main.py:

import mod

def main():
    print("mod.__hidden_var:", end="\t")
    print(mod.__hidden_var)
    print("mod.__hidden_func:", end="\t")
    mod.__hidden_func()
    foo = mod.Foo()
    print("mod.Foo.__hidden_method:", end="\t")
    try:
        foo.__hidden_method()
    except AttributeError as e:
        print(f"AttributeError: {e}")
    print("mod.Foo._Foo__hidden_method:", end="\t")
    foo._Foo__hidden_method()

if __name__ == "__main__":
    main()
$ python3 main.py
mod.__hidden_var:   nope, not hidden
mod.__hidden_func:  nope, not hidden
mod.Foo.__hidden_method:    AttributeError: 'Foo' object has no attribute '__hidden_method'
mod.Foo._Foo__hidden_method:    nope, still here
Microeinstein commented 4 years ago

Let me know if you have questions or need anything else. I'll cut a release if there are no additional requests.

I would have some questions but can't ask right now; I'm in a hurry


No, you were thinking about name mangling, which only applies to methods, not module functions and variables, and even then only makes access difficult, not impossible. [...]

Ah, I see; thank you for the clarification.

zmwangx commented 4 years ago

Iā€™m not in a hurry to cut a release, so no rush.

zmwangx commented 4 years ago

I released v1.1 while I still remember this... Closing, but feel free to ask questions.

Microeinstein commented 4 years ago