pystardust / ytfzf

A posix script to find and watch youtube videos from the terminal. (Without API)
GNU General Public License v3.0

Discussion: Separate the scrapers from ytfzf #295

Closed: simonhughxyz closed this 2 years ago

simonhughxyz commented 3 years ago

Currently ytfzf supports youtube, and peertube support seems to have been added recently, but there are other video sharing websites, such as odysee.

It might be a good idea to separate out the scrapers from the main script. That way, anyone can provide a scraper for whichever video sharing site they choose.

Youtube-dl had similar beginnings and came to the same conclusion; now they have their extractors.

As long as each scraper's output for ytfzf is formatted the same way, this should not be a problem; to aid in this, ytfzf can provide a series of utility functions.

Advantages of separating out scrapers:

Disadvantages of separating out scrapers:

Euro20179 commented 3 years ago

It would be nice to have things generic enough that you can just call functions and get what you want, which I've been trying to do on and off so that it's easier to implement more scrapers. I think it currently wouldn't be that hard to have something like ytfzf --scrape="custom-scraper" foo bar baz, where custom-scraper can be a function or script or something that gets called and eventually prints some text formatted like videos_data for ytfzf to use.
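As a hypothetical sketch of such a custom-scraper, written as a standalone POSIX script: the endpoint URL, the JSON shape, and the tab-separated output layout here are all placeholder assumptions, since the real videos_data format is whatever ytfzf defines.

```sh
#!/bin/sh
# custom-scraper: hypothetical example, invoked as
#   ytfzf --scrape="custom-scraper" foo bar baz
# the API endpoint, JSON shape, and tab-separated field layout below are
# placeholders; the real videos_data format is defined by ytfzf
query=$*    # note: not url-encoded; a real scraper should encode this

curl -sf "https://videos.example.com/api/search?q=$query" |
    jq -r '.results[] | [.title, .channel, .duration, .url] | @tsv'
```

Anything that prints lines in the agreed format would work the same way, whether it's a sourced function or an executable on PATH.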

However, one flaw with how scrapers are currently handled is that it's very messy to have more than one scraper active. When implementing peertube I wanted to be able to scrape yt and pt at the same time, but couldn't figure out a clean way to do it.

If we are to implement multiple search engines and more scrapers, being able to run several at once would be amazing: imagine searching youtube, peertube, odysee, and whatever else all at the same time.

If there were to be multiple files, they shouldn't be necessary for ytfzf to work, in my opinion. For example, yt and subscriptions would be built into ytfzf, while peertube and odysee could maybe each be their own script that you can download separately.

I like the idea of ytfzf being able to parse videos_data from stdin or from a file, so you could do cat $videos_data_file | ytfzf --raw - or something, and it would then skip the searching part and just use whatever you gave it. There is the playlists-queue branch, which adds playlists and this ability, but maybe playlists are too much, and ytfzf should just provide a way to get the raw data and to parse it; then the user can handle playlists however they want.
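For illustration, a couple of hypothetical invocations of that interface; --raw is the flag proposed above, not an existing option, and the cache path is made up:

```sh
# replay a previously saved scrape: skip searching, go straight to selection
videos_data_file="$HOME/.cache/ytfzf/videos_data"
cat "$videos_data_file" | ytfzf --raw -

# the same interface composes with ad-hoc filtering, e.g. by title
grep -i 'linux' "$videos_data_file" | ytfzf --raw -
```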

simonhughxyz commented 3 years ago

Having scrapers run at the same time would be nice, but it could also slow down ytfzf very quickly; 1 or 2 might be ok, but 10 or 20?

I think the yt and peertube scrapers should be separated too, just included by default; the way every scraper is handled should be consistent. Eventually yt or peertube will change their site design, and having the scrapers separated should make it easier to write new ones quickly.

Also imagine a scenario where yt updates its site design but does not roll it out to all countries immediately; having the scrapers separated could mean that we keep both yt scrapers, i.e. youtube-old and youtube-new, for transition periods.

Euro20179 commented 3 years ago

> but it could also slow down ytfzf very quickly

Maybe we could fork each of them and join them back together with pipes or something.
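Roughly like this, as a sketch in POSIX sh; scrape_youtube, scrape_peertube, scrape_odysee, and search_query are hypothetical names:

```sh
# fork every enabled scraper, then join and merge their output
tmpdir=$(mktemp -d) || exit 1

for scraper in scrape_youtube scrape_peertube scrape_odysee; do
    "$scraper" "$search_query" > "$tmpdir/$scraper" &    # fork
done
wait                          # join: block until every scraper exits

cat "$tmpdir"/*               # combined videos_data for the menu
rm -rf "$tmpdir"
```

Writing to per-scraper files instead of a single shared pipe avoids interleaving partial lines from concurrent writers.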

Also, for peertube we use their api, so site layout doesn't matter; for youtube, they have a json of basically every video, which seems to be used in the backend, so again site layout doesn't matter.

simonhughxyz commented 3 years ago

> Maybe we could fork each of them and join them back together with pipes or something.

I think the slowdown will have more to do with making so many network requests. Edit: Plus 10-20 threads at the same time? Not sure, you would have to test it.

If the yt json changes, then it is the same problem.

Euro20179 commented 2 years ago

> Edit: Plus 10-20 threads at the same time? Not sure, you would have to test it.

You can do some clever things with jobs and wait to limit the number of threads, which is exactly what I'm testing right now. I set the max threads to 50 and it's working like a charm.
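One simple variant of that throttling, using only a counter and wait (scrape_one and urls are placeholder names; the batch-drain approach shown here is one common pattern, not necessarily the exact one being tested):

```sh
# cap the number of concurrent scraper jobs
max_threads=50
running=0

for url in $urls; do
    scrape_one "$url" &                   # hypothetical worker
    running=$((running + 1))
    if [ "$running" -ge "$max_threads" ]; then
        wait          # drain the current batch before forking more
        running=0
    fi
done
wait                  # catch the final partial batch
```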

Euro20179 commented 2 years ago

> It might be a good idea to separate out the scrapers from the main script. That way, anyone can provide a scraper for whichever video sharing site they choose.

@simonhughxyz

2.2 is going to have an addons folder, which will contain extra menus, scrapers, and thumbnail viewers, none of which will be officially supported; there will also be a make addons target (without sudo) to install all the addons.

Euro20179 commented 2 years ago

I'm going to close this, as the addons folder is almost exactly, if not exactly, what you described. I think I am only deviating from this post by leaving the official scrapers inside the script itself.