setnicka / ulozto-downloader

EOL - end of life | Uloz.to quick multiple sessions downloader
MIT License

Anyone interested in a better Golang rewrite #90

Open vfosnar opened 2 years ago

vfosnar commented 2 years ago

I've lately been working on a Golang rewrite (I got a TensorFlow Lite captcha breaker working). In the process I discovered that the current state of this project is not so great. My current thoughts are:

  1. Make the rewrite more like a library -> split UI and the downloader, thus create a core project (library) and downloader project (CLI) with all the TUI and stuff, possibly giving a chance for GUI in the future

  2. Make a stable API for embedding the library

  3. Build the API in a way that it's capable of streaming the content (imagine real time video watching with seeking support)

  4. With a really utopian mindset, in the future we could embed IPFS into the downloader (CLI/GUI) project so that it queries IPFS first and, if that fails, falls back to UlozTo. Maybe this could grow into an open alternative to UlozTo. This last step would require a central server, which is really not good, but still much better than proprietary UlozTo.
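The "query IPFS first, fall back to UlozTo" idea in point 4 can be sketched as an ordered list of sources. This is only an illustration: the `source` type, `fetchWithFallback`, and the two stand-in fetchers are hypothetical names, and no real IPFS or UlozTo calls are made.

```go
package main

import (
	"errors"
	"fmt"
)

// source is a hypothetical fetcher: it returns the file content or an error.
type source struct {
	name  string
	fetch func(id string) (string, error)
}

// fetchWithFallback tries each source in order and returns the content
// together with the name of the first source that succeeded.
func fetchWithFallback(id string, sources []source) (string, string, error) {
	lastErr := errors.New("no sources configured")
	for _, s := range sources {
		content, err := s.fetch(id)
		if err == nil {
			return content, s.name, nil
		}
		lastErr = fmt.Errorf("%s: %w", s.name, err)
	}
	return "", "", lastErr
}

// demoSources returns stand-ins: the "ipfs" source always misses and the
// "ulozto" source always succeeds.
func demoSources() []source {
	return []source{
		{"ipfs", func(id string) (string, error) { return "", errors.New("not found") }},
		{"ulozto", func(id string) (string, error) { return "payload for " + id, nil }},
	}
}

func main() {
	content, from, err := fetchWithFallback("ftJuuU9Yg51s", demoSources())
	fmt.Println(from, content, err)
}
```

In a real downloader each `fetch` would also take a context and return a stream rather than a string.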

(FYI I have little experience, so I would appreciate any feedback.) In my mind I imagine the core library API something like this:

// Error handling is omitted from this example snippet for clarity.
// A real context is used instead of nil, since Go APIs should not receive a nil context.
ctx := context.Background()

// Load the TensorFlow captcha solver
s, err := solver.NewTensorFlow("model.tflite")

// Start the downloader (go context, captcha solve function) -> initialize Tor
dl, err := ulozto.Start(ctx, s.Solve)

// Set work to the downloader (go context, url, workers count) ->
// parse page information, call goroutine that generates
// captcha links and sends back "workers"
// (a worker is a wrapper around the download URL providing better API and error handling)
workers, size, err := dl.SetWork(ctx, "https://uloz.to/file/ftJuuU9Yg51s/how-i-learned-to-stop-giving-a-shit-and-love-mindless-self-indulgence-2013-rar", 5)
fmt.Println("Size of the file in bytes:", size)

// Receive the first worker generated
worker := <-workers

// Use the received worker to download a range
// (go context could be used to cancel reads in case of video seeking, begin, end)
reader, err := worker.ReadRange(ctx, 0, 42)
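One way a call like `ReadRange` could work underneath is a plain HTTP `Range` request bound to a context. The sketch below is an assumption, not the proposed library: `readRange` and `fetch` are hypothetical helpers, the range is treated as inclusive (as in the HTTP header), and a local test server stands in for the real download URL.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// readRange is a hypothetical stand-in for worker.ReadRange: it requests
// bytes begin..end (inclusive) from url. Cancelling ctx aborts the read.
func readRange(ctx context.Context, url string, begin, end int64) (io.ReadCloser, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", begin, end))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusPartialContent {
		resp.Body.Close()
		return nil, fmt.Errorf("server did not honor the range request: %s", resp.Status)
	}
	return resp.Body, nil
}

// fetch starts a throwaway local server that supports Range requests
// and reads one range from it.
func fetch(begin, end int64) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "file.bin", time.Time{}, strings.NewReader("0123456789abcdef"))
	}))
	defer srv.Close()

	body, err := readRange(context.Background(), srv.URL, begin, end)
	if err != nil {
		return "", err
	}
	defer body.Close()
	data, err := io.ReadAll(body)
	return string(data), err
}

func main() {
	s, err := fetch(4, 9)
	fmt.Println(s, err) // bytes 4 through 9 of the test payload
}
```

Returning the response body as an `io.ReadCloser` keeps the caller in control of buffering, which matters for the streaming use case.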

Once again I would really appreciate any feedback. I am a student and I have a lot of free time and I am really excited about working on this.

SpiReCZ commented 2 years ago

I think the first thing to do is define the use cases. For example, with ulozto-streamer I made an attempt to create a wrapping REST API around the original project for my own Synology download plugin. It is always great when someone wants to make things better, and you surely can. I am not so sure it is a good idea to start another separate project in a different language. Is there a reason for Golang, or are you just a Golang fan? If you are familiar only with Golang, you could try Python. I would have liked to write my API in Kotlin or Java, but they just do not have the TensorFlow Lite support it would need, so I chose to go with Python, since this project has a community around it.

The original creators and maintainers are not present; I think the project in this repository is dead. The current implementation is calling for a rewrite because it is just a CLI, and I had to change some things, since everything in it is written in a procedural way tied to the CLI. Does it even make sense to make a CLI in a way that reflects this implementation (the annoying download status output)? I think people would want a Web UI, a Desktop UI, a REST API, or a simple CLI for batch downloads. For all of these use cases to work together, there would need to be a core library that persists its state (to SQLite, for example) to remember downloads and that provides the current state of downloads via interfaces. Specific use-case implementations could then be built around it.
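To make the "state behind an interface" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `Download` record, the `Store` interface, and the in-memory `memStore`, which stands in for the SQLite-backed implementation suggested above.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Download is a hypothetical record of one download's state.
type Download struct {
	URL        string
	Size, Done int64 // bytes
}

// Store is the interface the front ends (CLI, Web UI, REST API) would use.
// A SQLite-backed implementation could replace memStore without touching them.
type Store interface {
	Save(d Download) error
	List() ([]Download, error)
}

// memStore keeps state in memory only; it is lost when the process exits.
type memStore struct {
	mu    sync.Mutex
	items map[string]Download
}

func newMemStore() *memStore { return &memStore{items: map[string]Download{}} }

// Save inserts or updates the record keyed by URL.
func (m *memStore) Save(d Download) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.items[d.URL] = d
	return nil
}

// List returns all records sorted by URL so the output is stable.
func (m *memStore) List() ([]Download, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := make([]Download, 0, len(m.items))
	for _, d := range m.items {
		out = append(out, d)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].URL < out[j].URL })
	return out, nil
}

func main() {
	var s Store = newMemStore()
	s.Save(Download{URL: "https://example.com/a", Size: 100, Done: 25})
	s.Save(Download{URL: "https://example.com/a", Size: 100, Done: 50}) // progress update
	list, _ := s.List()
	for _, d := range list {
		fmt.Printf("%s %d/%d bytes\n", d.URL, d.Done, d.Size)
	}
}
```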

SpiReCZ commented 2 years ago

I think all that regular people want is an easy way to use the program without installing Python or any other garbage it needs. The current implementation does not even work on non-Linux systems, since the people who made the latest changes thought only about their own needs or just didn't know what they were doing. A Golang project can be built for any system and used easily, with no custom steps for regular users. I am not so sure the Desktop UI is even needed. A Web UI is universal, and it would already have a REST API for external use. The Web UI could possibly be made into a Desktop UI. Covering the simple CLI is the easiest thing left. The next thing is platform support: regular users would mostly use x64 or arm64; for NAS I can also think of x86 and arm; rpi has armhfp/arm64.

vfosnar commented 2 years ago

> I am not so sure it is a good idea to start another separate project in different language

Yeah but an UlozTo downloader is really a small project and I don't see any problem with this

> Is there a reason for Golang or are you just a Golang fan?

I know a lot of different languages and I want to learn Go. It has the capacity to build great things like Rust but is easy to write like Python

> Does it even make sense to make a cli

Honestly, I didn't think much about this. My idea was to build a CLI (I spend most of my life in the terminal :D), and once the code becomes more stable I would probably build a WebUI around the core library (not Electron, fuck chromium).

> For all of the use cases to work together there would be a need for core library that would persist it's state to sqlite for example

About this: I would much prefer having the core library implement only the low-level API I wrote about in the first post. This could allow, for example, streaming videos from UlozTo to the WebUI without storing any data. I think creating a separate project with a high-level API could be a better fit than implementing this in the core library.

> I think all that regular people want is some easy way to use the program without the need of installing python or any needed garbage.

This is why I think Go is a much better fit than Python. I did not manage to compile TensorFlow Lite into a static library (the docs only describe how to compile it as a shared one), but having a folder for the program with the shared library in it is much more end-user friendly than having to manage Python dependencies. (On Windows, something like an install wizard can extract everything into Program Files and that's it.)

Also, I can't think of a better solution than splitting the work into multiple "workers", so if you have a better idea, let me know (this is the part of the project I am most unsure of). From my point of view, this should allow seeking in videos while streaming them. If the user skips to an unbuffered part, the context for the workers gets cancelled and they can be instantly reused for buffering the new part.

Summary

The core library should provide:

High-level library: (idk about the implementation, also this could be part of the core library)

pvorisek25 commented 2 years ago

@vfosnar Any progress?