melbahja / got

Got: Simple golang package and CLI tool to download large files faster 🏃 than cURL and Wget!
MIT License
714 stars 45 forks source link

v0.2 #17

Closed melbahja closed 4 years ago

melbahja commented 4 years ago

Adding support for multiple file downloads and Context

Related PRs: #15 and #16

The New API:

Multiple file download

g := got.New()

// Optional
// g.Client = your custom http client

err := g.Download("http://lcoalhost/file.ext", "/path/to/save")

Multiple files with context:

g := got.NewWithContext(context.Background())

// file 1
err := g.Download("http://lcoalhost/file.ext", "/path/to/save1")

// file 2
err := g.Download("http://lcoalhost/file2.ext", "/path/to/save2")

// Download config.
err := g.Do(&got.Download{
        URL:       "http://lcoalhost/file2.ext",
        Dest:      "/path/to/save2",
        Concurrency: 10,
})

Download Single file:


  ctx := context.Background()

    dl := got.NewDownload(ctx,"http://lcoalhost/file.ext", "/path/to/save")

    // Init
    if err := dl.Init(); err != nil {
        // err
    }

    // Start download
    if err := dl.Start(); err != nil {
        // err
    }

CLI

Single file:

got https://example.com/file.mp4

Multiple files:

got https://example.com/file.mp4 https://example.com/file2.mp4

Multiple save to specific dir:

got --dir /path/to/dir https://example.com/file.mp4 https://example.com/file2.mp4

From stdin:

cat file.txt | got --dir /dir

From file:

got --bf file.txt --dir /dir

Waiting for your feedback: @suzaku @malusev998 @poldi1405 @xurwxj

CodeLieutenant commented 4 years ago

I have implemented a little bit different approach, simpler to multiple downloads, Comment

API:

dl, err := got.New(context.Background(), got.Config { })
// Check error
err = dl.Download(url, savePath)
// Check error

// Multiple downloads can be done like this

go func() { dl.Download(...) }
// Many times ...

This requires a little bit more change for the pull request #16 (Combine methods Start() and others into Downalod). There are a couple of problems related to the data races with this approach, Download.Dest and Download.Url must be removed from the structure to avoid data races. Then single file downloads and multiple file downloads will have same API, the only difference would be multiple downloads will run in their separate go routines. If you can merge #16 to this draft, this will give us simple API for the library and really good commandline application

melbahja commented 4 years ago

In this draft every file have an independent Download{} item to avoid data races

  1. You made a good point there about errgroup
  2. The user can set custom ChunkSize, if not set then ChunkSize calculated in Init
  3. Now Progress and info removed. because every file is a independent Download item.

I guess I have to remove errgroup and replaced with channels like what you did?

CodeLieutenant commented 4 years ago

Yeah, errgroup should be removed, it is slower as it waits for all goroutines to finish before handling error. For ChunkSize in commandline it's not needed for multiple downloads, it would be better just to calculate the ChunkSize

Could you make a full example for your version of the API, so we can check how it works and make more proposals

melbahja commented 4 years ago

You can check examples in: https://github.com/melbahja/got/blob/v0_2/got_test.go#L60 and https://github.com/melbahja/got/blob/v0_2/download_test.go#L260

CodeLieutenant commented 4 years ago

You can check examples in: https://github.com/melbahja/got/blob/v0_2/got_test.go#L60 and https://github.com/melbahja/got/blob/v0_2/download_test.go#L260

What about got.Config ?

Example:

d := got.New(got.Config{ })
d := got.NewWithContext(ctx, got.Config{})
melbahja commented 4 years ago

we can use Config to share options with Download items like Interval, Concurrency and ProgressFunc

CodeLieutenant commented 4 years ago

we can use Config to share options with Download items like Interval, Concurrency and ProgressFunc

Yeah, that would be great, PS. Should http.Client be exposed ?

melbahja commented 4 years ago

yes http.Client so users can set there custom http client

melbahja commented 4 years ago

I will remove errgroup from dl

CodeLieutenant commented 4 years ago

New API looks really simple, now I see, there is no need for got.Config

PS. I and my friend are making PKGBUILD for AUR

melbahja commented 4 years ago

cool 😎

CodeLieutenant commented 4 years ago

cool 😎

When 0.2 will be released?

melbahja commented 4 years ago

probably tonight

suzaku commented 4 years ago

Yeah, errgroup should be removed, it is slower as it waits for all goroutines to finish before handling error.

This is simply not true, as soon as the first error occurs, the context is canceled and all context-aware goroutines get a chance to stop.

For ChunkSize in commandline it's not needed for multiple downloads, it would be better just to calculate the ChunkSize

Could you make a full example for your version of the API, so we can check how it works and make more proposals

melbahja commented 4 years ago

@suzaku is errgroup faster than the channel approach? if you can see the code here: https://github.com/golang/sync/blob/master/errgroup/errgroup.go

I think channels faster?

melbahja commented 4 years ago

@suzaku waiting for your feedback before merge.

suzaku commented 4 years ago

@suzaku waiting for your feedback before merge.

It doesn't really matter here. When thinking about performance, we need to pinpoint the bottleneck. I don't think the bottleneck of downloading is the errgroup vs. channel choice. I think we should pick which one is simpler to use and less error-prone.

CodeLieutenant commented 4 years ago

errgroup is the bottleneck,

errgroup.Wait() Wait blocks until all function calls from the Go method have returned, then returns the first non-nil error (if any) from them.

This is unnecessary usage of resources, when network error happens or any other error, download should be stop (maybe layer down the line we can implement some kind of recovery mechanism). For now it is ok to stop download of the file when first error occurs. So errgroup and channels dont do the same work (different behaviors).

melbahja commented 4 years ago

Merged thank you for your help :heart: