knights-analytics / hugot

Huggingface transformer pipelines in Golang
Apache License 2.0
253 stars 11 forks source link

[Feature Request] Enhancements to model downloading #35

Closed gregfurman closed 2 weeks ago

gregfurman commented 1 month ago

Hey!

I'm aware that an external library is used for huggingface model downloading but I think the following features would be very useful for those of us wanting to use this retrieval functionality:

Not sure if a fork (like with the tokenizer) is perhaps the best approach here considering the downloader library seems more geared towards CLI usage. Else, perhaps replicating some of the code is also a viable option.

Thanks again for this awesome package!

gregfurman commented 1 month ago

FYI I've currently created a workaround for this which is EXTREMELY flaky. Let me know if the additions I mentioned in the issue body would be possible:

// setting config above...

// TODO: this is not great but we need a way to suppress the outputs to stdout and stderr
tempStdOut := os.Stdout
tempStdErr := os.Stderr

defer func() {
    os.Stdout = tempStdOut
    os.Stderr = tempStdErr
}()

if !modelDownloadOptions.Verbose {
    os.Stdout = nil
    os.Stderr = nil
}

if modelPath, err = p.session.DownloadModel(modelName, modelPath, modelDownloadOptions); err != nil {
    return nil, err
}

// code continues below...
riccardopinosio commented 1 month ago

Hi, i think for the first issue is probably best dealt with upstream in the huggingfacedownloader. For the second one (filtering for specific files to reduce download), maybe it would be good for the hugot downloader to accept a list of paths on the model repo to filter for

gregfurman commented 3 weeks ago

Hey! I tried to raise this issue upstream to no answer. The model downloading is a really helpful feature but I currently can't justify using it with all that printing. It's quite an anti-pattern to import a library and have it print to stdout in anycase.

Is there no way the team would consider forking the downloader and maintaining a version more friendly to using Hugot as a package?

RJKeevil commented 2 weeks ago

@gregfurman Done on master, for now i forked the library, if this works ok i'll submit a PR to them. Can you test to see if i missed anything?

gregfurman commented 2 weeks ago

Ahh this is brilliant! Thanks so much @RJKeevil. Will test this out a bit later today or tomorrow.

Silencing the output like that will definitely unblock me. Perhaps related, do you have any strong opinions on using a logger instead of printing to stdout?

RJKeevil commented 2 weeks ago

Great stuff! Providing a logger is on my radar, but from my experience loggers within libraries can be problematic, as people use a range of custom loggers in their projects and providing a generic interface to provide a logger is not that simple. Feel free to try a PR if you have a nice way to do this!

For now my approach is to make sure all errors return to the caller, so that a user can pass the error to their own logging, or log their own success messages when download model returns without error etc.

gregfurman commented 2 weeks ago

Works like a charm 😄 Thanks!