Closed PeterJCLaw closed 8 months ago
Without a pre-trained checkpoint, this package won't work. Is there another way to download it automatically without using those packages (gdown or wget)? If you have any suggestions, I would be happy to use them.
Thanks
That makes sense. Just to check my understanding -- we're talking about the assets included on the GitHub releases and if I'm reading the docs correctly it seems that users can provide their own too?
My expectation would be that any always-necessary data files would either be included in the distributed package (i.e: along with the source) and any which are either optional or configurable (and of a non-trivial size) would be up to the user to download separately -- perhaps as part of their build process. I realise the files are quite big, however to me that suggests even more reason to want (in production at least) to download the files as part of a build process rather than each time the package is used[^1]. This approach would also allow users to manage the files in whatever way they wanted (e.g: tools like DVC are handy for this) rather than relying on you to support lots of different sources.
I appreciate the convenience of offering an automatic download so that e.g: in local development or experimental usages the published package "Just Works", hence the suggestion of moving that functionality to a package extra. It would be reasonable to keep a developer-friendly API/entrypoint/option/whatever which does the auto-downloading if you wanted, though I'd encourage also having a production-friendly one which just checks the files are already available and errors if not.
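To make the two entrypoints concrete, here is a minimal sketch of what I have in mind. The names `require_checkpoint` and `ensure_checkpoint` and the `downloader` callback are hypothetical, not part of this package's current API; the idea is just that the production path never touches the network.

```python
from pathlib import Path
from typing import Callable, Union

PathLike = Union[str, Path]


def require_checkpoint(path: PathLike) -> Path:
    """Production-friendly: only check that the file is already present."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Checkpoint not found at {path}. Download it as part of your "
            "build process, or use the auto-downloading entrypoint."
        )
    return path


def ensure_checkpoint(path: PathLike, downloader: Callable[[Path], None]) -> Path:
    """Developer-friendly: fetch the file via `downloader` if it is missing.

    `downloader` is a hypothetical callback that writes the checkpoint to the
    given path; it could wrap whatever the optional download extra provides.
    """
    path = Path(path)
    if not path.is_file():
        path.parent.mkdir(parents=True, exist_ok=True)
        downloader(path)
    # Re-use the strict check so both entrypoints fail in the same way.
    return require_checkpoint(path)
```

With this split, only `ensure_checkpoint` needs the download dependencies, so they could be imported lazily by whatever is passed as `downloader`.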
[^1]: I realise there might be some caching involved, however when used in ephemeral environments such as Kubernetes the lifetimes of local disk caches are likely to be pretty short. In such cases, including any required artefacts in the built application would be the more common approach.
If I'm understanding correctly, you're saying that there is some way to just download large files automatically to the desired path, such as `transparent-background/checkpoints/base_ckpt.pth`? Sounds very handy. Thank you for your suggestion.
Essentially, yes. There are lots of tools which could be used to download resource files when deploying -- everything from plain `wget` or `curl` command line tools to things like DVC which support versioning of large files (essentially by putting summaries of them in your git repo and then fetching them when the user asks for them).
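If you'd rather stay in Python than shell out, a build step could look roughly like the sketch below. It uses only the standard library; the `fetch_asset` helper and the idea of pinning a SHA-256 digest are my own assumptions for illustration, and the URL and digest would need to come from wherever the checkpoint is actually published.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large checkpoints don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_asset(url: str, dest: Path, expected_sha256: str) -> Path:
    """Download `url` to `dest` unless a file with the right digest is there."""
    dest = Path(dest)
    if dest.exists() and sha256_of(dest) == expected_sha256:
        return dest  # already fetched by an earlier build step
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so a failed download never leaves a
    # partial file at the final path.
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as response, tmp.open("wb") as out:
        shutil.copyfileobj(response, out)
    actual = sha256_of(tmp)
    if actual != expected_sha256:
        tmp.unlink()
        raise ValueError(f"Digest mismatch for {url}: got {actual}")
    tmp.replace(dest)
    return dest
```

Running something like this once at image build time means the running application only ever reads the file from disk.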
Just to be clear -- all these options do require a little more work from users, however they also give users a lot more flexibility and predictability, as they can choose how & when to download the assets as well as how they store them in their own systems.
Thanks for this useful package. Would it be possible to move the checkpoint downloading dependencies (and anything else not needed for the core behaviour) to package extras? When using libraries in production environments it's strongly preferable not to pull in unnecessary dependencies, especially where those packages potentially make network calls (e.g: `gdown` & `wget`).
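For illustration, one common way to do this is to declare the download packages under an extra and import them lazily, so a plain install never needs them. This is only a sketch: the extra name `download`, the `load_optional` helper and the `your-package` placeholder are hypothetical, and the dict is simply what would be passed to setuptools as `extras_require`.

```python
import importlib
from types import ModuleType

# Hypothetical extras declaration, e.g. setup(..., extras_require=EXTRAS_REQUIRE).
# A plain install would then skip these; `pip install "your-package[download]"`
# would pull them in.
EXTRAS_REQUIRE = {"download": ["gdown", "wget"]}


def load_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, pointing at the extra if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is needed for this feature; install it with "
            f"the [{extra}] extra, e.g. pip install 'your-package[{extra}]'"
        ) from exc
```

The auto-download code path would call `load_optional("gdown", "download")` only when it actually needs to fetch something, so core usage has no import-time dependency on either package.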