robertknight / ocrs

Rust library and CLI tool for OCR (extracting text from images)
Apache License 2.0

fix(windows): use `home::home_dir` instead of `HOME` env. var #9

Closed ErichDonGubler closed 8 months ago

ErichDonGubler commented 8 months ago

ocrs currently relies on checking the HOME environment variable to determine where to place cached models. Windows doesn't use this, though, so of course, it breaks. 😅

Implement the proper cross-platform logic for getting a home directory with home::home_dir.
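To illustrate why a bare HOME lookup fails on Windows, here is a minimal std-only sketch of a per-platform lookup. It is not the `home` crate's implementation: `pick_home` and `home_dir_sketch` are hypothetical names, and the sketch assumes USERPROFILE is the relevant variable on Windows. As far as I know, `home::home_dir` additionally falls back to system APIs when the variable is unset, which this sketch does not attempt.

```rust
use std::env;
use std::path::PathBuf;

/// Treat an unset or empty variable as "no value".
fn non_empty(value: Option<&str>) -> Option<PathBuf> {
    value.filter(|s| !s.is_empty()).map(PathBuf::from)
}

/// Hypothetical helper: choose a home directory from already-read
/// environment values. `home` is HOME, `userprofile` is USERPROFILE.
fn pick_home(home: Option<&str>, userprofile: Option<&str>, is_windows: bool) -> Option<PathBuf> {
    if is_windows {
        // Assumption: Windows normally sets USERPROFILE, not HOME.
        non_empty(userprofile)
    } else {
        non_empty(home)
    }
}

/// Hypothetical helper: read the real environment for the current platform.
/// Non-Unicode values are dropped by `env::var(..).ok()`.
fn home_dir_sketch() -> Option<PathBuf> {
    let home = env::var("HOME").ok();
    let userprofile = env::var("USERPROFILE").ok();
    pick_home(home.as_deref(), userprofile.as_deref(), cfg!(windows))
}
```

Splitting the pure selection logic (`pick_home`) from the environment read keeps the behaviour easy to check on any platform.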

ocrs' current dependency tree uses windows 0.48.0 because of ring, which is pulled in through ureq. This PR unfortunately also brings in windows 0.52.0 as a duplicate dependency. briansmith/ring#1852 upgrades ring to use windows 0.52.0, and should be followed up on to remove this duplication.

ErichDonGubler commented 8 months ago

Should resolve #3 for basic use cases. I haven't explored running on Windows with many command-line arguments, but this seems to be the only blocker I could find in my basic testing.

ErichDonGubler commented 8 months ago

Side note: the XDG specification for cache directories is a fantastic piece of the ecosystem for the Linux/MacOS world. On Windows, however, there's an entirely different native convention for this. There's yet another crate, directories, that abstracts over this for you, if you wish, via directories::ProjectDirs::data_local_dir.

ErichDonGubler commented 8 months ago

RE: using ProjectDirs::data_local_dir vs. using $HOME/.cache/… (which, N.B., is not actually the same as ProjectDirs::cache_dir, either, since MacOS technically has a separate convention): The XDG Base Directory Specification suggests that XDG_CACHE_HOME only be used for non-essential files:

> There is a single base directory relative to which user-specific non-essential (cached) data should be written. This directory is defined by the environment variable $XDG_CACHE_HOME.

XDG_DATA_HOME seems like a more appropriate fit, since without models, inference can't even run for ocrs:

> There is a single base directory relative to which user-specific data files should be written. This directory is defined by the environment variable $XDG_DATA_HOME.
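The difference between the two base directories can be sketched with a std-only resolver. This assumes the defaults given in the XDG Base Directory Specification ($HOME/.local/share for data, $HOME/.cache for cache, with unset, empty, or relative values ignored). The function names and the `app` subdirectory are hypothetical, not what ocrs or the directories crate actually use, and the sketch only covers the XDG convention, not Windows or MacOS.

```rust
use std::path::{Path, PathBuf};

/// Resolve an XDG base directory from an already-read environment value.
/// Unset, empty, or relative values fall back to `home/default_rel`.
fn xdg_base_dir(env_value: Option<&str>, home: &Path, default_rel: &str) -> PathBuf {
    match env_value {
        Some(v) if !v.is_empty() && Path::new(v).is_absolute() => PathBuf::from(v),
        _ => home.join(default_rel),
    }
}

/// Hypothetical: where essential data (such as models) would live.
fn data_dir(xdg_data_home: Option<&str>, home: &Path, app: &str) -> PathBuf {
    xdg_base_dir(xdg_data_home, home, ".local/share").join(app)
}

/// Hypothetical: where non-essential, re-creatable files would live.
fn cache_dir(xdg_cache_home: Option<&str>, home: &Path, app: &str) -> PathBuf {
    xdg_base_dir(xdg_cache_home, home, ".cache").join(app)
}
```

A caller would pass `std::env::var("XDG_DATA_HOME").ok().as_deref()` (or the cache equivalent) together with the resolved home directory.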

robertknight commented 8 months ago

Thanks for this. I'll get this merged so at least the project can build. Does the use of a leading . in .cache not cause issues on Windows? I thought that might be a problem from my vague recollection of Windows file naming rules.

The current use of ~/.cache originates from this being the directory where various Python frameworks (e.g. Hugging Face) store their models. I hadn't looked into it more deeply than that.

ErichDonGubler commented 8 months ago

> Does the use of a leading . in .cache not cause issues on Windows? I thought that might be a problem from my vague recollection of Windows file naming rules.

Not at all! Just don't expect it to be hidden by convention, like in Unix-based systems. That's a separate file attribute in Windows file systems. :-)
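A small std-only sketch of that point: creating a dot-prefixed directory is an ordinary operation, and on Windows "hidden" is a file attribute that can be queried separately. `create_dot_cache` and `has_hidden_attribute` are hypothetical helper names; the Windows branch relies on `std::os::windows::fs::MetadataExt::file_attributes` and the FILE_ATTRIBUTE_HIDDEN bit (0x2).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Create `<base>/.cache` (and any missing parents) and return its path.
fn create_dot_cache(base: &Path) -> io::Result<PathBuf> {
    let dir = base.join(".cache");
    fs::create_dir_all(&dir)?;
    Ok(dir)
}

/// On Windows, report whether the file-system "hidden" attribute is set.
#[cfg(windows)]
fn has_hidden_attribute(path: &Path) -> io::Result<bool> {
    use std::os::windows::fs::MetadataExt;
    const FILE_ATTRIBUTE_HIDDEN: u32 = 0x2;
    let attrs = fs::metadata(path)?.file_attributes();
    Ok((attrs & FILE_ATTRIBUTE_HIDDEN) != 0)
}

/// Elsewhere there is no such attribute: hiding is only a naming
/// convention (the leading dot), so this always reports `false`.
#[cfg(not(windows))]
fn has_hidden_attribute(_path: &Path) -> io::Result<bool> {
    Ok(false)
}
```

On Windows, a freshly created `.cache` directory would be expected to report `false` here, since nothing sets the attribute automatically.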