jacksonllee / pycantonese

Cantonese Linguistics and NLP
https://pycantonese.org
MIT License
354 stars 38 forks

[Feature Request] Caching after calling `.read_chat(url)` #39

Closed shivanraptor closed 1 year ago

shivanraptor commented 1 year ago

Feature you are interested in and your specific question(s): When using `.read_chat(url)`, the ZIP file is downloaded, extracted, and parsed every time the function is called. Caching the files in a local folder such as `~/.cache/pycantonese/chatdata/` would save both download and execution time, as Hugging Face's `.from_pretrained(model)` and `datasets.load_dataset()` (and many similar functions) do.

What you are trying to accomplish with this feature or functionality: Decrease execution time and improve performance.

Additional context:

jacksonllee commented 1 year ago

Hi, by default `.read_chat(url)` ultimately calls `pylangacq.Reader.from_zip`, which caches the downloaded data to `~/.pylangacq/` and loads from the cache if it is found. Was this not the behavior you saw on your end?
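For readers unfamiliar with the pattern, here is a minimal sketch of download-once caching using only the standard library. It is not pylangacq's actual code: `cached_fetch`, the hashed file name, and the injected `download` callable are all hypothetical names chosen for illustration, and a fake downloader stands in for the network so the sketch runs offline.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, List


def cached_fetch(url: str, cache_dir: Path, download: Callable[[str], bytes]) -> Path:
    """Return a local path for `url`, calling `download` only on a cache miss.

    Hypothetical helper for illustration; not part of pylangacq or pycantonese.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that any URL maps to a safe, stable file name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    target = cache_dir / f"{key}.zip"
    if not target.exists():
        # Cache miss: fetch the bytes once and store them on disk.
        target.write_bytes(download(url))
    return target


# Fake downloader that records each call, so the demo needs no network access.
calls: List[str] = []


def fake_download(url: str) -> bytes:
    calls.append(url)
    return b"fake zip bytes"


demo_dir = Path(tempfile.mkdtemp())
first = cached_fetch("https://example.org/corpus.zip", demo_dir, fake_download)
second = cached_fetch("https://example.org/corpus.zip", demo_dir, fake_download)
```

Both calls return the same path, and `fake_download` runs only once because the second call finds the file already on disk. To check the real behavior described above, look for cached data under `~/.pylangacq/` after the first `.read_chat(url)` call.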

shivanraptor commented 1 year ago

Oh my bad, I didn't notice the cached folder.