Closed Kirchheim closed 1 month ago
When you edited the frequency file you must have saved it with the wrong format. The columns must be comma separated like this:
係,係,60613,1
Your file is semicolon-separated:
係;係;60613;1
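For anyone hitting the same problem, a semicolon-separated export can be rewritten with commas using Python's standard `csv` module. This is only a minimal sketch: the helper name `convert_delimiter` is made up for illustration and is not part of AnkiMorphs, and it assumes the file is UTF-8 text with four columns as in the rows above.

```python
import csv
import io


def convert_delimiter(text: str, old: str = ";", new: str = ",") -> str:
    """Re-serialise delimited text, swapping the column separator.

    Hypothetical helper for illustration; not part of AnkiMorphs.
    """
    reader = csv.reader(io.StringIO(text), delimiter=old)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=new, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# The semicolon-separated row from the broken file
sample = "係;係;60613;1\n"
converted = convert_delimiter(sample)
print(converted, end="")  # 係,係,60613,1
```

To convert a file on disk, read it with `open(path, encoding="utf-8", newline="")`, pass the contents through the helper, and write the result to a new file. Going through `csv` rather than a plain text replace means a field that itself contains a comma gets quoted correctly.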
I saved your file with the comma-separated format and it works fine for me:
Cantonese_frequency_file_V2.csv https://github.com/mortii/anki-morphs/files/15379997/Cantonese_frequency_file_V2.csv
Does this new file also work for you?
Big smile, YES, the new file also works for me.
Thanks, Maik
Hi,
I noticed you have referenced some ready made frequency files for different languages.
Maybe the attached original reply from Words.hk is useful for adding Cantonese.
Hi Maik,
Thanks for your inquiry. Are these lists what you are looking for? https://words.hk/faiman/analysis/ The frequency list is based on a relatively small corpus and contains few authors, but hopefully it's better than using the Mandarin lists for Cantonese purposes.
Internally, we have also compiled word frequencies from "LIHKG" (a Hong Kong forum), which has orders of magnitude more data, but due to the audience there, the stuff tends to be on the vulgar side :D I'll attach the data in case that helps you. (I don't think we "own" this data at all, so as far as licensing goes, words.hk disclaims any ownership and responsibility for this one.)
And once more, thanks for your quick troubleshooting. Maik
@Kirchheim aha, very nice! I'm not sure how well the jieba morphemizer supports Cantonese, but it's probably better than nothing. I'll add the frequency file and the link to the guide, thanks!
@Kirchheim I added a Cantonese frequency file to the guide. The one you made only contains characters (maybe that was intentional), but this one contains words.
We could maybe add a dedicated Cantonese morphemizer at some point; this one looks decent: https://pycantonese.org/index.html
(Also, here is a relevant slide deck as a potential future reference for myself: https://www4.comp.polyu.edu.hk/~jing1li/talks/aacl2022-can-pretrain/Slides.pdf)
Let me know if you have any more bugs or suggestions!
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Cantonese_frequency_file.csv Cantonese-ff-for-anikmorphs.xlsx
Anki 24.04.1 (ccd9ca1a) (ao) Python 3.9.18 Qt 6.6.2 PyQt 6.6.1 Platform: macOS-14.5-arm64-arm-64bit
Traceback (most recent call last):
  File "aqt.taskman", line 142, in _on_closures_pending
  File "aqt.taskman", line 86, in
  File "aqt.taskman", line 106, in wrapped_done
  File "aqt.operations", line 252, in wrapped_done
  File "/Users/maikbraun/Library/Application Support/Anki2/addons21/472573498/recalc.py", line 845, in _on_failure
    raise error
  File "concurrent.futures.thread", line 58, in run
  File "aqt.operations", line 242, in wrappedop
  File "/Users/maikbraun/Library/Application Support/Anki2/addons21/472573498/recalc.py", line 96, in
    op=lambda : _recalc_background_op(
  File "/Users/maikbraun/Library/Application Support/Anki2/addons21/472573498/recalc.py", line 169, in _recalc_background_op
    _update_cards_and_notes(am_config, modify_enabled_config_filters)
  File "/Users/maikbraun/Library/Application Support/Anki2/addons21/472573498/recalc.py", line 406, in _update_cards_and_notes
    morph_priority: dict[str, int] = _get_morph_priority(am_db, config_filter)
  File "/Users/maikbraun/Library/Application Support/Anki2/addons21/472573498/recalc.py", line 625, in _get_morph_priority
    morph_priority = _get_morph_frequency_file_priority(
  File "/Users/maikbraun/Library/Application Support/Anki2/addons21/472573498/recalc.py", line 649, in _get_morph_frequency_file_priority
    key = row[0] + row[1]
IndexError: list index out of range
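The IndexError above is consistent with the delimiter mismatch: if rows are split on commas, a semicolon-separated line comes back as a single field, so `row[1]` does not exist. The sketch below reproduces that, assuming the file is read with a default comma-delimited `csv.reader`. That is a guess about how the add-on reads the file, and `build_keys` is a hypothetical stand-in; only the `key = row[0] + row[1]` expression is taken from the traceback.

```python
import csv
import io


def build_keys(text: str) -> list[str]:
    """Build one key per row from the first two columns.

    Hypothetical stand-in for the add-on's loop; only the
    row[0] + row[1] expression comes from the traceback.
    """
    keys = []
    for row in csv.reader(io.StringIO(text)):  # default delimiter is ","
        keys.append(row[0] + row[1])
    return keys


# Comma-separated row: four fields, so row[1] exists
print(build_keys("係,係,60613,1\n"))  # ['係係']

# Semicolon-separated row: parsed as one field, so row[1] is missing
try:
    build_keys("係;係;60613;1\n")
except IndexError as err:
    print(f"IndexError: {err}")  # IndexError: list index out of range
```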
===Add-ons (active)===
(add-on provided name [Add-on folder, installed at, version, is config changed])
Add tagmove card in review ['647335063', 2020-06-12T12:32, 'None', '']
Adjust Sound Volume ['2123044452', 2023-12-25T15:00, 'None', mod]
Advanced Browser ['874215009', 2023-10-21T16:34, 'None', '']
Advanced Review Bottom Bar ['1136455830', 2024-02-04T19:38, 'None', mod]
AnkiConnect ['2055492159', 2024-02-27T05:37, 'None', '']
AnkiMorphs ['472573498', 2024-05-18T19:57, 'None', mod]
AnkiWebView Inspector ['31746032', 2023-06-27T21:26, 'None', '']
Audio Playback Controls ['312734862', 2023-04-23T01:25, 'None', '']
BetterSearch ['1052724801', 2024-03-06T17:25, 'None', '']
CC-CEDICT for Anki Chinese dictionary ['418828045', 2022-03-23T13:22, 'None', mod]
Clickable Tags v20 ['1739176371', 2022-01-30T23:58, 'None', '']
Colorful Tags Hierarchical Tags ['594329229', 2022-09-15T17:06, 'None', '']
Deck name in title 21 ['699175524', 2019-06-01T03:05, 'None', '']
Edit Field During Review ['1020366288', 2022-08-29T21:38, 'None', '']
FSRS4Anki Helper ['759844606', 2024-05-18T10:51, 'None', mod]
Hanzi Stats ['181243826', 2023-12-03T23:34, 'None', mod]
Learning Step and Review Interval Retention ['1949865265', 2024-01-06T18:48, 'None', '']
Migaku Anki Add-on ['1846879528', 2024-02-22T23:31, 'None', mod]
Minimal Theme ['867316254', 2023-07-08T07:03, 'None', '']
Opening the same window multiple time ['354407385', 2023-11-05T02:59, 'None', '']
Quick tagging 21 ['304770511', 2020-02-11T07:17, 'None', '']
ReColor ['688199788', 2024-03-03T02:45, '3.0', mod]
Remove card history ['2089200096', 2023-10-19T05:15, 'None', '']
Review Heatmap ['1771074083', 2022-06-30T03:43, 'None', '']
Scale Images ['1312865748', 2023-11-02T07:45, 'None', '']
Study Time Stats ['1247171202', 2024-02-24T17:59, 'None', '']
Write on your screen ['567651868', 2024-02-04T08:24, 'None', '']
Yomichan Forvo Server ['580654285', 2023-08-30T22:53, 'None', mod]
Zoom24 - Keep zoom level after reboot Fixed by Shige ['1923741581', 2024-05-02T03:23, 'None', mod]
ankimorphs-chinese-jieba ['1857311956', 2024-03-25T15:52, 'None', '']
===IDs of active AnkiWeb add-ons=== 1020366288 1052724801 1136455830 1247171202 1312865748 1739176371 1771074083 181243826 1846879528 1857311956 1923741581 1949865265 2055492159 2089200096 2123044452 304770511 312734862 31746032 354407385 418828045 472573498 567651868 580654285 594329229 647335063 688199788 699175524 759844606 867316254 874215009
===Add-ons (inactive)===
(add-on provided name [Add-on folder, installed at, version, is config changed])
Hanzi Table ['768679940', 2024-02-18T01:17, 'None', '']
No Distractions Full Screen Fixed for Anki 23 by Shige ['1370336700', 2024-03-05T14:25, 'None', mod]
Describe the bug
Steps to reproduce the behavior
Expected behavior
Screenshots
My setup
Additional context