kiwidude68 / calibre_plugins

All kiwidude's plugins for calibre
GNU General Public License v3.0

Count Pages - Calculating Gunning Fog index is not working #15

Closed TheCakeIsNaOH closed 1 year ago

TheCakeIsNaOH commented 1 year ago

The Gunning Fog index calculation is not working in the Count Pages plugin.

Count Pages v1.12.1, Calibre v6.12.0, Windows 10 22H2

Log:

Count Page/Word Statistics
Initialized urlfixer
do_count_statistics - book_path=C:\Users\user\AppData\Local\Temp\calibre_57i06tjb\t4hedm1v_count_pages\976.epub, pages_algorithm=0, page_count_mode=Estimate, statistics_to_run=['PageCount', 'WordCount', 'GunningFog'], custom_chars_per_page=1500, icu_wordcount=True
do_count_statistics - job started for file book_path=C:\Users\user\AppData\Local\Temp\calibre_57i06tjb\t4hedm1v_count_pages\976.epub
-------------------------------
Logfile for book ID 976 (Book - Author)
    Method of counting _page_count_mode=Estimate _download_sources=[]
    results= {'PageCount': 4, 'WordCount': 944}
    Found 4 pages
    Found 944 words
976
Traceback (most recent call last):
  File "calibre_plugins.count_pages.jobs", line 211, in do_statistics_for_book
  File "calibre_plugins.count_pages.statistics", line 324, in get_text_analysis
  File "calibre_plugins.count_pages.nltk_lite.textanalyzer", line 22, in __init__
ModuleNotFoundError: No module named 'copy_reg\r'
Initialized urlfixer
do_statistics_for_book:  C:\Users\user\AppData\Local\Temp\calibre_57i06tjb\t4hedm1v_count_pages\976.epub 0 Estimate [] ['PageCount', 'WordCount', 'GunningFog'] 1500 True
    Estimated accurate page count
      Lines: 120  Divs: 5  Paras: 51
      Accurate count: 3  Fast count: 4
    Page count: 4
    Word count using icu_wordcount - trying to count_words
    Word count - used count_words: 944
    Word count: 944
qaphsiel commented 1 year ago

I'm getting the same error (no module named 'copy_reg'). After a bit of digging, it seems copy_reg is a Python 2 module that was renamed to copyreg in Python 3.


results= {'PageCount': 733, 'WordCount': 243505}
Found 733 pages
Found 243505 words
15454
Traceback (most recent call last):
  File "calibre_plugins.count_pages.jobs", line 211, in do_statistics_for_book
    iterator, text_analysis = get_text_analysis(iterator, book_path, nltk_pickle)
  File "calibre_plugins.count_pages.statistics", line 324, in get_text_analysis
    t = TextAnalyzer(nltk_pickle)
  File "calibre_plugins.count_pages.nltk_lite.textanalyzer", line 22, in __init__
    self.eng_tokenizer = pickle.loads(eng_tokenizer_pickle)
ModuleNotFoundError: No module named 'copy_reg\r'
do_statistics_for_book: /tmp/calibre_6.12.0_tmp_zs3p3yjk/zrmyv7w6_count_pages/15454.azw3 3 Estimate [] ['PageCount', 'WordCount', 'FleschReading', 'FleschGrade', 'GunningFog'] 1880 True
Page count: 733
Word count using icu_wordcount - trying to count_words
Word count - used count_words: 243505
Word count: 243505
nicosaurus commented 1 year ago

It will also not calculate the Flesch–Kincaid scale for me.

kiwidude68 commented 1 year ago

Hi, as per the MobileRead forum thread this is a known issue due to calibre moving to Python v3. It is on my list to look into but for personal reasons that won’t happen for the next 4-6 weeks. Will definitely be addressing it once my computer equipment is operational to be able to do some coding again…

jonathanking commented 1 year ago

Hi @kiwidude68, I believe this can be fixed by running dos2unix on the english.pickle file within the plug-in. I did that and reinstalled the program via running calibre-customize -b . from within the plugin directory. This worked well for me.
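Where dos2unix is not available, the same line-ending conversion can be sketched in Python. This is a minimal sketch, not code from the plugin: the function name `crlf_to_lf` is hypothetical, and it assumes english.pickle is a text-based (protocol 0) pickle, in which a \r\n pair is a damaged line ending rather than real data.

```python
from pathlib import Path


def crlf_to_lf(path):
    """Rewrite the file at `path` with CRLF line endings converted to LF.

    Returns the number of CRLF pairs that were replaced.
    Assumption: the file is a text-based (protocol 0) pickle, so a CRLF
    pair is never legitimate payload data. Do not use this on binary pickles.
    """
    target = Path(path)
    data = target.read_bytes()
    count = data.count(b"\r\n")
    if count:
        # Only touch the file when there is something to fix.
        target.write_bytes(data.replace(b"\r\n", b"\n"))
    return count
```

Calling `crlf_to_lf` on the plugin's english.pickle and then reinstalling with calibre-customize -b . should have the same effect as the dos2unix step described above. Running it a second time returns 0 and leaves the file untouched.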

qaphsiel commented 1 year ago

> Hi @kiwidude68, I believe this can be fixed by running dos2unix on the english.pickle file within the plug-in. I did that and reinstalled the program via running calibre-customize -b . from within the plugin directory. This worked well for me.

To say the least, I'm deeply skeptical that running dos2unix fixed the Python 2-to-3 incompatibilities. Could you post the debug log of the error you were getting before you ran dos2unix on the pickle file?

Edit: And the post fix log too. :-)

jonathanking commented 1 year ago

It’s not a Python 2-to-3 incompatibility, in my opinion. It’s a newline character issue, caused by a pickle file that was probably created on a Windows system. Parsing the pickle file (see the original error) shows an issue with a carriage return. If you just fix the newline characters in the pickle file via dos2unix, the plugin’s Python script can load the pickle file correctly.

You can see evidence of this issue with the \r in the original error. It’s not really an import error, it’s a parsing issue.
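The effect of the stray carriage return can be reproduced without the plugin. The sketch below uses a tiny hand-written protocol 0 pickle as a stand-in for english.pickle (an assumption; the real file is much larger) and a hypothetical helper `try_load`. In a text pickle the module name is read up to the newline, so with CRLF endings the \r stays attached to the name.

```python
import pickle

# Hand-written protocol 0 pickle: GLOBAL opcode ("c"), a module line, a name
# line, then STOP ("."). It stands in for the real english.pickle.
lf_pickle = b"ccopy_reg\n_reconstructor\n."

# The same bytes after an LF -> CRLF conversion, as a Windows tool might do.
crlf_pickle = lf_pickle.replace(b"\n", b"\r\n")


def try_load(data):
    """Return the unpickled object, or the ModuleNotFoundError that was raised."""
    try:
        return pickle.loads(data)
    except ModuleNotFoundError as exc:
        return exc


lf_result = try_load(lf_pickle)      # loads normally
crlf_result = try_load(crlf_pickle)  # fails on the module name 'copy_reg\r'

print(repr(lf_result))
print(repr(crlf_result))
```

The failing case reports the module name with the trailing \r, which matches the last line of the traceback in the original report.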

I don’t have the original error message since I’ve already applied the fix to my computer after observing the same error as the original poster. @qaphsiel are you experiencing the same issue on your machine? If what I did was not clear, please let me know and I can write up something more detailed.

Edit: Here is a copy of the error I was experiencing before (nearly identical to the original poster):

Count Page/Word Statistics
do_count_statistics - book_path=/var/folders/3c/zbkvj1k90jj8nj178h8lqvm00000gn/C/calibre_6.12.0_tmp_7xtesxkk/oz2tzf_o_count_pages/288.fb2, pages_algorithm=0, page_count_mode=Download, statistics_to_run=['PageCount', 'FleschReading', 'FleschGrade', 'GunningFog'], custom_chars_per_page=1500, icu_wordcount=True
do_count_statistics - job started for file book_path=/var/folders/3c/zbkvj1k90jj8nj178h8lqvm00000gn/C/calibre_6.12.0_tmp_7xtesxkk/oz2tzf_o_count_pages/288.fb2
-------------------------------
Logfile for book ID 288 (Ligeia - Edgar Allan Poe)
    Method of counting _page_count_mode=Download _download_sources=[('goodreads', '419520')]
    results= {'download_source': 'goodreads', 'PageCount': 32}
    Downloaded page count from Goodreads: 32
288
Traceback (most recent call last):
  File "calibre_plugins.count_pages.jobs", line 211, in do_statistics_for_book
    iterator, text_analysis = get_text_analysis(iterator, book_path, nltk_pickle)
  File "calibre_plugins.count_pages.statistics", line 324, in get_text_analysis
    t = TextAnalyzer(nltk_pickle)
  File "calibre_plugins.count_pages.nltk_lite.textanalyzer", line 22, in __init__
    self.eng_tokenizer = pickle.loads(eng_tokenizer_pickle)
ModuleNotFoundError: No module named 'copy_reg\r'
do_statistics_for_book:  /var/folders/3c/zbkvj1k90jj8nj178h8lqvm00000gn/C/calibre_6.12.0_tmp_7xtesxkk/oz2tzf_o_count_pages/288.fb2 0 Download [('goodreads', '419520')] ['PageCount', 'FleschReading', 'FleschGrade', 'GunningFog'] 1500 True
DownloadPagesWorker::run - source_id=419520, source_name=goodreads
DownloadPagesWorker::run - PAGE_DOWNLOADS[source_name]={'URL': 'http://www.goodreads.com/book/show/%s', 'pages_xpath': '//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', 'name': 'Goodreads', 'id': 'goodreads', 'icon': 'images/goodreads.png', 'active': True, 'pages_regex': '([0-9]+) pages'}
DownloadPagesWorker::run - self.pages_regex=([0-9]+) pages
Download source book url: 'http://www.goodreads.com/book/show/419520'
_parse_page_count: start
_parse_page_count: root.__class__= HtmlElement
_parse_page_count: pages_xpath='//div[@class="FeaturedDetails"]/p[@data-testid="pagesFormat"]/text()', =pages_regex='([0-9]+) pages'
_parse_page_count: pages= ['32 pages, Paperback']
_parse_page_count: pages[0]= 32 pages, Paperback
_parse_page_count: pages_regex= ([0-9]+) pages
_parse_page_count: pages_text= 32
_parse_page_count: have pages_regex='([0-9]+) pages'
_parse_page_count: result from regex='32'
_parse_page_count: end
qaphsiel commented 1 year ago

Ahh! I see what you mean now, @jonathanking. I saw-but-did-not-see that \r after the module name and just leapt ahead to a Python migration issue.

Anyhow, I can confirm that dos2unixing english.pickle does indeed resolve the problem.

Though, I am surprised Calibre's not complaining about copy_reg, as it does not exist in Python 3.
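A likely explanation for why copy_reg itself causes no complaint: `pickle.loads` takes a `fix_imports` argument, true by default, that maps Python 2 module names to their Python 3 equivalents when reading old-protocol pickles. The name with a trailing \r is not in that table, so it is imported as-is and fails. The sketch below is illustrative only: it uses a hand-written stand-in pickle rather than the plugin's file, and it peeks at `_compat_pickle`, a private standard-library module.

```python
import pickle
import _compat_pickle  # private stdlib module holding the rename tables

# Hand-written protocol 0 pickle that refers to the Python 2 module name.
py2_pickle = b"ccopy_reg\n_reconstructor\n."

# The Python 2 name is present in the rename table...
mapped_name = _compat_pickle.IMPORT_MAPPING["copy_reg"]
# ...but the name with a trailing carriage return is not.
cr_name_known = "copy_reg\r" in _compat_pickle.IMPORT_MAPPING

# With the default fix_imports=True the old name is translated and loads.
loaded = pickle.loads(py2_pickle)

# With the translation switched off, the Python 2 name is imported as-is.
try:
    pickle.loads(py2_pickle, fix_imports=False)
    untranslated_error = None
except ModuleNotFoundError as exc:
    untranslated_error = exc

print(mapped_name, cr_name_known, loaded.__module__, untranslated_error)
```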

blockloop commented 1 year ago

> Hi @kiwidude68, I believe this can be fixed by running dos2unix on the english.pickle file within the plug-in. I did that and reinstalled the program via running calibre-customize -b . from within the plugin directory. This worked well for me.

Worked for me.

kiwidude68 commented 1 year ago

I haven't forgotten about looking into this, just not been able to work on plugins for a while and that will extend a few more weeks yet. I will be taking a look at all the open issues/PRs once that happens.

shayaknyc commented 1 year ago

Just adding my 2c here so I can be notified if/when this gets resolved as I recently noticed that it has stopped calculating these values as well. Thanks!

DiscantX commented 1 year ago

Just confirming that the dos2unix fix does in fact work. I just tried it myself, and all readability statistics are now calculating properly.

kiwidude68 commented 1 year ago

Should be fixed in 1.13.0