wikimedia / database-reports

Generating specific reports from the English Wikipedia
GNU Lesser General Public License v2.1

Unicode in page title #4

Open framawiki opened 8 years ago

framawiki commented 8 years ago

Hi, I can't use non-ASCII characters such as "Pages oubliées" in the page title for frwiki. They work in the other fields.

Traceback (most recent call last):
  File "main.py", line 63, in <module>
    main(sys.argv)
  File "main.py", line 17, in main
    method()
  File "main.py", line 27, in forgotten_articles
    self.rep.forgotten_articles()
  File "/data/project/framabot/test/database-reports/reports.py", line 80, in forgotten_articles
    self.publish_report( 'forgotten-articles-page-title', text )
  File "/data/project/framabot/test/database-reports/reports.py", line 414, in publish_report
    page = self.site.Pages[ reports_base_url + report_title ]
  File "/usr/lib/python2.7/dist-packages/mwclient/listing.py", line 197, in __getitem__
    return self.get(name, None)
  File "/usr/lib/python2.7/dist-packages/mwclient/listing.py", line 209, in get
    namespace = self.guess_namespace(name)
  File "/usr/lib/python2.7/dist-packages/mwclient/listing.py", line 220, in guess_namespace
    if name.startswith(u'%s:' % self.site.namespaces[ns].replace(' ', '_')):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 50: ordinal not in range(128)

Thanks
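
For context, the traceback is consistent with Python 2 implicitly decoding a byte-string title as ASCII when `startswith` receives a unicode argument; byte 0xc3 is the first byte of "é" in UTF-8. The sketch below reproduces that decode step explicitly and shows one possible way around it: decode the title to text before handing it to the library. The helper `to_text` and the `Projet:` prefix are hypothetical names for illustration, not part of database-reports or mwclient, and this is not necessarily how the project resolved the issue.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def to_text(value, encoding="utf-8"):
    """Return value as a text (unicode) string, decoding bytes if needed.

    Hypothetical helper; not part of database-reports or mwclient.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# The title as UTF-8 bytes, as it could arrive from a config file.
raw_title = "Pages oubliées".encode("utf-8")

# Python 2 performs this ASCII decode implicitly when a byte string
# is compared with a unicode string; here it is spelled out.
try:
    raw_title.decode("ascii")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Decoding explicitly as UTF-8 first avoids the implicit ASCII decode.
title = to_text(raw_title)
namespace_prefix = "Projet:"  # assumed example prefix
starts_with_prefix = title.startswith(namespace_prefix)
```

With the title decoded up front, the `startswith` comparison operates on two text strings and no implicit conversion takes place.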

Niharika29 commented 8 years ago

Did your commit fix this issue or can you still reproduce it?

framawiki commented 8 years ago

No, it is not fixed; I tried without success. Non-ASCII characters are handled correctly in the page content; they fail only in the page title. I deliberately used bad titles in 07489bc2134faeb780c5850c7e42deeaad2344c4.

framawiki commented 8 years ago

Temporary fix in #12, using HTML character entities.
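
As a rough illustration of this kind of workaround (not necessarily the exact change made in #12), a title can be reduced to pure ASCII by replacing each non-ASCII character with a numeric HTML character reference. The function name `to_ascii_entities` is hypothetical, and the sketch assumes the wiki resolves such references back to the intended characters when it receives the title.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def to_ascii_entities(title):
    """Replace non-ASCII characters with numeric character references.

    Hypothetical helper; relies only on the standard "xmlcharrefreplace"
    error handler of str.encode.
    """
    return title.encode("ascii", "xmlcharrefreplace").decode("ascii")


escaped_title = to_ascii_entities("Pages oubliées")
```

Because the result contains only ASCII bytes, the implicit ASCII decode seen in the traceback can no longer fail on it.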