matthew-brett / bibstuff

Fork of http://code.google.com/p/bibstuff/ with edits and sphinx extension

sphinxext: global issue with UTF-8 support for files actually having non-ascii characters? #1

Open yarikoptic opened 13 years ago

yarikoptic commented 13 years ago

I originally thought this was a failure to support 'latex'-encoded (as opposed to UTF-8) .bib files, resulting in this crash:

  File "/usr/lib/pymodules/python2.6/simpleparse/dispatchprocessor.py", line 120, in lines
    return countlines (buffer[start or 0:end or len(buffer)])
  File "/usr/lib/pymodules/python2.6/simpleparse/stt/TextTools/TextTools.py", line 467, in countlines
    return len(tag(text, linecount_table)[1])
TypeError: Low-level command (41) argument in entry 2 couldn't be converted to a string object, is a unicode

Neither setting

:encoding: iso-8859-1

for biblisted nor

% Encoding: latex

in the header of the .bib file helped. Actually converting the .bib file to UTF-8 (using kbibtex) and removing the encoding settings above led to the same failure :-/

Only matthew_brett.bib, which contains no UTF-8 characters at all, succeeded. Replacing the proper ASCII e in the name of the respected author of the first entry with an insulting Unicode Russian е unfortunately did not trigger the above crash, but it did at least mangle the author's name into "Matthew Br." when using jasss_style.
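For anyone trying to reproduce this, look-alike characters such as the Cyrillic е are hard to spot by eye. The following is a minimal sketch, not part of bibstuff, that lists every non-ASCII character in a piece of .bib text with its position and Unicode name. The function name `find_non_ascii` is made up for illustration, and the sketch targets Python 3 rather than the Python 2.6 shown in the traceback.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, char, unicode_name) for each non-ASCII character.

    Line and column numbers are 1-based. This is a hypothetical diagnostic
    helper, not something bibstuff or simpleparse provides.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((lineno, col, ch, name))
    return hits


if __name__ == "__main__":
    # The "e" in "Brett" below is U+0435, the Cyrillic look-alike.
    sample = "author = {Matthew Br\u0435tt}"
    for lineno, col, ch, name in find_non_ascii(sample):
        print("line %d, col %d: %r (%s)" % (lineno, col, ch, name))
```

Running it over a file read with the right encoding (for example `open(path, encoding="utf-8").read()`) should show which entries contain characters the parser may choke on.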

matthew-brett commented 13 years ago

Yes, sadly, simpleparse does not, and probably never will, support unicode. I've since written two unicode-supporting bibtex parsers, and I've been talking to Andrey Golovizin, the author of pybtex, who has come a long way towards bibtex compatibility using pure python. So the fix here would probably be to dump bibtools and do a rewrite. Is it something you have urgent need of?

mih commented 13 years ago

Hey Matthew,

excellent work! I just started playing with it. I noticed that the same thing:

Exception occurred:
  File "/usr/lib/pymodules/python2.6/simpleparse/stt/TextTools/TextTools.py", line 467, in countlines
    return len(tag(text, linecount_table)[1])
TypeError: Low-level command (41) argument in entry 2 couldn't be converted to a string object, is a unicode

happens when there is a comment in the BIB file. In my case this one:

@Comment{x-kbibtex-encoding=utf-8}

After its removal I get perfect results.

Best,

Michael

yarikoptic commented 13 years ago

Nice find ;-) It seems that any kind of @comment ruins it.
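Until the parser is replaced, one possible workaround is to strip @comment entries from the .bib text before handing it to the parser. The sketch below is an assumption about how that could be done, not anything bibstuff ships: `strip_bib_comments` is a hypothetical helper, it targets Python 3, and it only handles the `@comment{...}` and `@comment(...)` entry forms by counting nested delimiters.

```python
import re

# Matches the start of a @comment entry, case-insensitively,
# up to and including its opening brace or parenthesis.
_COMMENT_START = re.compile(r"@comment\s*[{(]", re.IGNORECASE)


def strip_bib_comments(text):
    """Return text with every @comment{...} / @comment(...) entry removed.

    Hypothetical pre-processing helper; an unterminated comment entry
    swallows everything up to the end of the text.
    """
    kept = []
    pos = 0
    while True:
        match = _COMMENT_START.search(text, pos)
        if match is None:
            kept.append(text[pos:])
            break
        kept.append(text[pos:match.start()])
        opener = text[match.end() - 1]
        closer = "}" if opener == "{" else ")"
        depth = 1
        i = match.end()
        # Walk forward until the opening delimiter is balanced.
        while i < len(text) and depth:
            if text[i] == opener:
                depth += 1
            elif text[i] == closer:
                depth -= 1
            i += 1
        pos = i
    return "".join(kept)


if __name__ == "__main__":
    bib = "@Comment{x-kbibtex-encoding=utf-8}\n@article{a, title={T}}"
    print(strip_bib_comments(bib))
```

The cleaned string can then be written to a temporary file or passed on to whatever reads the .bib file. Whether this avoids the TypeError in every case has not been checked here.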

matthew-brett commented 13 years ago

Guys,

I'm afraid bibtools has a very fast parser that is fragile and essentially impractical to fix. I've written slower parsers that are much more like bibtex in their behavior, but dropping a new parser in would take a few days of work. You're voting for the few days I guess?

yarikoptic commented 13 years ago

What about using http://pybtex.sourceforge.net/ for all parsing? It supports UTF-8 and a few other exotic reference formats (YAML, BibTeXML). It lacks any formatted output for ReST at the moment, but it seems to be quite a nice and fairly active project.

yarikoptic commented 13 years ago

Bloody buttons -- how do I reopen it? I clicked 'Comment & Close' by mistake ;)

matthew-brett commented 13 years ago

I think 'Actions - Open' opens it again. I've been talking to the pybtex guy - Andrey Golovizin - as mentioned above. He tried one of my new parsers and then, in short order, wrote his own, which is indeed reasonably fast and good at running through errors. The problem is that pybtex has two modes. One is 'bibtex mode' - for that Andrey uses the bibtex .bst files and a parser for the bst language. That mode only outputs latex, because that's what the bst files output. Then there's python mode. Python mode outputs html and latex, but only has a single 'unsrt' style, which is still incomplete - for example, as I remember, it doesn't deal with conference papers yet - and it is more fragile (it requires entries in the citation that bibtex will allow to be empty). So it would take some (useful) work to get a fairly useful rst output from pybtex.

yarikoptic commented 9 years ago

Hi @matthew-brett ,

I was wondering whether you've had a chance to dig into this one again? I thought I'd make use of the bibstuff sphinx extension again but had forgotten about this little show stopper. Cheers!

matthew-brett commented 9 years ago

Sorry - no - I haven't - it seemed hopeless.

Have you tried sphinxcontrib-bibtex? I was thinking of switching to that (but it may still lack the functionality to output a given list of references, specified in the bibliography).

https://github.com/mcmtroffaes/sphinxcontrib-bibtex

matthew-brett commented 9 years ago

Issue here: https://github.com/mcmtroffaes/sphinxcontrib-bibtex/issues/54
