Unicode gem couldn't be loaded on ruby 2.0 + some questions

minad commented 11 years ago

Hi,

I am working on a simple tool to synchronize BibTeX files with a directory of papers (similar to Mendeley): https://github.com/minad/bibsync. I am trying to migrate from my very simple hand-rolled BibTeX parser to your gem. However, after installation an issue with the unicode gem occurred.

require 'bibtex'
RuntimeError: Failed to load unicode normalizer: please gem install unicode (or active_support)

require 'unicode'
LoadError: cannot load such file -- unicode/unicode_native

However the installation of the unicode gem went fine. Then I have a few questions about your gem:

Do you convert every string to symbol internally, or is this only syntactic sugar for the query?
If a file is opened and saved again, what changes? I will try that myself if I get it running. It is just that I want to have bibsync playing together nicely with JabRef + Git for revision control. Therefore I need the tools to keep the entry order etc. I wrote my small bibtex lib in such a way that it retains the order.
Is it necessary to require all the dependencies (unicode, multi_json, latex-decode, ...)? Couldn't these be development dependencies only, loaded just when really needed, for example for a filter?

minad commented 11 years ago

If you would prefer me to ask such things by email or on a mailing list, please tell me...

inukshuk commented 11 years ago

We don't have a mailing list for bibtex ruby so this is fine. Give me some time to get back to you – in the meantime, however, I've heard of the unicode problem before. Strange, too, because the tests pass on travis-ci with Ruby 2.0. I'll take a look at it (by the way, this is caused by the latex-decode gem because all the unicode related stuff is there).

minad commented 11 years ago

Thanks for your quick answer!

Yeah, I have seen that travis passes and I wondered too. I admit I didn't investigate, I just wanted to drop you a note :)

In the meantime for me it would suffice to decouple all the things a bit as described before.

inukshuk commented 11 years ago

Do we convert every string to symbol internally?

I'm not sure what you mean there – are you referring to @string objects or to string-values in BibTeX entries?

minad commented 11 years ago

I have heard that symbols are not garbage collected on some VMs. Therefore I follow the pattern that I only convert strings to symbols which belong to a limited set, e.g. the entry types in this case. But the entry keys for example are not limited.
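A minimal sketch of that pattern, with hypothetical names (KNOWN_TYPES and intern_type are not part of any library): only names from a fixed list become symbols, and everything else stays a string. For context, Ruby releases before 2.2 never garbage collected dynamically created symbols.

```ruby
# Hypothetical whitelist of entry types; the list is illustrative only.
KNOWN_TYPES = %w[article book inproceedings misc].freeze

# Returns a symbol for names in the fixed list and the original string
# otherwise, so unbounded input (such as entry keys) never becomes a symbol.
def intern_type(name)
  key = name.to_s.downcase
  KNOWN_TYPES.include?(key) ? key.to_sym : name
end

intern_type('Article')   # => :article
intern_type('smith2013') # => "smith2013"
```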

inukshuk commented 11 years ago

What changes when you open and save a file? Basically, the file is parsed, all filters etc. are applied and it is then written back to the file.

Having said that, there has been some effort put into the library to support round-trips. BibTeX has @comment objects but also ignores everything between objects; therefore it's very easy to add comments inside a BibTeX file. To support round-trips bibtex-ruby comes with a special object called 'meta_comment': this is basically any text between BibTeX objects. So if round-trips are a concern you need to make sure to enable meta_comments when parsing.

If I remember correctly the order of objects will always be retained (but the cross-reference resolver may be influenced by order – need to look that up though).

There are some advanced parser options which allow you to specify how you want whitespace to be handled. So, again, for round-trips you might want to turn the feature off that strips values.

Of course you should not enable any filters (like the latex conversion).

Finally, I think the one thing that does change is that when exporting to BibTeX indentation is always two spaces and curlies are used for quoting entry values (so if the original file used quotation marks this would change).
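One way to see exactly what a round trip changes is to compare the file line by line before and after. The sketch below is a rough illustration, not library code: round_trip_changes is a hypothetical helper, and the block passed to it is a stand-in serializer that only swaps quoted values for braced ones.

```ruby
# Returns [line_number, before, after] triples for every line that the
# round trip changed. The block stands in for the real parse-and-serialize step.
def round_trip_changes(source)
  result = yield(source)
  before = source.lines.map(&:chomp)
  after  = result.lines.map(&:chomp)
  length = [before.size, after.size].max
  (0...length).each_with_object([]) do |i, changes|
    changes << [i + 1, before[i], after[i]] unless before[i] == after[i]
  end
end

original = <<~BIB
  @book{knuth1984,
    title = "The TeXbook",
    year = {1984}
  }
BIB

# Stand-in serializer: rewrites "..." values as {...} and touches nothing else.
changes = round_trip_changes(original) do |text|
  text.gsub(/= "([^"]*)"/) { "= {#{$1}}" }
end

changes.each { |number, was, now| puts "#{number}: #{was.inspect} -> #{now.inspect}" }
```

With bibtex-ruby itself, the block would presumably parse the text with meta content included and call to_s on the result; the exact option names should be checked against the installed version.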

inukshuk commented 11 years ago

The symbol vs string issue has been raised before, yes. If I remember correctly, we are currently storing entry keys (the identifier) as strings; types and field keys (e.g., :author, :title) as symbols. If you use array access notation to query the bibliography you can pass the entry key as a symbol – this is syntactic sugar and also to make the difference between queries and direct access explicit.

If this is a concern for you, please do investigate further : )

inukshuk commented 11 years ago

Regarding the dependencies: we are currently requiring latex-decode because the filter was part of bibtex-ruby in the past and most users expected it to be there.

We're using multi_json for JSON export; I think we could make that optional – requiring it only if you actually use #to_json with a suitable error message in case it's not available. I'd certainly accept a pull request if this makes it easier for you.
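A rough sketch of what such an optional dependency could look like. The helper require_optional and the Exporter class are hypothetical and only illustrate loading the library on first use; this is not the gem's actual code.

```ruby
# Requires an optional library on demand and raises a descriptive error
# if it is not installed. Helper name and message are illustrative.
def require_optional(library, feature)
  require library
  true
rescue LoadError
  raise LoadError, "#{feature} needs the '#{library}' gem: please gem install #{library}"
end

# Hypothetical exporter: the JSON backend is loaded only when #to_json is called.
class Exporter
  def initialize(data)
    @data = data
  end

  def to_json
    require_optional('multi_json', '#to_json')
    MultiJson.dump(@data)
  end
end
```

Users who never call to_json would then not need multi_json installed at all.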

As for the unicode gem, I was under the impression that we're not depending on it strictly: there are a number of different combinations, depending on Ruby version and platform, which can be used – need to consult the latex-decode gem though.
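To illustrate the idea of choosing a normalizer by what is available, here is a minimal sketch. The order of the fallbacks and the method name nfc_normalizer are assumptions for illustration and are not taken from latex-decode.

```ruby
# Picks the first available NFC normalizer and returns it as a lambda.
# The fallback order is an assumption, not latex-decode's actual logic.
def nfc_normalizer
  if String.method_defined?(:unicode_normalize)
    # String#unicode_normalize ships with Ruby 2.2 and later.
    ->(string) { string.unicode_normalize(:nfc) }
  else
    begin
      require 'unicode'
      ->(string) { Unicode.normalize_C(string) }
    rescue LoadError
      raise 'Failed to load unicode normalizer: please gem install unicode'
    end
  end
end

normalize = nfc_normalizer
# "e" followed by a combining acute accent is composed into a single character.
composed = normalize.call("e\u0301")
puts composed.length
```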

minad commented 11 years ago

Thanks for your answers. I will investigate this further and maybe come back to you with some pull requests.

inukshuk commented 11 years ago

Oops, I lied. I just noticed it's called meta_content not meta_comment – anyway, you will want to make sure those are included if you're intending to keep the files synchronized.

minad commented 11 years ago

Ok, perfect!

cruessler commented 11 years ago

I ran into the same problem when trying to run the tests under Ruby 2.0 ("Failed to load unicode normalizer: please gem install unicode (or active_support) (RuntimeError)"). Updating RubyGems and reinstalling the unicode gem solved the problem for me. It seems to be caused by a RubyGems bug and can be solved by upgrading to 2.0.2 or higher (according to https://github.com/blackwinter/unicode/issues/5).
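A quick way to check whether an installation is affected, assuming that the 2.0.2 mentioned above refers to the RubyGems version. The method name rubygems_fixed? is made up for this sketch.

```ruby
# Returns true when the given RubyGems version is 2.0.2 or newer.
# Defaults to the version of the running RubyGems.
def rubygems_fixed?(version = Gem::VERSION)
  Gem::Version.new(version) >= Gem::Version.new('2.0.2')
end

status = rubygems_fixed? ? 'ok' : 'upgrade recommended'
puts "RubyGems #{Gem::VERSION}: #{status}"
```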

inukshuk / bibtex-ruby

Unicode gem couldn't be loaded on ruby 2.0 + some questions #63