arnavk closed this pull request 10 years ago.
I haven't updated all the resources yet either. I'll do it once I've efficiently separated the data.
Hey @radzinzki, thanks so much for working so hard and submitting this pull request. I'll take a closer look at it later this week. Let me address your questions as well:
Hi @cameron, thanks for the link. I'll take a look and see if this saves us any space. I'm exploring other options as well. Let's see how this goes!
Also, the implementation of Range is fairly minimal, covering only what the feature requires!
This looks quite good, @radzinzki! Here are a couple of general things I'd like you to fix across all files:
Any luck compressing the codepoint data?
I'm also having trouble running an example. Here's my code:
var fs = require('fs');
eval(fs.readFileSync('./segmentation_data.js'));
var tc = require('./en.js')
var brk = new tc.BreakIterator('en');
brk.each_sentence("Mrs. foo went to. The store.");
I get a message that says "Couldn't find property sentence_break containing property value CR", which indicates to me that the codepoint data hasn't been loaded. Any ideas?
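One detail in the snippet above may explain that error on its own: without an encoding argument, fs.readFileSync returns a Buffer, and eval returns any non-string argument unchanged instead of running it, so the codepoint data would never execute. A minimal, self-contained demonstration (the `loaded` flag is purely illustrative):

```javascript
// Without an encoding, fs.readFileSync(path) returns a Buffer; with 'utf8' it returns a string.
// eval() only executes strings; any other value is returned as-is without running.
const source = 'globalThis.loaded = true;';
const asBuffer = Buffer.from(source);        // what readFileSync(path) would return
const asString = asBuffer.toString('utf8');  // what readFileSync(path, 'utf8') would return

const bufferResult = eval(asBuffer);         // not executed; the same Buffer comes back
const ranFromBuffer = globalThis.loaded === true;

eval(asString);                              // executed as JavaScript
const ranFromString = globalThis.loaded === true;

console.log(bufferResult === asBuffer, ranFromBuffer, ranFromString);
// prints: true false true
```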
Let me try this and find out. It works in browsers. I'll try it via Node as well. (I suppose that's how you were trying it, yes?)
Regarding the tab size and trailing whitespace: sorry about that. I'll fix it all.
Yes, I was trying to run my example in node. What's the best way to load the codepoint data?
@camertron can you try this in Node?
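var fs = require('fs');
// Load the main implementation bundle first...
var TwitterCldr = require('./en.js');
// ...then evaluate the codepoint data as a UTF-8 string; it overwrites some variables in TwitterCldr.
eval(fs.readFileSync('./segmentation_data.js', 'utf8'));
var brk = new TwitterCldr.BreakIterator('en');
brk.each_sentence("Mrs. foo went to. The store.");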
It should give the expected result:
[ 'Mrs. foo went to.', ' The store.' ]
(I'd been testing it directly in a browser so far.)
Sweet, it works! Thanks @radzinzki.
@camertron No problem. Now that the data is separated, the main implementation bundle (en.js) needs to be loaded first, because segmentation_data.js overwrites some variables in TwitterCldr.
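A toy model of that load-order dependency, using Node's built-in vm module. The two source strings are hypothetical stand-ins for en.js and segmentation_data.js, assuming the data file assigns into a TwitterCldr object that the bundle creates; they are not the real files.

```javascript
const vm = require('vm');

// Hypothetical stand-ins: the bundle creates the namespace object,
// and the data script overwrites a property on it.
const mainBundle = 'var TwitterCldr = { segmentationData: null };';
const dataScript = "TwitterCldr.segmentationData = { sentence_break: ['CR', 'LF'] };";

// Runs the scripts in the given order inside a fresh context and reports the outcome.
function loadInOrder(scripts) {
  const context = vm.createContext({});
  try {
    for (const src of scripts) vm.runInContext(src, context);
    return context.TwitterCldr.segmentationData;
  } catch (err) {
    return err.name; // 'ReferenceError' when TwitterCldr does not exist yet
  }
}

console.log(loadInOrder([dataScript, mainBundle])); // logs: ReferenceError
console.log(loadInOrder([mainBundle, dataScript])); // logs the object with sentence_break: ['CR', 'LF']
```

In this model the data script simply fails when the namespace object does not exist yet; loading the bundle first gives it something to write into.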
Hey guys. Just stumbled on this discussion. A few months ago I wrote an implementation of the trie data structure ICU uses for looking up codepoint data; it's here. It includes both the builder code and the runtime code. I used it in my grapheme-breaker module if you want to see an example of it in use. Just thought I'd mention it since it might save you some effort. Cheers!
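For readers unfamiliar with the idea, here is a deliberately simplified two-stage lookup table in the same spirit: the codepoint space is cut into fixed-size blocks, identical blocks are stored only once, and a lookup is one index read plus one data read. This is not ICU's actual trie layout or the API of the module mentioned above; the block size, function names, and sample property values are hypothetical.

```javascript
// Simplified two-stage table; SHIFT and all names are illustrative assumptions.
const SHIFT = 5;                 // blocks of 2^5 = 32 codepoints
const BLOCK_SIZE = 1 << SHIFT;
const MASK = BLOCK_SIZE - 1;

// Builds the table for codepoints 0..maxCodepoint, storing each distinct block once.
function buildTable(maxCodepoint, valueOf) {
  const index = [];          // block number -> offset of that block in `data`
  const data = [];           // concatenation of the distinct blocks
  const offsets = new Map(); // serialized block contents -> offset in `data`
  for (let start = 0; start <= maxCodepoint; start += BLOCK_SIZE) {
    const block = [];
    for (let cp = start; cp < start + BLOCK_SIZE; cp++) block.push(valueOf(cp));
    const key = block.join('\u0000');
    if (!offsets.has(key)) {
      offsets.set(key, data.length);
      data.push(...block);
    }
    index.push(offsets.get(key));
  }
  return { index, data };
}

// Looks up a codepoint's value: one read from `index`, one read from `data`.
function lookup(table, cp) {
  const block = cp >> SHIFT;
  if (cp < 0 || block >= table.index.length) return undefined;
  return table.data[table.index[block] + (cp & MASK)];
}

// Hypothetical sample property, loosely modeled on sentence_break values.
function sampleValue(cp) {
  if (cp === 0x0d) return 'CR';
  if (cp === 0x0a) return 'LF';
  if (cp >= 0x41 && cp <= 0x5a) return 'Upper';
  return 'Other';
}

const table = buildTable(0xffff, sampleValue);
console.log(lookup(table, 0x0d), lookup(table, 0x41), lookup(table, 0x4e00)); // CR Upper Other
console.log(table.index.length, table.data.length); // 2048 96
```

Here 2,048 blocks collapse to three distinct ones (96 stored values instead of 65,536), which is where the space savings come from. Production tries such as ICU's add refinements, like compact integer arrays and extra index levels, that this sketch leaves out.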
@devongovett Thanks, buddy! I'll take a look at it and see whether, and how much of it, we can use. :)
I see you've made some great progress here. Well done, guys! I'm back from vacation and will try to review this PR in the next few days. If it already looks good to you, @camertron, feel free to merge it earlier.
@KL-7 @camertron I'm still working on the docs. I should have them updated in a commit, probably tomorrow or the day after. Can we hold off on the merge until then?
Yes, let's wait until the docs are ready. I'd also like @KL-7 to have a chance to look at the PR before it gets merged.
After looking through most of the changes, all I can say is that you've done an enormous amount of work on this one, Arnav. Thank you for that. I'm not very familiar with the Ruby implementation of this feature, so it's a bit hard to grasp all of it at once. For now I've left only a few minor style comments. For deeper feedback I completely trust Cameron: he wrote the original, so it should be easier for him to tell whether the CoffeeScript implementation looks good overall.
Forgot to post an update earlier, but I've added the docs.
@camertron, I've removed the code that isn't being used, so I think it's good to go.