Open cbandy opened 10 years ago
wow, that is a very large gem! is there any way we can reduce this? 6mb was already too much.
I found that some of the CLD2 source files are not necessary to build the libraries. The gem is now 17 MiB and installed uses 46 MiB. If we commit to just one of libcld2.so or libcld2_full.so, we can reduce this further.
The unavoidable fact is that the source contains large tables of pre-computed n-grams. cld2_generated_quad0122.cc is required to build libcld2_full.so and is 27 MiB. Gems are already compressed, so minimizing the number of these source files in the shipped gem is the only way to save bits.
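To see which bundled sources dominate the gem size, something like the following could rank the files in a source tree by size. This is only a sketch: the helper name largest_files is made up for illustration and is not part of the gem or of CLD2.

```ruby
require "find"

# Hypothetical helper: return [path, size_in_bytes] pairs for the
# largest regular files under +root+, biggest first.
def largest_files(root, limit = 10)
  sizes = []
  Find.find(root) do |path|
    sizes << [path, File.size(path)] if File.file?(path)
  end
  sizes.sort_by { |_, bytes| -bytes }.first(limit)
end

# Print a report when a directory is given on the command line.
root = ARGV[0]
if root && File.directory?(root)
  largest_files(root).each do |path, bytes|
    puts format("%8.1f MiB  %s", bytes / 1024.0 / 1024.0, path)
  end
end
```

Run against the vendored CLD2 directory, this would show whether a few generated table files account for most of the bytes.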
If CLD2 were to release an archive/tarball, we could ship zero source files and download it before compiling the extension using something like mini_portile.
I looked into downloading bare files from the project repository, but we would need either svn or wget to do that.
Another option is to ship binary/pre-compiled gems. At first pass, it looks like the smaller gem would be less than 2 MiB and the larger would be less than 5 MiB.
I don't have any experience releasing a binary gem.
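For reference, a binary gem is mostly a gemspec with a concrete platform and the compiled library listed in its files instead of the sources. The sketch below assumes placeholder values throughout: the gem name, version, author, and file paths are invented for illustration and do not describe an actual release.

```ruby
require "rubygems"

# Every name, version, and path below is a placeholder.
spec = Gem::Specification.new do |s|
  s.name     = "cld"
  s.version  = "0.0.0"
  s.summary  = "Language detection with a precompiled CLD2 library"
  s.authors  = ["example"]
  # Tag the gem with the platform of the machine that builds it,
  # so RubyGems only offers it to matching installers.
  s.platform = Gem::Platform::CURRENT
  # Ship the compiled shared library; no s.extensions entry, so
  # nothing is compiled at install time.
  s.files    = ["lib/cld.rb", "lib/cld/libcld2.so"]
end

puts spec.full_name
```

One such gem would have to be built per target platform, alongside a plain source gem as the fallback for everyone else.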
Any chance there has been any progress or updates with this? I'd love to help out with this if possible.
I would also like to contribute. Let's solve this issue asap. This issue has been pending for more than a year just because of the size of CLD.
Here is a similar implementation in JavaScript. We can take cues from that: https://github.com/dachev/node-cld
@jtoy can we reconsider this? The gem did get larger, but so did the source library. I don't think there is a clean way to avoid this and still allow anyone to use the gem.
any update on this?
@craig-day can you merge and release this ?
I'll take a look hopefully tomorrow or Monday morning at the latest.
On Oct 10, 2015, at 8:38 PM, Michael Grosser notifications@github.com wrote:
@craig-day can you merge and release this ?
CLD2 project has moved to https://github.com/CLD2Owners/cld2/
@cbandy is this still ready to go? I'd like to merge and release a new major version.
It has been a long time since I looked at this.
If CLD2 were to release an archive/tarball...
I still don't see a tarball; at least not one provided by GitHub tags/releases.
I looked into downloading bare files from the project repository...
Maybe this is more reasonable now that it is hosted in Git? I forget how common it is for Gem installers to have git available.
Should we pull in any changes to CLD2 since May 2014, if any?
This appears to be the revision/commit that I imported in this PR: https://github.com/CLD2Owners/cld2/commit/d076f5eda223ac568639d6288f2e2d70d908f282
@cbandy can you update the readme link and pull in any new changes? I'm not sure if the tarball is a concern right now. I'd rather avoid a git dependency because not all places gems get installed have git (like production servers).
As far as licensing, I think you can copy the apache license from the CLD2 owners. It looks like our original license was just copied from them anyway.
@cbandy I don't think this project will be updated. I suggest you release your code as a new cld2 gem.
See #8.
Before this is merged, we should update our licensing. The library has changed to the Apache license.
The size of the bundled library has grown significantly. The source itself is over 90 MiB. The gem is now 35 MiB (up from 6 MiB) and installed it uses 93 MiB (up from 17 MiB). If CLD2 ever releases a tarball, we can stop bundling it here and shrink the installed size to 2 MiB.
There are two possible CLD2 libraries to link against: libcld2.so and libcld2_full.so. The latter can detect twice as many languages and is 4 MiB larger. I arbitrarily chose the former, smaller library in this PR. Which would you prefer to be used by default? In either case, we can also make this configurable during gem install.
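One way the install-time choice could work is a build flag passed after the double dash of gem install, which RubyGems forwards to extconf.rb. The sketch below is an assumption, not what this PR implements: the flag name --enable-cld2-full is hypothetical, and the arguments are parsed by hand so the example stands alone. In a real extconf.rb, mkmf's enable_config would normally do this parsing.

```ruby
# Hypothetical flag names; the gem does not necessarily accept these.
ENABLE_FULL  = "--enable-cld2-full"
DISABLE_FULL = "--disable-cld2-full"

# Pick the library to link against from the build arguments.
# Defaults to the smaller library, as chosen in this PR.
def cld2_library(args, default: "cld2")
  return "cld2_full" if args.include?(ENABLE_FULL)
  return "cld2" if args.include?(DISABLE_FULL)
  default
end

puts "linking against lib#{cld2_library(ARGV)}.so"
```

With such a flag, an install of the larger library might look like gem install cld -- --enable-cld2-full, while a plain gem install keeps the smaller default.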