pr0gramista / charset_converter

Flutter platform charset converter
BSD 3-Clause "New" or "Revised" License

Detecting charset #6

Closed: amake closed this issue 4 years ago

amake commented 4 years ago

I have an application that can open user-supplied files of arbitrary encoding.

For this I need to not only decode bytes (as your plugin currently does) but also detect the encoding.

I could create a separate plugin that just detects, but then the flow would look like this:

  1. Read bytes in Dart land
  2. Pass bytes over platform channel to the detector to get the charset
  3. Again pass the bytes over platform channel to your plugin to decode, then send the string back

If the byte array is large, then (2) adds a lot of overhead, and it's quite wasteful to separate the detector from the decoder.
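To make the alternative concrete, here is a minimal sketch of detecting and decoding in a single call, so the bytes only have to cross the boundary once. It is written in Python purely for illustration and is not the plugin's API: `detect_charset` and `detect_and_decode` are hypothetical names, and the byte-order-mark check is a stand-in for a real statistical detector.

```python
import codecs

# Assumption: a BOM lookup stands in for a real charset detector.
# UTF-32 must be checked before UTF-16 because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_charset(data: bytes, default: str = "utf-8") -> str:
    """Return a codec name based on the BOM, or `default` if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return default


def detect_and_decode(data: bytes) -> tuple[str, str]:
    """Detect and decode in one call, so the caller hands over the bytes once."""
    charset = detect_charset(data)
    return charset, data.decode(charset)
```

The caller gets back both the detected charset and the decoded string, which is the shape a combined plugin method would need to avoid a second transfer.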

Would you be willing to accept a PR that adds detecting capability to your plugin?

I am looking at these libraries to do the actual detection:

So the cost to you and your plugin would be bloat for those who don't need detecting. I understand if you don't want that. In that case I will make a separate plugin that mostly duplicates your work, and the cost to the community is yet another set of very similar plugins that need to be evaluated. That is not necessarily a bad thing: if someone really doesn't need detecting, your plugin can remain the "slim" option.

pr0gramista commented 4 years ago

Hi, thanks for reaching out!

I see your problem, so I actually ran a benchmark that just sends bytes in and out of Android.

Pixel 3a:
I/flutter (16134): Benchmark with 1000 bytes
I/flutter (16134): 1599662349969754
I/flutter (16134): Took: 3788 microseconds
I/flutter (16134): Benchmark with 1000 bytes
I/flutter (16134): 1599662351552155
I/flutter (16134): Took: 1413 microseconds
I/flutter (16134): Benchmark with 1000 bytes
I/flutter (16134): 1599662352106260
I/flutter (16134): Took: 2338 microseconds
I/flutter (16134): Benchmark with 1000000 bytes
I/flutter (16134): 1599662354510188
I/flutter (16134): Took: 22854 microseconds
I/flutter (16134): Benchmark with 1000000 bytes
I/flutter (16134): 1599662355066439
I/flutter (16134): Took: 9249 microseconds
I/flutter (16134): Benchmark with 1000000 bytes
I/flutter (16134): 1599662355491612
I/flutter (16134): Took: 7008 microseconds
I/flutter (16134): Benchmark with 10000000 bytes
I/flutter (16134): 1599662357965880
I/flutter (16134): Took: 56659 microseconds
I/flutter (16134): Benchmark with 10000000 bytes
I/flutter (16134): 1599662358627834
I/m_channels_per(16134): Background young concurrent copying GC freed 538(48KB) AllocSpace objects, 0(0B) LOS objects, 0% free, 67MB/67MB, paused 7.078ms total 18.031ms
I/flutter (16134): Took: 71550 microseconds
I/flutter (16134): Benchmark with 10000000 bytes
I/flutter (16134): 1599662359173180
I/m_channels_per(16134): Background young concurrent copying GC freed 532(48KB) AllocSpace objects, 0(0B) LOS objects, 0% free, 67MB/67MB, paused 6.530ms total 18.086ms
I/flutter (16134): Took: 67226 microseconds
I/flutter (16134): Benchmark with 10000000 bytes
I/flutter (16134): 1599662360043655
I/m_channels_per(16134): Background young concurrent copying GC freed 496(48KB) AllocSpace objects, 0(0B) LOS objects, 0% free, 39MB/39MB, paused 10.017ms total 23.091ms
I/flutter (16134): Took: 62630 microseconds
I/flutter (16134): Benchmark with 10000000 bytes
I/flutter (16134): 1599662360662918
I/m_channels_per(16134): Background young concurrent copying GC freed 531(48KB) AllocSpace objects, 0(0B) LOS objects, 0% free, 39MB/39MB, paused 6.598ms total 21.381ms
I/flutter (16134): Took: 58696 microseconds
I/flutter (16134): Benchmark with 10000000 bytes
I/flutter (16134): 1599662361589423
I/flutter (16134): Took: 68720 microseconds

Around 57 to 72 ms for 10MB of data seems pretty OK. I don't think performance is an issue here, but then...
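For anyone who wants to turn the log above into throughput figures, here is a small Python sketch. The timings are copied from the "Took:" lines; the `summarize` helper is just an illustrative name, and using the median is my own choice to dampen the slow first run at each size.

```python
from statistics import median

# Payload size in bytes -> measured times in microseconds,
# copied from the "Took:" lines of the Pixel 3a log.
timings_us = {
    1_000: [3788, 1413, 2338],
    1_000_000: [22854, 9249, 7008],
    10_000_000: [56659, 71550, 67226, 62630, 58696, 68720],
}


def summarize(timings):
    """Map each payload size to (median time in us, throughput in MB/s)."""
    summary = {}
    for size, samples in timings.items():
        med = median(samples)
        # Bytes per microsecond is numerically equal to MB (10^6 bytes) per second.
        summary[size] = (med, size / med)
    return summary


if __name__ == "__main__":
    for size, (med, mb_per_s) in summarize(timings_us).items():
        print(f"{size:>10} bytes: median {med:.0f} us, {mb_per_s:.1f} MB/s")
```

On these numbers the median for 10MB is about 65 ms, or roughly 154 MB/s, while the 1000-byte case is dominated by fixed per-call cost rather than payload size.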

Correct me if I am wrong, but are those charset detection packages big? I measured only about a 1KB difference in APK size, which seems hard to believe.

If it is so small then I don't see a reason not to include this feature.

amake commented 4 years ago

Thanks for the benchmark. That's really interesting; I guess I might be worried about nothing.

To check the size gain I did:

You can see from diffing the zipinfo output that the Android dependency only adds about 60KB compressed or 140KB uncompressed.

2c2
< Zip file size: 16554086 bytes, number of entries: 359
---
> Zip file size: 16616348 bytes, number of entries: 359
32c32
< -rw----     2.4 fat   281820 b- defN 80-000-00 00:00 classes.dex
---
> -rw----     2.4 fat   427460 b- defN 80-000-00 00:00 classes.dex
362c362
< 359 files, 37768292 bytes uncompressed, 16485120 bytes compressed:  56.4%
---
> 359 files, 37913932 bytes uncompressed, 16547346 bytes compressed:  56.4%

I find this kind of amazing, because the raw JAR itself is 224KB.
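The arithmetic behind those figures can be checked with a few lines of Python. The byte counts are copied from the zipinfo diff above; the dictionary keys and the `size_delta` helper are just illustrative names.

```python
# Byte counts copied from the zipinfo output, without and with the dependency.
before = {
    "zip_file": 16554086,
    "uncompressed": 37768292,
    "compressed": 16485120,
    "classes_dex": 281820,
}
after = {
    "zip_file": 16616348,
    "uncompressed": 37913932,
    "compressed": 16547346,
    "classes_dex": 427460,
}


def size_delta(before, after):
    """Return the per-field growth in bytes."""
    return {key: after[key] - before[key] for key in before}


if __name__ == "__main__":
    for key, grown in size_delta(before, after).items():
        print(f"{key}: +{grown} bytes ({grown / 1024:.1f} KiB)")
```

The uncompressed growth (145640 bytes) matches the growth of classes.dex exactly, so the whole increase comes from the added classes, and the APK itself grows by 62262 bytes.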

A similar analysis with the iOS library gives a size of 504KB for the uncompressed framework inside Runner.app. I have a harder time figuring out what the final size would be, because a lot of compression and stripping happens later and I don't understand it all.

So it looks like size-wise it's not too bad.

I am kind of torn about whether to just make my own plugin, because for my own apps I still really prefer to avoid adding dependencies that I don't need, even if they are small.

amake commented 4 years ago

I ended up making a separate plugin that only focuses on auto-detection: https://pub.dev/packages/flutter_charset_detector

Thanks for your feedback!

pr0gramista commented 4 years ago

Cool! I added a link to it in README.