Closed chris-ha458 closed 1 year ago
Each commit is a different way of expressing the same idea, but none of them makes a difference.
I think it is not correct. The situation where multibyte_a = 0 and multibyte_b != 0 is totally valid. We just shouldn't make the decision on multibyte usage when multibyte_a == multibyte_b (for example 0 and 0, 3 and 3, etc.); in that case mess should be used.
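A minimal Rust sketch of the rule described above, not the library's actual code. The function name `multibyte_decision` is hypothetical, and the direction of the ordering (higher multibyte usage ranks first) is an assumption made for illustration.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: decide by multibyte usage only when the two values differ.
/// Returns `None` when they are equal (0 and 0, 3 and 3, ...), which signals
/// that the caller should fall back to comparing mess instead.
fn multibyte_decision(multibyte_a: f32, multibyte_b: f32) -> Option<Ordering> {
    if (multibyte_a - multibyte_b).abs() <= f32::EPSILON {
        // No usable difference here, so let mess decide.
        return None;
    }
    // Assumption: the candidate with higher multibyte usage sorts first,
    // so `a` gets `Ordering::Less` when its usage is higher.
    multibyte_b.partial_cmp(&multibyte_a)
}
```

With this shape, the case multibyte_a = 0 and multibyte_b != 0 still produces a decision; only equal values defer to mess.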
I'm not sure I fully understand what you mean. Can you show me code, or maybe pseudocode (if/else), for what you mean?
If I understand correctly, on your system the final accuracy comes out at 97.1%?
Using
cargo run --release --bin performance --all-features |tail -n 50
my system shows a result of 96.8%:
--> A) charset-normalizer-rs Conclusions
--> Accuracy: 96.8%
--> Total time: 642.285389ms
--> Avg time: 1.570379ms
--> 50th: 662.846µs
--> 95th: 4.530332ms
--> 99th: 11.080102ms
Am I checking this right? Is your system showing higher than 97.0% with the same code?
You can always check accuracy and speed in the output of the performance action: https://github.com/nickspring/charset-normalizer-rs/actions/runs/6378885659/job/17310431300?pr=25 I see 97.1% there and I have 97.1% locally. What OS do you have?
Ah you are correct.
I am using WSL2, but I plan to set up Windows and Linux (via VirtualBox) workflows.
Interesting :) Maybe on this platform the encoding library offers fewer encodings...
If mess_difference < 0.01, we check whether coherence_difference > 0.02 and return the PartialOrd result based on that. If not, we try to use the multibyte usage difference, if it is big enough.
Comparing with multibyte_usage_a.abs() > f32::EPSILON is idiomatic and also covers the case where the value is 0.0 or very close to it.
However, it does not change the final accuracy at all.
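The comparison order described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the crate's implementation: the `Candidate` struct, its field names, and the `compare` function are hypothetical, and the epsilon check is applied to the difference between the two multibyte usage values, as suggested earlier in the thread. The thresholds 0.01 and 0.02 come from the discussion.

```rust
use std::cmp::Ordering;

// Hypothetical stand-in for a detection result; field names are assumptions.
struct Candidate {
    mess: f32,
    coherence: f32,
    multibyte_usage: f32,
}

// Sketch of the ordering: `Ordering::Less` means `a` ranks ahead of `b`.
fn compare(a: &Candidate, b: &Candidate) -> Option<Ordering> {
    let mess_difference = (a.mess - b.mess).abs();
    if mess_difference < 0.01 {
        // Mess is too close to call, so try coherence first.
        let coherence_difference = (a.coherence - b.coherence).abs();
        if coherence_difference > 0.02 {
            // Assumption: higher coherence ranks first.
            return b.coherence.partial_cmp(&a.coherence);
        }
        // Then try multibyte usage, but only if the two values really differ.
        let multibyte_difference = (a.multibyte_usage - b.multibyte_usage).abs();
        if multibyte_difference > f32::EPSILON {
            // Assumption: higher multibyte usage ranks first.
            return b.multibyte_usage.partial_cmp(&a.multibyte_usage);
        }
    }
    // Otherwise lower mess ranks first.
    a.mess.partial_cmp(&b.mess)
}
```

Checking the difference against `f32::EPSILON` avoids an exact float equality test while still treating pairs such as 0 and 0 or 3 and 3 as "no decision".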