nayuki / QR-Code-generator

High-quality QR Code generator library in Java, TypeScript/JavaScript, Python, Rust, C++, C.
https://www.nayuki.io/page/qr-code-generator-library
5.2k stars 1.11k forks source link

Questions and sugestion about the rust port #204

Closed dominikwilkowski closed 3 weeks ago

dominikwilkowski commented 3 weeks ago

Hey folks. Great to have a QR code library in rust. Love it. A question thought: Why do you use i32 everywhere when I THINK it should only ever be unsigned (u32)? Like you wouldn't need things like assert!(border >= 0, "Border must be non-negative"); if using unsigned integer types. But I could be missing something. I'm happy to see if I can whip up a PR if that helps but figured I ask here first.

I need the crate for generating SVGs so after looking at the example with the to_svg_string and because I'm a bit of an SVG nerd I had to re-write it to something that produces SVGs that are about 40something % smaller in size. It also avoids the issue with the new lines being unix or include a carriage return for windows. Perhaps it's useful? If so I'm happy to create a PR.

Code and brief explanation here: https://gist.github.com/dominikwilkowski/9e95f7e2986e0129ef6fba94e94de525

nayuki commented 3 weeks ago

Why do you use i32 everywhere when I THINK it should only ever be unsigned (u32)?

https://www.nayuki.io/page/unsigned-int-considered-harmful-for-java

A wider signed type can always represent narrower unsigned types. For example, int32 can represent all uint31 values. But the reverse is not true; no unsigned type can represent all the values of a signed type (namely the negative ones are always left out). Thus if we have access to sufficiently wide signed types, we don’t need any unsigned types at all. (This is more or less the situation in Java.)

People have made the observation that programming primarily involves unsigned numbers (starting from 0), which is indeed true. However, one needs to make use of negative numbers occasionally (such as when subtracting two unsigned indexes), so signed types must exist in a programming language too. Unsigned and signed types are both useful, but signed types have strictly more functionality than unsigned types. Thus it is clear that if language simplicity is desired, then unsigned types are the ones that can be safely omitted.

At a C++ conference, a question about unsigned vs. signed integer types was asked. Three panelists (including Stroustrup, the inventor of C++) unanimously answered that unsigned types are a bad idea.

Whenever you mix signed and unsigned numbers, you get trouble. The rules are just very surprising, and they turn up in code in strange places that correlate very strongly with bugs. Now, when people use unsigned numbers, they usually have a reason. And the reason will be something like, “well it can’t be negative”, or “I need an extra bit”. If you need an extra bit, I am very reluctant to believe you that you really need it, and I don’t think that’s a good reason. When you think you can’t have negative numbers, you will have somebody who initialize[sic] your unsigned with −2, and think they get −2, and things like that. It is just highly error-prone. I think one of the sad things about the standard library is that the indices are unsigned whereas array indices are signed, and you’re sort of doomed to have confusion and problems with that. There are far too many integer types, there are far too lenient rules for mixing them together, and it’s a major bug source. Which is why I’m saying, stay as simple as you can, use integers till you really really need something else. Use int unless you need something different. Then still use something signed until you really need something different. Then resort to unsigned. And yes, it’s unfortunately a mistake in the STL, in the standard library, that we use unsigned indices. If you are writing a very large data structure, the only case where you would really care about the unsigned is if you know it’s an array of characters that’s going to be bigger than half of memory. I don’t know of any other case where it matters. And that is only on a 32-bit system or less, and it just does not come up in practice very much.

after looking at the example with the to_svg_string and because I'm a bit of an SVG nerd I had to re-write it to something that produces SVGs that are about 40something % smaller in size

Can you show me how you accomplish that? I think my strategy of using Mx,yh1v1h-1z is quite efficient already.

nayuki commented 3 weeks ago

Oh I see, you shared this code: https://gist.github.com/dominikwilkowski/9e95f7e2986e0129ef6fba94e94de525

This unwrap is dangerous where an entire line doesn't have any blocks but QR codes shouldn't have any empty lines ever

Wrong; this assumption is not justified. For example: https://www.nayuki.io/page/creating-a-qr-code-step-by-step , "Force light area". It shows there are vertical lines without black modules. This example can be adapted to have horizontal lines without black modules by changing the data so that the error-correction code (on the left side) has no black modules, while introducing some black modules to other parts of the barcode.

Your use of the SVG path command m instead of M makes sense but makes the code retain more state in variables, which makes it longer and more liable to bugs. It also makes it much harder to hand-edit the SVG code to delete modules (which is of debatable utility). So there is a trade-off involved, where your logic produces shorter SVG code at the cost of longer implementation code.

dominikwilkowski commented 3 weeks ago

Thanks for the reply.

At a C++ conference, a question about unsigned vs. signed integer types was asked. Three panelists (including Stroustrup, the inventor of C++) unanimously answered that unsigned types are a bad idea.

That makes sense for C and C++ and I guess Java but rust checks these things at compile time so it's not a good pattern in rust. But I'm gathering you're not keen to make this more rusty. All good then.

Thanks for pointing out that there are times where an entire row could be white. I will fix that in my code. I wasn't sure which is why I included the comment.

So there is a trade-off involved, where your logic produces shorter SVG code at the cost of longer implementation code.

That's usually how it works yeah. There are things I could do to optimize the algorithm but it doesn't sound like this is something that would be something for this crate. FYI: This optimization comes from the popular svgo library that minimizes svgs to make sure we keep them clean. the relevant plugin here: https://svgo.dev/docs/plugins/convertPathData/ Before using SVGs web folks use things like https://jakearchibald.github.io/svgomg/ to make sure they don't send too much data down the pipe.

I'll close this issue now.. Cheers

nayuki commented 3 weeks ago

you're not keen to make this more rusty.

I take offense at this statement, as will be explained in the following paragraphs.

That makes sense for C and C++ and I guess Java but rust checks these things at compile time so it's not a good pattern in rust.

Hopefully you agree that unsigned was implemented in a very dangerous, unintuitive way in C/C++, and that adding unsigned to Java would be a net liability (small gain, big pains). While these other languages are not identical to Rust, they provide clues to the potential problems that can arise.

The fact that Rust checks arithmetic operations at compile time is not really relevant. For example, Java has implicit upcasts and prevents almost all dangerous downcasts. You can compile C/C++ code with various warnings, linters, and sanitizers - and they will pick up on issues like signed-unsigned mixed arithmetic, overflows, etc.

The actual relevant point is that Rust completely forbids implicit arithmetic conversions - even safe upcasts like u8 -> u16 or u16 -> i32 (which I find mildly annoying).

Where I take offense is that my code is already as Rusty as it needs to be. You missed the fact that QrCode.get_module(x: i32, y: i32) intentionally supports out-of-bounds reads and pads the QR Code with an infinite border of light modules (white pixels). This is intended to play well with 2D graphics and calculations that involve translating (offsetting) the QR Code visually.

Viewed in isolation, yes border should be an unsigned integer because a negative border would be nonsensical or dangerous (e.g. cropping into the QR Code - kind of like negative margins in CSS). But no, what happens is that if border is u32, then there would be a conversion to make it play well with x and y which must be i32 as per the API. See this line: https://github.com/nayuki/QR-Code-generator/blob/42a886d7845283a048423bd78a82912584325476/rust/examples/qrcodegen-demo.rs#L185 . And when you add a u32 and i32, you have to be unbelievably careful about what common type to convert to and how to prevent overflow. It's just far easier to make border be i32. Also, you can see that QrCode.size() returns i32, agreeing with the type of x and y.

FYI: This optimization comes from the popular svgo library that minimizes svgs to make sure we keep them clean.

Thanks for mentioning these links; I took a quick look and acknowledge that these SVG utilities exist.

dominikwilkowski commented 3 weeks ago

No offense intended mate. All good.