kklas / anchor-client-gen

A tool for generating solana web3 clients from anchor IDLs.
MIT License
135 stars 21 forks source link

Support Uint8Array when applicable #42

Open mschneider opened 2 years ago

mschneider commented 2 years ago

Hey @kklas , thank you for your help with the bytes type. I am currently working on a project, where I would like to use this for more data types, specifically:

  1. Vec<u8>
  2. fixed size arrays [u8; 32]

I had a look at the code already and was able to get it to work on a local branch: https://github.com/mschneider/anchor-client-gen/tree/max/uint8array2 but curious to hear your thoughts as this somewhat breaks with the current abstraction of the library on a few levels.

My ideal outcome would be:

  1. always use the right native array type
  2. always verify size when initializing from user input
  3. possibly pad when initializing from too short user input

Let me know what you think about these changes.

kklas commented 2 years ago

Isn't Vec<u8> already encoded as Uint8Array after that PR?

In any case, technically for "bytes" fields it makes sense to be Uint8Array because:

For others you need to rely on, for example borsh.vec(borsh.u16()) which will return Array\<number> in this case.

Why handle Vec specifically as bytes in the IDL and decode into Buffer in borsh-ts and others as Array? IDK ask Armani.

Could this be done? Sure it could. Do we want this? IDK maybe, but I think probably not.

Uint32 is more difficult to work with than Array. Calling .values() on Uint64Array will return an array of bigint while we normally use BN for u64 so this will be a mess. Uint128Array doesn't even exists so what do you use for Vec\<u128>? For Vec\<u8> it kinda makes sense to specially mark them as bytes in the IDL and decode them as Buffer/Uint8Array because you most often represent bytes like that especially when you want to encode some raw data in your account. But for others like Vec\<u64> / [u64; n] you are most often storing just a series of numbers so it makes sense to me to just keep those as Array\<BN>.

IDK TBH. You might as well ask why Vec gets a special encoding as "bytes" in the IDL and not as vec of u8? This might be a valid question for the anchor guys.

But what is your use case? What do you need "native array types" for that you cant do with just arrays?

As for

  1. always verify size when initializing from user input

This makes sense to do.

  1. possibly pad when initializing from too short user input

Isn't this already done automatically when encoding the layout with the serum borsh package?

mschneider commented 2 years ago

Correct Vec is already encoded as bytes, hence Uint8Array. I'm currently looking at fixed length u8 arrays:

I'm mainly concerned with input validation for overflows, when assigning to Vec<u8> & [u8; 4]:

  1. [1, 2, 3]
  2. [1, 2, 3, 256]
  3. [1, 2, 3, 4, 5]

How are these currently handled?

mschneider commented 2 years ago

First case, would return obscure errors like this one, which make it really hard to trace, which field actually caused the error:

    TypeError: Blob.encode[data] requires (length 32) Buffer as src
      at Blob.encode (node_modules/buffer-layout/lib/Layout.js:2321:13)
      at Structure.encode (node_modules/buffer-layout/lib/Layout.js:1263:26)
      at WrappedLayout.encode (node_modules/@project-serum/borsh/src/index.ts:106:24)
      at Structure.encode (node_modules/buffer-layout/lib/Layout.js:1263:26)
      at Structure.encode (node_modules/buffer-layout/lib/Layout.js:1263:26)

I'm concerned that the second case will trigger an equally obscure error message, while choosing a more explicit type for the generated structure / interfaces will make the issue more transparent to the programmer using the generated code as the type checker can identify which argument potentially causes an overflow.

mschneider commented 2 years ago

The above error comes from passing an Uint8array for a blob(32) struct field, apparently we won't be able to support Uint8Array at all and always need to pass Buffer when interacting with borsh due to this check https://github.com/pabigot/buffer-layout/blob/main/lib/Layout.js#L2319

I'll send a new PR that first updates all usage of Uint8Array to Buffer, so we don't run into this issue anywhere

kklas commented 2 years ago

apparently we won't be able to support Uint8Array at all and always need to pass Buffer when interacting with borsh

We're not sending Uint8Array anywhere to borsh. Buffer is sent.

kklas commented 2 years ago

How are these currently handled?

They're handled as array of numbers both in IDL, borsh layout, and the generator.

I'm mainly concerned with input validation for overflows, when assigning to Vec & [u8; 4]

Javascript is not a typed language. Typescript helps a lot but it's not perfect, the typing is only present at compile time. Fixed sized arrays don't exist neither in javascript nor typescript, so not sure what we can expect there. The only thing we can maybe do is check for array length when constructing types or call instructions.

Keep in mind that the code generated by anchor-client-gen is a bit lower level-ish and you will probably want to create a lib that wraps around it and is more semantic and include your checks there. See https://github.com/kklas/anchor-client-gen/issues/5#issuecomment-1076858474

mschneider commented 2 years ago

Yeah, so the idl does preserve array length and type bounds, so it should be possible. I'll follow your advise and create a nice lib around the generated code to solve some of those issues on our end so that it is very integration friendly

mschneider commented 2 years ago

Also dropped the constant array size binding to Buffer approach, turned out to cause some nasty issues when trying to use Anchor's fetch helpers