php / php-src

The PHP Interpreter
https://www.php.net
Other
38.1k stars 7.74k forks source link

[FFI] Allow to pass CData into struct and/or union fields. #11934

Closed SerafimArts closed 1 year ago

SerafimArts commented 1 year ago

Feature/Bug

The following code not working now:

<?php

// Boxed scalar
$uint32 = FFI::new('uint32_t');
$uint32->cdata = 0xDEAD_BEEF;

// Struct
$ffi = FFI::new('struct { uint32_t field; }');
$ffi->field = $uint32;

Or more real world usage:

<?php

$ffi = FFI::new('struct { uint64_t field; }');

$value = FFI::new('struct __attribute__((__packed__)) { uint32_t lo; uint32_t hi; }');
$value->lo = 0x42;        // First 32 bytes
$value->hi = 0xDEAD_BEEF; // Second 32 bytes

// Because PHP (x86) does not support uint64.
// On 64-bit PHP, this can be done with binary conversions (int64 -> uint64), 
// but this is not so convenient.
$ffi->field = FFI::cast('uint64_t', FFI::addr($value));

Resulted in this output:

PHP Warning: Object of class FFI\CData could not be converted to int

I think it makes sense to do automatic boxing/unboxing to int, at least in the same way as it is done with GMP. However, I still have no idea how to implement this without breaking backwards compatibility =\\

KapitanOczywisty commented 1 year ago

I think code responsible for that is here: ffi.c#L734. Though, I'm not sure how viable is adding CData.

However you can do some trickery to have similar effect:

<?php

// HackA
$ffi = FFI::new('struct { uint64_t field; }');

$hack = FFI::cast('uint32_t*', FFI::addr($ffi));
$hack[1] = 0x42;        // First 32 bytes
$hack[0] = 0xDEAD_BEEF; // Second 32 bytes

var_dump(dechex($ffi->field)); // x64 only
var_dump(bin2hex(FFI::String($ffi->field, 8)));

// HackB
$ffi = FFI::new('struct { uint64_t field; }');
FFI::memcpy($ffi, strrev(hex2bin('00000042DEADBEEF')), 8);

var_dump(bin2hex(FFI::String($ffi->field, 8)));

// HackC
$ffi = FFI::new('struct { uint64_t field; }');
$hack = FFI::cast('struct { uint32_t lo; uint32_t hi; }', $ffi);
$hack->hi = 0x42;        // First 32 bytes
$hack->lo = 0xDEAD_BEEF; // Second 32 bytes

var_dump(bin2hex(FFI::String($ffi->field, 8)));

Also note that calling some methods statically will be deprecated in PHP 8.3: https://wiki.php.net/rfc/deprecate_functions_with_overloaded_signatures#fficast_ffinew_and_ffitype

SerafimArts commented 1 year ago

Also note that calling some methods statically will be deprecated in PHP 8.3

Yes, I know, but I used a more readable and convenient option as an example.


The point here is not that it is impossible to fix this problem. But rather, that you have to fence a lot of non-obvious code (overengineer) for this.

P.S. there is another hack through set by pointer, but I could not check the correctness of this option:

$struct = FFI::new('struct { uint64_t value; }');

FFI::memcpy(FFI::addr($struct->value), pack('LL', 2**32-1, 42), 8);
KapitanOczywisty commented 1 year ago

But rather, that you have to fence a lot of non-obvious code (overengineer) for this.

Assigning CData would be basically memcpy, just without size argument and I'm not entirely sure this can be done cleanly, but I'll take a look.

nielsdos commented 1 year ago

Ah, I was also looking at it and only now see your message. A simple PoC would be:

diff --git a/ext/ffi/ffi.c b/ext/ffi/ffi.c
index f2d03699ba..50ca250a28 100644
--- a/ext/ffi/ffi.c
+++ b/ext/ffi/ffi.c
@@ -771,10 +771,24 @@ static zend_always_inline zend_result zend_ffi_zval_to_cdata(void *ptr, zend_ffi
            lval = zval_get_long(value);
            *(int16_t*)ptr = lval;
            break;
-       case ZEND_FFI_TYPE_UINT32:
-           lval = zval_get_long(value);
+       case ZEND_FFI_TYPE_UINT32: {
+           bool failed;
+           lval = zval_try_get_long(value, &failed);
+           if (failed) {
+               if (Z_TYPE_P(value) == IS_OBJECT && Z_OBJCE_P(value) == zend_ffi_cdata_ce) {
+                   zend_ffi_cdata *cdata = (zend_ffi_cdata*)Z_OBJ_P(value);
+                   if (zend_ffi_is_compatible_type(type, ZEND_FFI_TYPE(cdata->type)) &&
+                       type->size == ZEND_FFI_TYPE(cdata->type)->size) {
+                       memcpy(ptr, cdata->ptr, type->size);
+                       return SUCCESS;
+                   }
+               }
+               zend_ffi_assign_incompatible(value, type);
+               return FAILURE;
+           }
            *(uint32_t*)ptr = lval;
            break;
+       }
        case ZEND_FFI_TYPE_SINT32:
            lval = zval_get_long(value);
            *(int32_t*)ptr = lval;

i.e. using the same strategy as the default case, and using a fallible zval_try_get_long call. Ofc, this needs to be applied to the other cases too, and that code would have to be deduplicated.

KapitanOczywisty commented 1 year ago

I was thinking about using zend_ffi_is_compatible_type and if types are compatible copying memory (with maybe some direct assignments for short types). I think changing every case is just wasteful.

nielsdos commented 1 year ago

Depends on the interpretation of "changing every case" ig :P Only annoying thing with the copies instead of direct assignments is endianness (for things like *(uint32_t*)ptr = lval etc).

KapitanOczywisty commented 1 year ago

Shouldn't endianness be the same for types with the same sizes? I'm assuming you cannot assign CData(char) to uint32_t, and if you try to assign to/from struct or array, I'm expecting same byte order anyway (not sure if this is allowed in that function). I think FFI::Cast does something similar, but instead pointer it'd be a memcpy.

nielsdos commented 1 year ago

Shouldn't endianness be the same for types with the same sizes?

It is, but what I mean by that is when the cases are generalised into a single one, then we'll have to convert *(whatever_type*)ptr = lval to a memcpy for convenience for the IS_LONG zval cases. This is where the endianness difference plays a role.

SerafimArts commented 1 year ago

From the point of view of BC: Maybe it makes sense to introduce the FFI::get() and FFI::set() methods for explicitly setting and getting a value in the form of a CData?

$struct = FFI::cdef('struct { int value; }');

$struct->value; // int(0)
FFI::get($string->value); // object(CData) { cdata: int(0) }

// And
$struct->value = 42;
FFI::set($string->value, 42); // Error: CData expected
FFI::set($string->value, FFI::new('int')); // Ok

P.S. And it can do the same with functions.

char* example(void);

///
$ffi->example(); // string(4) "test"
FFI::get($ffi->example()); // object(CData) { [0] => 't' }

...although I don't remember exactly in which cases the return results of functions are converted to PHP types, instead of CData

nielsdos commented 1 year ago

I'm not sure what the best way to approach the BC is. I guess the BC break isn't that bad because we're replacing a warning of something that didn't work with something that works? But I'm not a user of FFI myself, so I can't properly judge the BC.

KapitanOczywisty commented 1 year ago

we'll have to convert *(whatever_type*)ptr = lval to a memcpy for convenience for the IS_LONG zval cases.

I'm not sure what you mean, generalized would be only for CData, and this could be even limited to the same size CData.

I guess the BC break isn't that bad because we're replacing a warning of something that didn't work with something that works?

CData wasn't supposed to be supported at all for fields, so "none". Existing assignments would remain the same.

nielsdos commented 1 year ago

I'm not sure what you mean, generalized would be only for CData, and this could be even limited to the same size CData.

I feel like we actually want to approach this the same way, but we're having some miscommunication. I think I get now what you mean by I think changing every case is just wasteful. I interpreted this as you wanting to have fallthrough cases to handle the assignment, but I now think you meant checking for this case at the very start of the function.

In any case, feel free to work on this if you want, I don't want to steal this from you :)

KapitanOczywisty commented 1 year ago

Yes, exactly, sorry about that.

In any case, feel free to work on this if you want, I don't want to steal this from you :)

I don't mind :D I've assumed nobody really wants to touch FFI. I'm really impressed how many things you're doing here!

nielsdos commented 1 year ago

Yes, exactly, sorry about that.

No worries, glad we sorted this out.

In any case, feel free to work on this if you want, I don't want to steal this from you :)

I don't mind :D I've assumed nobody really wants to touch FFI. I'm really impressed how many things you're doing here!

:D I've filed https://github.com/php/php-src/pull/11935 that implements this. It's too late for 8.3 because we're past the feature freeze, but better late than never. Hopefully I didn't overlook anything.

SerafimArts commented 1 year ago

It's too late for 8.3 because we're past the feature freeze, but better late than never.

Hmm, I think FFI is not popular enough to take this into account: https://packagist.org/?query=ffi&tags=ffi Almost all popular packages are written by me.

I think it makes sense to ask release managers (like Pierrick Charron. Alas, I do not know his nickname on the github to ping) if this is it possible to add this feature/fix.

Girgias commented 1 year ago

The RMs are @adoy, @bukka, and @ericmann if needed.

nielsdos commented 1 year ago

Merged for 8.3 just in time.