Open exikyut opened 2 years ago
Welp, I thought FFI::memcpy()
accepted an independent offset parameter - I hadn't checked. I might not be able to use the FFI for this :(
My attempts to get fancy with FFI:addr()
produced Interesting™ results though; extending the last example above...
<?php
$ffi = FFI::cdef('
typedef long unsigned int size_t;
void *malloc(size_t size);
');
$ptr = $ffi->cast('uint8_t *', $ffi->malloc(27));
for ($i = 0; $i < 26; $i++) {
$ptr[$i] = $i + 65;
}
for ($i = 0; $i < 10; $i++) {
print "$i: "; var_dump(FFI::addr($ptr) + $i);
}
...Produces (with the exact same backtrace)...
0: object(FFI\CData:uint8_t**)#4 (1) {
[0]=>
object(FFI\CData:uint8_t*)#1 (1) {
[0]=>
int(65)
}
}
1: object(FFI\CData:uint8_t**)#1 (1) {
[0]=>
NULL
}
2: object(FFI\CData:uint8_t**)#4 (1) {
[0]=>
NULL
}
3: object(FFI\CData:uint8_t**)#1 (1) {
[0]=>
Segmentation fault
<?php
$ffi = FFI::cdef('
typedef long unsigned int size_t;
void *malloc(size_t size);
');
$ptr = $ffi->cast('uint8_t *', $ffi->malloc(27));
for ($i = 0; $i < 26; $i++) {
$ptr[$i] = $i + 65;
}
var_dump(FFI::string($ptr, 10));
$ptr += 5;
var_dump(FFI::string($ptr, 10));
$ptr += 5;
var_dump(FFI::string($ptr, 10));
string(10) "ABCDEFGHIJ"
string(10) "FGHIJKLMNO"
string(10) "KLMNOPQRST"
Naturally that's done in C by just assigning to
ptr + offset
(where ptr might beuint8_t *
and offset isint
), so I wondered if the same worked here.
From the docs:
C array elements can be accessed as regular PHP array elements, e.g. $cdata[$offset]
So the example from your second post should be:
<?php
$ffi = FFI::cdef('
typedef long unsigned int size_t;
void *malloc(size_t size);
');
$ptr = $ffi->cast('uint8_t *', $ffi->malloc(27));
for ($i = 0; $i < 26; $i++) {
$ptr[$i] = $i + 65;
}
for ($i = 0; $i < 10; $i++) {
print "$i: "; var_dump(FFI::addr($ptr[$i]));
}
However, the docs also state:
C pointers can be incremented and decremented using regular +/-/ ++/–- operations, e.g. $cdata += 5
So print "$i: "; var_dump($ptr++);
would also work.
Still, it should be examinated why normal addition doesn't work, and apparently causes segfaults.
Still, it should be examinated why normal addition doesn't work, and apparently causes segfaults.
When an integer is added to an FFI pointer, and the pointer has a refcount of 1 (what is the case in the first example), the type ownership is transferred; that causes the segfault. We could remember the type, but I assume that is not done to avoid unnecessary memory usage for the likely more common case, where the pointer itself is increased (or decreased).
Given that array dereferencing is the prefered method to do this, and that it is also possible to work-around this by increasing the refcount (by assigning to another variable), I tend to leave this as is. We should probably improve the documentation regarding this issue.
Wow I see. Yeah I wouldn't have figured that out too easily :)
Hmm. I'm basically trying to do FFI substr()
: here's a ptr
, give length
bytes starting at offset
. That was both my original goal and what I happen to be trying to do once again in a separate context I'm working on right now.
I'm currently using the FFI so I can a) control memory allocation, b) use mmap()
(yay!) and c) speed up hot loops where possible. While it's easily argued PHP's overhead makes the last idea moot to some extent, I understand FFI *does* JIT certain operations, and I was hoping everything would align optimally and I'd be able to get my hands on some 90%-outcome/10%-effort racing stripes without having to pull out C :)
Array dereferencing means I have to write for
loops inside PHP userspace, which basically slows everything down across the board by O(1*(PHP overhead)
) per length
, producing ever-larger "this is not a design win, this is wrong, this is wrong" headdesks as length
grows. I see FFI::string() as the logical way to ask for longer runs of data. IMHO, I argue this pattern and context needs revisiting - just to loosen it up a little, not necessarily judge it out of the room - in order for the FFI to scale to real-world use cases.
I could see adding an offset
parameter to FFI::string(), but that feels somewhat short-sighted. At the end of the day, IMO FFI users are exceedingly likely to have ether personal experience or project-level exposure to ptr + offset
notation, so making FFI\CData + int
implicitly do C-style pointer offset calculation makes 1000% sense to me. It's not magic behavior if it's documented! :P
Edit: But the offset
parameter idea has some merit, on second thought - I think the FFI\CData + int
behavior I'm describing would require the creation of a new FFI\CData
object on each invocation, whereas FFI::string($ptr, $start, $length);
is just an additional parameter to a function I need to call in both cases. And then there's the fact that the only other real way I can access bytes through the FFI is through array dereferencing in any case. Hmm. Perhaps this is a good idea... and perhaps also FFI::memcpy()
, FFI::memcmp()
, FFI::memset()
etc could have optional to_offset
and from_offset
parameters added for completeness...? :v
The motivation here is to move the for
loop from PHP to C. That's about it.
I couldn't help but become very curious at your mention of refcounts, so I wondered what would happen if I added a $hack = &$ptr;
just after the malloc() in the very first example - I'm not sure exactly how PHP works in this area but I was thinking this might increase the refcount. I'm not sure what to make of the fact that PHP segfaulted exactly the same.
But while playing around with this I also found another piece of interestingness:
$ffi = FFI::cdef('
typedef long unsigned int size_t;
void *malloc(size_t size);
');
$ptr = $ffi->cast('uint8_t *', $ffi->malloc(8192));
for ($i = 0; $i < 26; $i++) {
$ptr[$i] = $i + 97;
}
var_dump(FFI::string($ptr, 10));
var_dump(FFI::string($ptr + 5, 10));
var_dump(FFI::string($ptr, 9));
This produces:
$ php8.1 xx3.php
string(10) "abcdefghij"
string(10) "fghijklmno"
PHP Fatal error: Uncaught FFI\Exception: attempt to read over data boundary
This is with allocating 8K. I initially had malloc(27)
in there.
However, on my system (fingers crossed this repros elsewhere, heh), if I lower the 9
to an 8
in the last line, I get:
$ php8.1 xx3.php
string(10) "abcdefghij"
string(10) "fghijklmno"
string(8) " ×EõÆU"
Eeek. Interestingly the last character in what falls out is almost always a U
(I saw a V
once) so I'm probably reading out a struct.
But while playing around with this I also found another piece of interestingness:
That's exactly the same issue as before. $ptr + 5
transfers the type ownership. You can prevent that from happening by assigning e.g. $keep = $ptr
.
Description
The following code:
Resulted in this output:
But I expected this output instead: Not sure what would be canonically correct
I get the following backtrace:
Here's line 1880:
https://github.com/php/php-src/blob/PHP-8.1.7/Zend/zend_object_handlers.c#L1880
Additionally,
This code (which swaps
var_dump()
forvar_export()
, to take the alternate path throughzend_std_get_properties_for()
):Results in this output:
But I expected this output instead: Not sure what would be canonically correct
hOwEvEr,
ThIs code:
Produces this output!!
Commentary:
I'm currently using the FFI to
mmap()
some data into memory (for both read and write). Reading and writing integer values (typecasted to uint16_t/uint32_t) is working perfectly fine (thanks for plumbing everything out so this is as simple as assigning to a PHP array :D), but now I needed to read and write sequences of characters (binary data I want written to specific offsets).Naturally that's done in C by just assigning to
ptr + offset
(where ptr might beuint8_t *
and offset isint
), so I wondered if the same worked here. It... *does*, I think...? Except it crashes. This is why I said I don't know what the canonically-correct outcome here is: this is currently Schrodinger's architecture, where I'm equally observing "yes that's the right way to do it" and *crash* and I don't know where to put the consensus :)I was reading the values I wanted to
mmap()
using (PHP-side)fread()
so I'll either just stick with that or play with FFI::memcpy() for now. Thanks again for FFI, it's really awesome.PHP Version
PHP 8.1.7 from sury.org
Operating System
Debian 11.3