peachpiecompiler / peachpie

PeachPie - the PHP compiler and runtime for .NET and .NET Core
https://www.peachpie.io
Apache License 2.0
2.38k stars 203 forks source link

strlen does not return correct length #565

Closed jan-ai closed 4 years ago

jan-ai commented 5 years ago

According to php documentation strlen should return the length of a string. As PHP does not natively support multibyte strings, the length of a string is equal to counting the bytes needed to represent the string.

Peachpie uses string.Length internally in case the representation of the PhpValue/PhpString is a string, which is wrong in case the string includes multibyte characters.

We encountered this problem through the following sequence of code

While fgets seems to return a blob as representaion of the PhpString, preg_match internally uses regex.Match which resturns "plain" strings. At the end strlen uses the length of those strings, which were blobs before using regex.Match internally. Even preg_match is the reason for converting in this case and might lead to other problems as well, in general strlen should in any case return the number of bytes instead of the number of multibyte characters.

jakubmisek commented 5 years ago

Yes it is by design. Due to performance reasons, strlen won't decode all the unicode string values into byte arrays.

jan-ai commented 5 years ago

While I understand your argument this completly ignores that any write operation based on buffers might use strlen to check how many data should be written and may also use this information to verify if all data has been written.

In our case fwrite was returning a different number of bytes written compared to strlen of the buffer. Also I don't see any alternative function to get the number of bytes of a buffer, which is supported by Peachpie.

As there is a mb_strlen function which should be used on strings anyway and can be optimized for performance, I see no need to have a non compliant strlen function, because of performance reasons.

That might be me only, but in general I see a problem if a project is not compliant to the specifications and don't want to be. How could we trust into the behaviour of the code in case the spec might be implemented or not?

jakubmisek commented 5 years ago

Since the byte array was encoded as Unicode string within the regex function, it looses the original encoding hence it is not even possible to "reconstruct" the very original length of that byte array. Sadly it is just a tip of a bigger issue (or a feature).

If you are using latest builds of the compiler, you may try:

$nonUnicodeString = (binary)$unicodeString;
jan-ai commented 5 years ago

Thanks for adding this binary cast option that fast. It took a bit to upgrade from 0.9.46 to 0.9.500 as the new Compiler requires .net Standard 2.1. So we had to upgrade and migrate everything from .net Core 2.2 to .net Core 3.0.

I can confirm that (binary) cast is working for the following case:

        $buf = (binary) "\n[".$section."]\n";
        $written = fwrite($fh_part, $buf);

        if ($written != strlen($buf)) {

Nevertheless it throws an exception (see below) at compile time when used it in the following manner:

        $buf = "\n[".$section."]\n";
        $written = fwrite($fh_part, $buf);

        if ($written != strlen((binary) $buf)) {

or

    $buf = (binary) $this->getFileHeader();
    $written = fwrite($fh_part, $buf);

    if ($written != strlen($buf)) {

1>.nuget\packages\peachpie.net.sdk\0.9.500\build\Peachpie.NET.Core.Sdk.targets(224,5): error : Conversion from 'PhpValue' to '' not implemented at xxx

Let me also share me thoughts again that there is no sense in high performance as long as the implementation does not care about the specification. Any code could be faster by removing or not implementing costly parts of a specification. Just to show the implication, if we would have known beforehand that performance has a higher priority than implementing the specification correctly, we would have not started using PeachPie.

I'm not the expert here, but could the same conversion you just implemented as a binary cast option probably easily added to str.Length of the following code taken from https://github.com/peachpiecompiler/peachpie/blob/1df25a9fda809a01097d95442a464b56e1721ee3/src/Peachpie.Runtime/PhpString.cs

                        static int ChunkLength(object chunk)
                        {
                            AssertChunkObject(chunk);

                            switch (chunk)
                            {
                                case string str: return str.Length;
                                case Blob b: return b.Length;
                                default: return ((Array)chunk).Length;
                            }
                        }
jakubmisek commented 5 years ago

Not priority but when we would encode everything to bytes, it would be almost unusable.

jan-ai commented 5 years ago

I don't see why everything should be encoded as bytes. It's only strlen that generates a problem here. So changing

case string str: return str.Length;

of the above code (ChunkLength) to

case string str: return Convert.ToBytes(str, ctx).Length

should work, right? (as long as you have access to ctx) (Convert from https://github.com/peachpiecompiler/peachpie/blob/1a942cae244b6099f4c75e29fa1bcb53ce43e483/src/Peachpie.Runtime/Conversions.cs)

liesauer commented 5 years ago

get Conversion from 'PhpValue' to '' not implemented too.

$content = (binary)$response->getRawContent();
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error : Conversion from 'PhpValue' to ''  not implemented at O:\Projects\NULASpider\NULASpider.PHP\src\Spider\Spider.php(529, 9) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.CodeGen.ConversionsExtensions.EmitConversion(CodeGenerator cg, CommonConversion conversion, TypeSymbol from, TypeSymbol to, TypeSymbol op, Boolean checked) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.Semantics.BoundConversionEx.Emit(CodeGenerator cg) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.CodeGen.CodeGenerator.EmitSpecialize(BoundExpression expr) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.CodeGen.CodeGenerator.Emit(BoundExpression expr) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.Semantics.BoundConversionEx.Emit(CodeGenerator cg) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.CodeGen.CodeGenerator.EmitSpecialize(BoundExpression expr) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.CodeGen.CodeGenerator.Emit(BoundExpression expr) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.Semantics.BoundCopyValue.Emit(CodeGenerator cg) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.CodeGen.CodeGenerator.EmitSpecialize(BoundExpression expr) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.CodeGen.CodeGenerator.Emit(BoundExpression expr) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.Semantics.BoundAssignEx.Emit(CodeGenerator cg) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.Semantics.BoundExpressionStatement.Emit(CodeGenerator cg) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.Semantics.BoundStatement.Pchp.CodeAnalysis.CodeGen.IGenerator.Generate(CodeGenerator cg) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.CodeGen.CodeGenerator.Generate(IGenerator element) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at System.Collections.Generic.List`1.ForEach(Action`1 action) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.Semantics.Graph.BoundBlock.Emit(CodeGenerator cg) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.Semantics.Graph.StartBlock.Emit(CodeGenerator cg) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.Semantics.Graph.BoundBlock.Pchp.CodeAnalysis.CodeGen.IGenerator.Generate(CodeGenerator cg) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.CodeGen.CodeGenerator.Generate(IGenerator element) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.CodeGen.CodeGenerator.GenerateBlock(BoundBlock block) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.CodeGen.CodeGenerator.LocalScope.ContinueWith(BoundBlock block) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.CodeGen.CodeGenerator.GenerateScope(BoundBlock block, ScopeType type, Int32 to) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.CodeGen.CodeGenerator.GenerateScope(BoundBlock block, Int32 to) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.Symbols.SourceRoutineSymbol.Generate(CodeGenerator cg) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.CodeGen.CodeGenerator.Generate() [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.CodeGen.MethodGenerator.<>c__DisplayClass1_0.<GenerateMethodBody>b__0(ILBuilder builder) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.CodeGen.MethodGenerator.GenerateMethodBody(PEModuleBuilder moduleBuilder, MethodSymbol routine, Action`1 builder, VariableSlotAllocator variableSlotAllocatorOpt, DiagnosticBag diagnostics, Boolean emittingPdb) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.CodeGen.MethodGenerator.GenerateMethodBody(PEModuleBuilder moduleBuilder, SourceRoutineSymbol routine, Int32 methodOrdinal, VariableSlotAllocator variableSlotAllocatorOpt, DiagnosticBag diagnostics, Boolean emittingPdb) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at Pchp.CodeAnalysis.SourceCompiler.EmitMethodBody(SourceRoutineSymbol routine) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at System.Threading.Tasks.Parallel.<>c__DisplayClass44_0`2.<PartitionerForEachWorker>b__1(IEnumerator& partitionState, Int32 timeout, Boolean& replicationDelegateYieldedBeforeCompletion) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error : --- End of stack trace from previous location where exception was thrown --- [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw(Exception source) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at System.Threading.Tasks.Parallel.<>c__DisplayClass44_0`2.<PartitionerForEachWorker>b__1(IEnumerator& partitionState, Int32 timeout, Boolean& replicationDelegateYieldedBeforeCompletion) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at System.Threading.Tasks.TaskReplicator.Replica`1.ExecuteAction(Boolean& yieldedBeforeCompletion) [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
C:\Users\LiesAuer\.nuget\packages\peachpie.net.sdk\0.9.500-dev-p.cd7f96ff\build\Peachpie.NET.Core.Sdk.targets(224,5): error :    at System.Threading.Tasks.TaskReplicator.Replica.Execute() [O:\Projects\NULASpider\NULASpider.PHP\NULASpider.PHP.msbuildproj]
jakubmisek commented 5 years ago

thanks @liesauer , conversion fixed at https://github.com/peachpiecompiler/peachpie/commit/6e098e9d9c24b4d81a0f970877324dcfe80a4558

liesauer commented 5 years ago

patch file 0001-conversion-value-binary.patch.zip

jakubmisek commented 4 years ago

We'll close this issue for now, since strlen implementation won't be changed.

In order to treat string as byte[], they have to be read as binary, i.e.

jan-ai commented 3 years ago

The binary cast does not influence strlen anymore in version 1.0.5 (haven't checked 1.0.6) yet. Is it still the correct way to use binary cast?

jakubmisek commented 3 years ago

thanks, seems the (binary) cast works but it was then decoded back to string for some reason. The next update will fix it.