kbilsted / NotepadPlusPlusPluginPack.Net

.Net package to install into visual studio to make plugins for Notepad++
Apache License 2.0
165 stars 52 forks source link

GetText() is only getting 10'000 characters #111

Open quadrixstefanovic opened 1 year ago

quadrixstefanovic commented 1 year ago

When using the GetText(int length) method I only get a maximum of exactly 10'000 characters back, independent of the length value.

kbilsted commented 1 year ago

It is how it was implemented

molsonkiko commented 9 months ago

So I came up with a reasonable fix for all these methods that allocate a fixed buffer of size 10 thousand characters. I also posted about this in the community forum, BTW.

the problem

There are a lot of methods (29 by my count) in the ScintillaGateway.cs that look something like this:

public unsafe string DescriptionOfStyle(int style)
{
    byte[] descriptionBuffer = new byte[10000];
    fixed (byte* descriptionPtr = descriptionBuffer)
    {
        Win32.SendMessage(scintilla, SciMsg.SCI_DESCRIPTIONOFSTYLE, (IntPtr) style, (IntPtr) descriptionPtr);
        return Encoding.UTF8.GetString(descriptionBuffer).TrimEnd('\0');
    }
}

This is not the right way to do these things because 10000 might be too small (leading to buffer overflow) or too large (leading to waste).

Why this is wrong

Regarding methods where the user passes in a buffer to be filled (char *), the Scintilla documentation says:

Arguments point at text buffers that Scintilla will fill with text.
In some cases, another argument will tell Scintilla the buffer size.
    In others, you must make sure that the buffer is big enough to hold the requested text.
If a NULL pointer (0) is passed then, for SCI_* calls,
    the length that should be allocated, not including any terminating NUL, is returned.
Some calls (marked "NUL-terminated") add a NUL character to the result but other calls do not:
    to generically handle both types, allocate one more byte than indicated and set it to NUL.

My proposed fix

  1. add the methods shown below to ScintillaGateway.cs
  2. In the above example, with SciMsg.SCI_DESCRIPTIONOFSTYLE as the Scintilla message and int length as a parameter to the C# method, replace the entire body of the method with
    return GetNullStrippedStringFromMessageThatReturnsLength(SciMsg.SCI_DESCRIPTIONOFSTYLE, (IntPtr)style);
  3. If the C# method does not have a parameter (say for public unsafe string GetWhitespaceChars(), I would replace the whole body of the function with
    return GetNullStrippedStringFromMessageThatReturnsLength(SciMsg.SCI_GETWHITESPACECHARS);

    Note that in this case the optional wParam argument is not supplied.

In any case, here is the GetNullStrippedStringFromMessageThatReturnsLength method in my proposed fix.

/// <summary>
/// returns bytes decoded from UTF-8 as a string, with all trailing NULL bytes stripped off.
/// </summary>
public static string Utf8BytesToNullStrippedString(byte[] bytes)
{
    int lastNullCharPos = bytes.Length - 1;
    // this only bypasses NULL chars because no char
    // other than NULL can have any 0-valued bytes in UTF-8.
    // See https://en.wikipedia.org/wiki/UTF-8#Encoding
    for (; lastNullCharPos >= 0 && bytes[lastNullCharPos] == '\x00'; lastNullCharPos--) { }
    return Encoding.UTF8.GetString(bytes, 0, lastNullCharPos + 1);
}

/// <summary>
/// Recall that all Scintilla methods have the signature 
/// (scintilla* scin, SciMsg msg, void* wParam, void* lParam) -&gt; void*<br></br>
/// Many of these scintilla methods are bimodal in the following way<br></br>
/// * if lParam is 0, return the length of the buffer to be filled and have no side effects. The wParam may be involved in telling Scintilla how big the buffer needs to be.<br></br>
/// * if lParam is greater than 0, it is assumed to be a pointer to a buffer. Now the wParam indicates what the text will need to be.<br></br><br></br>
/// This sets lParam to 0 to get the length, allocates a buffer of that length,<br></br>
/// uses the second mode to fill a buffer,<br></br>
/// and returns a string of the UTF8-decoded buffer with all trailing '\x00' chars stripped off.
/// </summary>
/// <param name="msg">message to send</param>
/// <param name="wParam">another parameter for defining what the buffer should contain</param>
/// <returns></returns>
private unsafe string GetNullStrippedStringFromMessageThatReturnsLength(SciMsg msg, IntPtr wParam=default)
{
    int length = Win32.SendMessage(scintilla, msg, wParam, (IntPtr)Unused).ToInt32();
    byte[] textBuffer = new byte[length];
    fixed (byte* textPtr = textBuffer)
    {
        Win32.SendMessage(scintilla, msg, wParam, (IntPtr)textPtr);
        return Utf8BytesToNullStrippedString(textBuffer);
    }
}

I may also submit a PR, but I wanted to post this here first.

kbilsted commented 9 months ago

Why are you shouting?

molsonkiko commented 9 months ago

Why are you shouting?

Fair question, fixed