Closed danielchristianschroeter closed 1 month ago
I'm seeing the same error when trying to use Redisson maps. Dug down to the following line in LuaCommands.cs
:
which does not return a value. Looking at storeScriptCache
shows it to be empty:
So it looks like somehow the script isn't properly getting stored.
Digging into my case further, it is correct that there is no value returned. If I run with Redis I get a NOSCRIPT error back on the same execution. So I may be running into something different.
Looks like this is at least part of the problem:
https://github.com/microsoft/garnet/blob/f281bc3dde9bf8b695279f518c8a25e16609c8de/libs/server/Lua/LuaCommands.cs#L235
ex.innerException
is null.
Changing that line to
while (!RespWriteUtils.WriteError("ERR " + (ex.InnerException ?? ex).Message, ref dcurr, dend))
fixes the crash and forwards the error on to the client.
The fix LGTM, if you can create a PR that would be great.
@danielchristianschroeter - please check whether this change unblocks your example.
Created #642
@mmclure-msft Thanks to your suggested changes, the NullReferenceException is resolved. However, I’m still encountering "Error executing Lua script callback". Any chance to get more information?
2024-09-05 17:11:55 03::11::55 fail: Session[0] [172.22.0.1:37528] [005C39D4] Error executing Lua script callback
I'm seeing similar issues. I think there may be subtle (or not so subtle) differences in how NLua parses the scripts vs. the library that's used by Redis. Setting --logger-level Trace
on the command line may help get more details on what's failing - for example I'm seeing
08::32::03 fail: Session[0] [10.0.0.123:6379] [127.0.0.1:64740] [004B8E6D] Error executing Lua script System.IndexOutOfRangeException: Index was outside the bounds of the array. at System.Array.GetFlattenedIndex(Int32 rawIndex) at System.Array.GetValue(Int32 index) at NLua.MetaFunctions.TryAccessByArray(Lua luaState, Type objType, Object obj, Object index) at NLua.MetaFunctions.GetMethodInternal(Lua luaState) at NLua.MetaFunctions.GetMethod(IntPtr state) at KeraLua.NativeMethods.lua_pcallk(IntPtr luaState, Int32 nargs, Int32 nresults, Int32 errorfunc, IntPtr ctx, IntPtr k) at KeraLua.Lua.PCall(Int32 arguments, Int32 results, Int32 errorFunctionIndex) at NLua.Lua.CallFunction(Object function, Object[] args, Type[] returnTypes) at NLua.Lua.CallFunction(Object function, Object[] args) at NLua.LuaFunction.Call(Object[] args) at Garnet.server.LuaRunner.RunInternal(String[] keys, String[] argv) in C:\src\garnet\libs\server\Lua\LuaRunner.cs:line 374 at Garnet.server.LuaRunner.RunTransactionInternal(String[] keys, String[] argv) in C:\src\garnet\libs\server\Lua\LuaRunner.cs:line 323 at Garnet.server.LuaRunner.Run(Int32 count, SessionParseState parseState) in C:\src\garnet\libs\server\Lua\LuaRunner.cs:line 281 at Garnet.server.RespServerSession.ExecuteScript(Int32 count, LuaRunner scriptRunner) in C:\src\garnet\libs\server\Lua\LuaCommands.cs:line 196
But I think it might be best to track this under a new issue once we have more details.
Thank you!
I already tried setting the logger level to Trace, but I still don’t get additional information in my case. All I receive is:
fail: Session[0] [192.168.0.1:51102] [005818D1] Error executing Lua script callback
Without more details, it’s hard to debug. But of course, we can close this issue if the PR is merged.
After adding detailed logging to the ExecuteScript function within libs/server/Lua/LuaCommands.cs, I identified that the Lua script is attempting to call a function named status_reply, which is not defined.
Log Output:
2024-09-05 17:49:16 03::49::16 info: Session[0] [192.168.48.1:41822] [0283D742] Executing Lua script with count: 5
2024-09-05 17:49:16 03::49::16 info: Session[0] [192.168.48.1:41822] [0283D742] Lua script input: Garnet.server.SessionParseState
2024-09-05 17:49:16 03::49::16 fail: Session[0] [192.168.48.1:41822] [0283D742] Error executing Lua script callback. Inner Exception: (null) NLua.Exceptions.LuaScriptException: [string "..."]:2: attempt to call a nil value (field 'status_reply')
Relevant Code: https://github.com/microsoft/garnet/blob/f281bc3dde9bf8b695279f518c8a25e16609c8de/libs/server/Lua/LuaCommands.cs#L192
private unsafe bool ExecuteScript(int count, LuaRunner scriptRunner)
{
try
{
logger?.LogInformation("Executing Lua script with count: {count}", count);
logger?.LogInformation("Lua script input: {parseState}", parseState);
object scriptResult = scriptRunner.Run(count, parseState);
logger?.LogInformation("Lua script executed successfully. Result: {scriptResult}", scriptResult);
if (scriptResult != null)
{
if (scriptResult is string s)
{
while (!RespWriteUtils.WriteAsciiBulkString(s, ref dcurr, dend))
SendAndReset();
}
else if ((scriptResult as byte?) != null && (byte)scriptResult == 36) //equals to $
{
while (!RespWriteUtils.WriteDirect((byte[])scriptResult, ref dcurr, dend))
SendAndReset();
}
else if (scriptResult as Int64? != null)
{
while (!RespWriteUtils.WriteInteger((Int64)scriptResult, ref dcurr, dend))
SendAndReset();
}
else if (scriptResult as ArgSlice? != null)
{
while (!RespWriteUtils.WriteBulkString(((ArgSlice)scriptResult).ToArray(), ref dcurr, dend))
SendAndReset();
}
else if (scriptResult as Object[] != null)
{
// Two objects one boolean value and the result from the Lua Call
while (!RespWriteUtils.WriteAsciiBulkString((scriptResult as Object[])[1].ToString().AsSpan(), ref dcurr, dend))
SendAndReset();
}
}
else
{
while (!RespWriteUtils.WriteDirect(CmdStrings.RESP_ERRNOTFOUND, ref dcurr, dend))
SendAndReset();
}
}
catch (LuaScriptException ex)
{
logger?.LogError(ex, "Error executing Lua script callback. Inner Exception: {InnerException}", ex.InnerException);
while (!RespWriteUtils.WriteError("ERR " + (ex.InnerException ?? ex).Message, ref dcurr, dend))
SendAndReset();
return true;
}
catch (Exception ex)
{
logger?.LogError(ex, "Error executing Lua script");
while (!RespWriteUtils.WriteError("ERR " + ex.Message, ref dcurr, dend))
SendAndReset();
return true;
}
return true;
}
I think the main issue is that much of the API documented at https://redis.io/docs/latest/develop/interact/programmability/lua-api/ is not yet implemented. It will probably require community involvement to add the functionality to bring Garnet Lua scripting to parity with Redis. Definitely not a couple of small PRs 😄
@danielchristianschroeter can you see if https://github.com/mmclure-msft/garnet/commit/b357a632174cad2eab3e74fd80bf7320608352d0 fixes your
EDIT: Never mind - there seems to be some issue with returning a table - this needs more investigation.status_reply
error?
@mmclure-msft Thank you for your assistance! Your suggestions were very helpful.
Log Output:
info: Session[0] [172.24.0.1:40662] [027B1BBF] Lua script executed successfully. Result: table
Even though the Lua script executes successfully, my code still fails with ErrNotObtained
and Unable to obtain lock: context deadline exceeded
. This happens because the result is returned as a table instead of a string, which causes compatibility issues with the original Redis-based code.
To fix this, in addition to your suggested changes, I modified the code in libs/server/Lua/LuaRunner.cs to ensure the Lua script returns a string, as required for compatibility with the original Redis code (referencing this line).
Here’s the quick and dirty test approach I used:
object RunInternal(string[] keys, string[] argv)
{
if (keys != this.keys)
{
if (keys == null)
{
this.keys = null;
sandbox_env["KEYS"] = this.keys;
}
else
{
if (this.keys != null && keys.Length == this.keys.Length)
Array.Copy(keys, this.keys, keys.Length);
else
{
this.keys = keys;
sandbox_env["KEYS"] = this.keys;
}
}
}
if (argv != this.argv)
{
if (argv == null)
{
this.argv = null;
sandbox_env["ARGV"] = this.argv;
}
else
{
if (this.argv != null && argv.Length == this.argv.Length)
Array.Copy(argv, this.argv, argv.Length);
else
{
this.argv = argv;
sandbox_env["ARGV"] = this.argv;
}
}
}
var result = function.Call();
return ProcessLuaResult(result);
}
object ProcessLuaResult(object[] result)
{
if (result.Length > 0)
{
var firstResult = result[0];
if (firstResult is LuaTable luaTable)
{
if (luaTable["ok"] != null)
{
return luaTable["ok"].ToString();
}
}
return firstResult;
}
return null;
}
/// <summary>
/// Runs the precompiled Lua function
/// </summary>
/// <returns></returns>
public object Run()
{
var result = function.Call();
return ProcessLuaResult(result);
}
Log Output After Changes:
info: Session[0] [172.29.0.1:60954] [037A9C05] Lua script executed successfully. Result: 1
After applying these changes, the code now executes successfully.
Interesting. Per https://redis.io/docs/latest/develop/interact/programmability/lua-api it's supposed to return a table - I'll need to experiment with real Redis and see what happens.
I see your point, but for some reason the same code is working fine with Redis v7.4.0.
With a version of your code I have this almost working the same as Redis for both redis.status_reply
and redis.error_reply
- the only difference is:
127.0.0.1:6379> EVAL "return { ok = 'Success' }" 0
Success
127.0.0.1:6379> EVAL "return { err = 'Failure' }" 0
(error) Failure
127.0.0.1:6379> EVAL "return redis.status_reply('Success')" 0
Success
127.0.0.1:6379> EVAL "return redis.error_reply('Failure')" 0
(error) Failure
in Redis vs.
172.24.80.1:6379> EVAL "return { ok = 'Success' }" 0
"Success"
(5.56s)
172.24.80.1:6379> EVAL "return { err = 'Failure' }" 0
(error) Failure
(4.28s)
172.24.80.1:6379> EVAL "return redis.status_reply('Success')" 0
"Success"
(3.62s)
172.24.80.1:6379> EVAL "return redis.error_reply('Failure')" 0
(error) Failure
(4.11s)
in Garnet (ignore the timings - that's just because I had breakpoints enabled). The difference is Success
vs. "Success"
and I haven't yet found where that's coming from and whether it's a problem or not.
Added PR #645 to handle this correctly.
Thank you! This looks good, and the blocking issue appears to be resolved.
However, I've encountered a new undesired behavior: I'm getting a "lock not held" error during the Release, although this doesn't cause any blocking behavior. I previously also set "LuaTransactionMode" to "false." From what I can tell, the lock seems to be acquired successfully. Even after several attempts, this behavior cannot be reproduced with the Redis v7.4.0. Unfortunately, I don't see any hint in the logs of Garnet.
Here’s an example code:
func (rs *RedisStorage) Unlock(ctx context.Context, name string) error {
key := rs.prefixLock(name)
// load and delete lock from sync.Map
if syncMapLock, loaded := rs.locks.LoadAndDelete(key); loaded {
// type assertion for Redis lock
if lock, ok := syncMapLock.(*redislock.Lock); ok {
// release the Redis lock
if err := lock.Release(ctx); err != nil {
return fmt.Errorf("Unable to release lock for %s: %v", key, err)
}
}
}
return nil
}
For reference, here’s the related function in redislock
:
func (l *Lock) Release(ctx context.Context) error {
I'm pretty sure the issue is with this line in release.lua
:
https://github.com/bsm/redislock/blob/bfd385b776d1e069eca5cb6b6cc41d635cbb4e55/release.lua#L5
because unpack
isn't implemented. There's still a lot of work to do to get Garnet up to parity with Redis in terms of LUA scripting. I'll chip away at some of it but it's not going to be a short-term thing.
Actually, unpack
should be implemented:
https://github.com/microsoft/garnet/blob/b92c3dc57e75e7fc88fa1891f81304e481f342e1/libs/server/Lua/LuaRunner.cs#L68
Will dig a little deeper.
For obtain.lua, I am hitting an index out of bounds issue with the ipairs
command. Are you not hitting this issue?
This seems to be an open issue in NLua, as per this: https://github.com/NLua/NLua/issues/472
This unit test works: https://github.com/microsoft/garnet/blob/badrishc/lua-more-fns/test/Garnet.test/LuaScriptTests.cs#L352
But if I replace "a" with "KEYS" which is populated with 1 based offsets, the ipairs fails. This is what obtain.lua does:
local setArgs = {}
for _, key in ipairs(KEYS) do
table.insert(setArgs, key)
table.insert(setArgs, ARGV[1])
end
I worked around the ipairs
issue by adding an extra null at the end of the KEYS string array in this PR: https://github.com/microsoft/garnet/pull/655 so that works, but unpack(KEYS)
does not work as it throws a NLua exception: attempt to get length of a System.String[]
It seems a more robust solution would be to make KEYS and ARGV LuaTable types rather than C# string arrays. That way, all Lua functions on them would be mostly guaranteed to work.
The linked PR https://github.com/microsoft/garnet/pull/672 should support most existing Lua scripts - let me know if you see any issues.
Thanks! Unfortunately, I still see the reported issue with "lock not held" from comment https://github.com/microsoft/garnet/issues/635#issuecomment-2333495398
Good news - this is now fixed in the latest commit to my linked PR. The example from https://github.com/bsm/redislock?tab=readme-ov-file#examples now works unmodified. If your code still does not work, please provide a stand-alone example similar to the redislock example I linked, so that I can run it directly using go run .
.
@badrishc, I wanted to follow up and let you know that I've tested the latest commit, and I'm happy to report that it's working very smoothly without any issues now! The improvements you've made to handle Lua scripting in Garnet are fantastic, and I'm confident that many users will appreciate the capability to seamlessly replace Lua Redis implementations with Garnet.
A huge thank you to both you and @mmclure-msft for your tremendous effort and time invested in resolving these issues. It’s wonderful to see the progress that's been made!
Let's close this issue if the PR is merged.
@badrishc Args, wait a minute. I tested now also https://github.com/bsm/redislock?tab=readme-ov-file#examples
It seems, this part is not working as expected:
// Sleep and check the remaining TTL.
time.Sleep(50 * time.Millisecond)
if ttl, err := lock.TTL(ctx); err != nil {
log.Fatalln(err)
} else if ttl > 0 {
fmt.Println("Yay, I still have my lock!")
}
Response with Garnet:
go run .
I have a lock!
Now, my lock has expired!
Response with Redis (Docker image redis:latest):
go run .
I have a lock!
Yay, I still have my lock!
Now, my lock has expired!
Update: It looks like Garnet is setting the TTL to zero immediately after obtaining the lock, which is not the expected behavior. This discrepancy could be due to differences in how Garnet handles TTL settings compared to Redis.
Slightly modified main.go with additional logging:
package main
import (
"context"
"fmt"
"log"
"time"
"github.com/bsm/redislock"
"github.com/redis/go-redis/v9"
)
func main() {
start := time.Now()
// Connect to redis.
client := redis.NewClient(&redis.Options{
Network: "tcp",
Addr: "127.0.0.1:6380", // Update Port for Garnet/Redis Endpoint
})
defer client.Close()
fmt.Printf("Redis connection time: %v\n", time.Since(start))
// Create a new lock client.
locker := redislock.New(client)
fmt.Printf("Lock client creation time: %v\n", time.Since(start))
ctx := context.Background()
// Try to obtain lock.
lockStart := time.Now()
lock, err := locker.Obtain(ctx, "my-key", 300*time.Millisecond, nil) // Increased lock duration
if err == redislock.ErrNotObtained {
fmt.Println("Could not obtain lock!")
} else if err != nil {
log.Fatalln(err)
}
fmt.Printf("Lock obtain time: %v\n", time.Since(lockStart))
// Don't forget to defer Release.
defer lock.Release(ctx)
fmt.Println("I have a lock!")
// Sleep and check the remaining TTL.
time.Sleep(50 * time.Millisecond)
ttlStart := time.Now()
if ttl, err := lock.TTL(ctx); err != nil {
log.Fatalln(err)
} else if ttl > 0 {
fmt.Printf("Yay, I still have my lock! TTL: %v\n", ttl)
} else {
fmt.Println("Lock TTL is zero!")
}
fmt.Printf("TTL check time: %v\n", time.Since(ttlStart))
// Extend my lock.
refreshStart := time.Now()
if err := lock.Refresh(ctx, 300*time.Millisecond, nil); err != nil {
log.Fatalln(err)
}
fmt.Printf("Lock refresh time: %v\n", time.Since(refreshStart))
// Sleep a little longer, then check.
time.Sleep(100 * time.Millisecond)
expireCheckStart := time.Now()
if ttl, err := lock.TTL(ctx); err != nil {
log.Fatalln(err)
} else if ttl == 0 {
fmt.Println("Now, my lock has expired!")
} else {
fmt.Printf("Lock still valid! TTL: %v\n", ttl)
}
fmt.Printf("Final TTL check time: %v\n", time.Since(expireCheckStart))
fmt.Printf("Total execution time: %v\n", time.Since(start))
}
With Garnet:
go run .
Redis connection time: 0s
Lock client creation time: 0s
Lock obtain time: 4.149ms
I have a lock!
Lock TTL is zero!
TTL check time: 1.8605ms
Lock refresh time: 2.6271ms
Lock still valid! TTL: 198ms
Final TTL check time: 1.1119ms
Total execution time: 160.9905ms
With Redis:
go run .
Redis connection time: 0s
Lock client creation time: 0s
Lock obtain time: 2.6355ms
I have a lock!
Yay, I still have my lock! TTL: 249ms
TTL check time: 945µs
Lock refresh time: 502.7µs
Lock still valid! TTL: 198ms
Final TTL check time: 1.0737ms
Total execution time: 156.6583ms
Thank you for the simple clean repro! Fixed in the latest commit to the open PR. It was just a small corner case where we did not pass the extra SET parameters (such as expiration) from the script to the storage API.
Describe the bug
When attempting to execute Lua scripts using GarnetServer, a NullReferenceException is thrown in the ExecuteScript method. This issue occurs when using the EVAL or EVALSHA commands.
A NullReferenceException is thrown, indicating that an object reference is not set to an instance of an object.
Error Message:
The LuaRunner object might not be properly initialized or retrieved, leading to a null reference when attempting to execute the script.
This report is related to the Lua implementation in https://github.com/microsoft/garnet/pull/550
Steps to reproduce the bug
Enable Lua scripting in GarnetServer.
Attempt to execute a Lua script using the EVAL or EVALSHA command. Go Code Using RedisLock from https://github.com/bsm/redislock/blob/main/redislock.go:
Related Lua scripts: https://github.com/bsm/redislock/blob/main/obtain.lua https://github.com/bsm/redislock/blob/main/refresh.lua
Observe the NullReferenceException in the logs of garnet server.
Expected behavior
The Lua script should execute without throwing a NullReferenceException.
Screenshots
No response
Release version
v1.0.19
IDE
No response
OS version
No response
Additional context
Docker Compose service configuration: docker-compose.yml
garnet.conf