Open jlewi opened 2 weeks ago
Are the logs Processing Any Log File? https://github.com/jlewi/foyle/blob/4288e91ac805b46103d94230b32dd1bc2f957095/app/pkg/analyze/analyzer.go#L200
That is logged in debug mode so we'd have to bump the log level.
@sourishkrout can you try the following
Run Foyle with debug logs enabled
foyle serve --log=debug
Execute a couple cells / accept a suggestion
Search the logs for the above message
jq -c 'select(.msg == "Processing log file")' "/Users/jlewi/Library/Application Support/runme/logs/logs.2024-08-19T19:45:23.json"
The Analyzer is the component that processes the logs. It produces traces which are then what we learn from. The message in discord showing no "Build block log" indicated no logs for blocks were being produced. So now we are trying to verify that the Analyzer is processing logs so we can narrow down the problem.
Negative. I don't see any Processing log file
logs:
https://gist.github.com/sourishkrout/bc7cdccaa828056b3f5c622679ac49bb
By the way, ☝️ would make a good troubleshooting notebook inside the foyle
repo.
@jlewi I am not sure what this logic does, but it appears my logs always wind up in this condition (see below):
Oh, looks like that's just tracking "savepoints" to avoid double processing.
3. Search the logs for the above message
jq -c 'select(.msg == "Processing log file")' "/Users/jlewi/Library/Application Support/runme/logs/logs.2024-08-19T19:45:23.json"
Q: Processing log file
should appear in Foyle's log and not in the Runme AI's logs, no?
Here we go. Logs! Still not learning, though.
https://gist.github.com/sourishkrout/3cdecee8b123a436d35775cfbf4b8dac
@sourishkrout. Your 100% correct it should be the Foyle Logs not the RunMe logs. I should have caught that.
By the way, ☝️ would make a good troubleshooting notebook inside the foyle repo.
That's correct. I should have done that in the first place rather than copying and pasting buggy queries.
Here you go.
Its a bit rough but try running through it and then sharing the output.
Here are my results @jlewi:
https://gist.github.com/sourishkrout/0f77867bb35df94d1b0eb4490c8fc6ff
Thanks @sourishkrout
It looks like you set LASTLOG
to the RunMe logs here
So all log statements below that are checking the wrong log file.
Does the formula in the notebook not work for you?
export LASTLOG=~/.foyle/logs/raw/$(ls -t ~/.foyle/logs/raw | head -n 1 )
Your earlier trace indicates that a BlockLog was properly computed for your cell so that's good.
In this particular case I wouldn't expect learning to occur. If you look at the BlockLog you can see that contents of the generatedBlock and the executedBlock are the same. The only difference is some whitespace.
You can see here that when processing examples (BlockLogs) we strip whitespace and then compare the contents. If they are the same then we don't learn from that example.
So here's the things to try
Okay, so now again with Foyle & Runme on latest main
:
Changed the troubleshooting notebook slightly also to include a full trace of my most recent log. Is there anything you can spot in the gist, @jlewi?
@sourishkrout so thanks to your report I was able to identify two bugs that I fixed in #233
The first is a bug in the documentation; the command to check the blocklogs has changed because we now use the ConnectProtocol
The second is there was a bug in adding execution events to the blocklog
So please retry using the latest on main which has the fix. The updated command for checking the blocklog is here https://foyle.io/docs/learning/troubleshoot_learning/#did-block-logs-get-created
Your logs indicate the BlockLog is being generated
{"severity":"info","time":1726158688.428434,"caller":"analyze/analyzer.go:658","function":"github.com/jlewi/foyle/app/pkg/analyze.buildBlockLog","message":"Building block log","blockId":"01J7KH6B5YJ111KEXAAHD97XY8","block":{"id":"01J7KH6B5YJ111KEXAAHD97XY8","gen_trace_id":"da7a66391bad2f64f6f66e056eed6772","exit_code":0,"eval_mode":false,"resource_version":"c7db69f7-869b-4439-92e2-09ba254bc364","suggestion_status":"SuggestionStatusUnknown"}}
So I suspect the blocklog is there so once we start using the updated CURL command we should be able to fetch it and see what's going on.
Note that the location of the troubleshooting guide has changed since I moved it into the docs website.
- The updated command is here
Where? Did you forget to include the link?
Here's an updated Runme session gist: https://gist.github.com/sourishkrout/babd0e9f32ad8079869d4534f8b3bb4b#file-troubleshoot_learning-01j7mayp580arr60njbes5fbq6-md
Oh sorry. I did forget the link; https://foyle.io/docs/learning/troubleshoot_learning/#did-block-logs-get-created
Oh great looks like you found the correct link sorry about that.
It looks like things are working now.
So it looks like #233 fixed the issue with example.binpb files not being computed.
Are you observing something that leads you to think learning isn't working? Are you just not observing the quality you'd expect?
Sorry the example showing up in the output isn't from my machine. The cell is echoing the static content from the original doc since it's plain text. Should have skipped executing it.
The issue I'm observing is still the same where the previously accepted, corrected, and successfully executed does not seem to get retained as part of learning.
Ahh my mistake.
So it looks like cell 01J7KQPBYCT9VM2KFBY48JC7J0
is your cell?
jq
So there are a couple possibilities
Blocks get enqueued here by invoking the blockNotifier. That will not happen if the blockNotifier is nil.
Block notifier gets set here
Analyzer.Run is called here.
I can't see any reason why Learner.Enqueue wouldn't be getting invoked.
At this point the best way to probably debug this would be to set a couple breakpoints and see what code paths are getting executed. My suggestion would be to start as follows
Set a breakpoint here (https://github.com/jlewi/foyle/blob/8c57b00bc2b104e387b51e8032919d6ea90d865e/app/pkg/analyze/analyzer.go#L642
Set a breakpoint in learner.Reconcile https://github.com/jlewi/foyle/blob/8c57b00bc2b104e387b51e8032919d6ea90d865e/app/pkg/learn/learner.go#L124
Let me know if you want to pair.
@sourishkrout I found and fixed another bug #241 due to the logging changes. Fingers crossed that fixes things for you.
🤞 testing this now.
No dice. However, the additional cells in the troubleshooting guide might be helpful, @jlewi.
I'll try the breakpoints and see what happens. I could pair tmrw afternoon if that's easier.
It looks like I never get past this line, even though one of the two occasions was definitely a generated block:
https://github.com/jlewi/foyle/blob/main/app/pkg/learn/learner.go#L158
I'm sorry this is so frustrating. I really appreciate your patience.
It looks like I never get past this line, even though one of the two occasions was definitely a generated block:
Yes.
I think I know what is going on. "learner_enqueued_total" is only 1.
In #241 I changed when we notify the learner because we should only need to notify the learner after processing the executed log entry because that should necessarily be after the generated log entry.
The learner gets notified here. My assumption was that by the time the execution event gets processed the BlockLog has already been processed
But that's not guaranteed because we read the log lines in chunks and only update the blocklog after reading the entire chunk https://github.com/jlewi/foyle/blob/6035823bd821512769415825b4cccba2970ab092/app/pkg/analyze/analyzer.go#L307
So here are some examples that were produced for me
-rw-r--r-- 1 jlewi staff 30077 Sep 16 18:24 01J7YMFGD47MEG6QN46NN0GXKQ.example.binpb
-rw-r--r-- 1 jlewi staff 33214 Sep 16 18:25 01J7YQ57WRP73B0ND35NZY3MVG.example.binpb
-rw-r--r-- 1 jlewi staff 7953 Sep 17 14:34 01J80YESW7NYV5N6DX0168HNAQ.example.binpb
Two things are true in my case
I think those two things contribute to the bug being masked. In particular, the bug means the first time the learner gets notified due to the first execution event the generated block won't be part of the log. The subsequent cell executions lead to additional learner notifications being fired. Debug logs increase the number of logs in between the two execution events. As a result, were more likely to hit the chunk limit in between execution events and therefore combine the log entries and update the BlockLog with the generated block.
@sourishkrout Latest fix is #244
@sourishkrout Latest fix is #244
I can definitely see more activity in the logs now. However, my ~/.foyle/training
remains empty, and learning still does not take effect. I made sure that in the example below, I would edit the cell a few times to achieve the desired outcome.
If you're available tomorrow afternoon, we could pair up to work through it. Looking over my shoulder might help, too.
So we're making progress. I think I found the next bug.
Your counters show that there were two learning examples.
learner_blocks_processed{status="learn"} 2
learner_blocks_processed{status="nochange"} 1
learner_blocks_processed{status="notgenerated"} 1
learner_blocks_processed{status="unexecuted"} 10
So we know the code reached here https://github.com/jlewi/foyle/blob/d67a104e1263c243ad6271d5329d154626751576/app/pkg/learn/learner.go#L176
If something went wrong in the subsequent processing we should have seen an error.
I think the bug is that exampleFiles is empty then the for loop never iterates and we don't try to create any exampleFiles https://github.com/jlewi/foyle/blob/d67a104e1263c243ad6271d5329d154626751576/app/pkg/learn/learner.go#L177
getExampleFiles will return an empty list if getTrainingDirs is empty https://github.com/jlewi/foyle/blob/d67a104e1263c243ad6271d5329d154626751576/app/pkg/learn/learner.go#L252C29-L252C44
getTrainingDirs returns empty if Learner in your configuration is empty https://github.com/jlewi/foyle/blob/d67a104e1263c243ad6271d5329d154626751576/app/pkg/config/config.go#L247
This is legacy code. Prior to #211 you had to set the location of the RunMe logs and learner could be nil. Post #211 you shouldn't need to set learner.
My config looks like
learner:
logDirs: []
Fix incoming.
See this thread. https://discord.com/channels/1102639988832735374/1278048402567069728
User queried the logs and didn't see any entries for BuildLog.