ocaml-multicore / eio

Effects-based direct-style IO for multicore OCaml
Other
559 stars 72 forks source link

Get main test-suite running on Windows #773

Open RichardChukwu opened 3 weeks ago

RichardChukwu commented 3 weeks ago

Issue: #761

This PR modifies the Eio test suite to run on Windows by removing the conditional enabled_if (<> %{os_type} "Win32" in tests/dune.

After enabling the test suite on Windows, these were some of my observations:

test fails

SUGGESTION

I think we can consider increasing the stack size to see if that will completely resolve all failures with the stack overflow ouput

I look forward to your thoughts on this @talex5 @patricoferris

talex5 commented 3 weeks ago

The line numbers reported by MDX don't exist and are very large. MDX is probably stuck in a parsing loop.

It would be good to shrink the md file to get a minimal test-case that demonstrates the problem and report it at https://github.com/realworldocaml/mdx/issues.

Thanks!

RichardChukwu commented 3 weeks ago

Ok, just for clarification please @talex5 when you say shrink the file?, do you mean probably running a specific test file instead, since the lines captured in the files are non existent?

talex5 commented 3 weeks ago

I mean that the fs.md file is very large. Try deleting the second half of it and see if it still fails. Keep simplifying it until it's obvious what the problem is, or you can't simplify it any further.

RichardChukwu commented 3 weeks ago

Ok, got it, let me work on that @talex5

RichardChukwu commented 3 weeks ago

I mean that the fs.md file is very large. Try deleting the second half of it and see if it still fails. Keep simplifying it until it's obvious what the problem is, or you can't simplify it any further.

Hi @talex5

I've reduced the fs.md file from 1000 to 500 to 100 to 50 and to just 10 lines, but unfortunately, I'm still encountering the same stack overflow error. Given that such a significant reduction didn’t affect the error, it seems likely that the issue isn’t specifically due to the file's size.

Would you recommend further steps to debug this, or should I proceed to report this as a possible parsing issue with MDX?

Thanks for the guidance!

image

talex5 commented 3 weeks ago

I've reduced the fs.md file from 1000 to 500 to 100 to 50 and to just 10 lines, but unfortunately, I'm still encountering the same stack overflow error.

That's great news! It's much easier to debug a problem in a 10 line file.

Can you make it even shorter? Can you still get it to stack overflow if the file doesn't use Eio at all? And then see if you can get the crash by running ocaml-mdx on it manually, outside of Eio.

RichardChukwu commented 3 weeks ago

I've reduced the fs.md file from 1000 to 500 to 100 to 50 and to just 10 lines, but unfortunately, I'm still encountering the same stack overflow error.

That's great news! It's much easier to debug a problem in a 10 line file.

Can you make it even shorter? Can you still get it to stack overflow if the file doesn't use Eio at all? And then see if you can get the crash by running ocaml-mdx on it manually, outside of Eio.

Ok. So in essence, aim to isolate the problem without any dependency on Eio right?

talex5 commented 3 weeks ago

Ok. So in essence, aim to isolate the problem without any dependency on Eio right?

Yes, if possible. Then we know the problem isn't in Eio and must be in MDX.

RichardChukwu commented 3 weeks ago

Ok. So in essence, aim to isolate the problem without any dependency on Eio right?

Yes, if possible. Then we know the problem isn't in Eio and must be in MDX.

Ok, on it, will give feedback soon

RichardChukwu commented 3 weeks ago

I've reduced the fs.md file from 1000 to 500 to 100 to 50 and to just 10 lines, but unfortunately, I'm still encountering the same stack overflow error.

That's great news! It's much easier to debug a problem in a 10 line file.

Can you make it even shorter? Can you still get it to stack overflow if the file doesn't use Eio at all? And then see if you can get the crash by running ocaml-mdx on it manually, outside of Eio.

@talex5 So with response to this, after reducing fs.md further, I’m still seeing the stack overflow error even with a minimal setup. The remaining code is down to just the#require line and Unix.umask command, as shown here:

image

I think this suggests that ocaml-MDX may be running into parsing issues, even with a very basic setup.

Please let me know if you’d like me to proceed further in a specific way. Thanks again for the guidance!

patricoferris commented 3 weeks ago

That doesn't look like a valid Markdown file, the code block has no closing backticks (i.e. ```). I don't think MDX should stack overflow in such a case, however, it doesn't seem to be blocking this particular piece of work maybe ?

RichardChukwu commented 3 weeks ago

That doesn't look like a valid Markdown file, the code block has no closing backticks (i.e. ```). I don't think MDX should stack overflow in such a case, however, it doesn't seem to be blocking this particular piece of work maybe ?

@patricoferris ooh, I must have missed that closing back ticks, but I'm not sure that's the issue though, but I'll try it however.

I did try running a test using various blocks within the markdown file till it got reduced to this as previously stated by @talex5

The stack overflow keeps occurirng on several tests occasions

RichardChukwu commented 3 weeks ago

Oh wow, just discovered something now. I deleted and reinstalled the opam package manager and started running the various code blocks of fs.md test all over again. All tests are now working perfectly fine with MDX.

Tried lines 1 to 142 so far with no stack overflow

How do I proceed from here? Should I keep testing till I get to the end of the line for the markdown file for proper verification? @talex5 @patricoferris

talex5 commented 3 weeks ago

That doesn't look like a valid Markdown file, the code block has no closing backticks (i.e. ```). I don't think MDX should stack overflow in such a case, however, it doesn't seem to be blocking this particular piece of work maybe ?

Ah, yes. Looks like I reported that back in 2023: https://github.com/realworldocaml/mdx/issues/420 Would be nice to get that fixed indeed.

I deleted and reinstalled the opam package manager and started running the various code blocks of fs.md test all over again. All tests are now working perfectly fine with MDX.

I don't see why opam would have any effect on that. Did you perhaps also reinstall Git with a different line-endings setting? e.g. https://github.com/realworldocaml/mdx/pull/294#issuecomment-749664997

Or maybe you're using a different version of MDX. Did you keep a copy of the old installation? Would be good to compare if so.

In any case, the tests are still failing in the CI: https://github.com/ocaml-multicore/eio/actions/runs/11529311468/job/32119632533?pr=773#step:6:32

RichardChukwu commented 3 weeks ago

I don't see why opam would have any effect on that. Did you perhaps also reinstall Git with a different line-endings setting? e.g. https://github.com/realworldocaml/mdx/pull/294#issuecomment-749664997

Or maybe you're using a different version of MDX. Did you keep a copy of the old installation? Would be good to compare if so.

In any case, the tests are still failing in the CI: https://github.com/ocaml-multicore/eio/actions/runs/11529311468/job/32119632533?pr=773#step:6:32

No, I didn't keep a copy. Yeah the CI tests still fail because I'm yet to push the changes to this branch.

I don't know but I'm really noticing some issues that seem not to be adding up @talex5 dunno if it's a clone problem though.

I noticed any markdown file I edit after cloning (by deleting the code and using the code from the parent repo, a simple copy and paste again) runs successfully when tested.

But any markdown code on my clone that I run without editing hangs. The difference I noticed between the two files are the backtics at the end of a block.

And if this is the case, it would mean rewriting all the tests in my cloned repo before pushing to commit.

RichardChukwu commented 3 weeks ago

@talex5 @patricoferris Now every other test seem to be failing the CI, and I really don't know what the issue is exactly, tried different approaches already but still fails. Would appreciate some guidance please

Thank you

talex5 commented 3 weeks ago

The CI says \ No newline at end of file. Maybe you removed all the newlines?

RichardChukwu commented 3 weeks ago

The CI says \ No newline at end of file. Maybe you removed all the newlines?

Hmmm... just checked. Thank you so much for drawing my attention to that

So apparently I did, let me edit and re-commit

RichardChukwu commented 3 weeks ago

ocaml-ci still fails @talex5 @patricoferris , the reason does not seem to be clear to me in the logs either, please kindly look into it

talex5 commented 3 weeks ago

Seems pretty clear:

[mdx] Fatal error: File "trace.md", lines 44761057-44761058: Stack overflow

This is the problem you're trying to fix.

RichardChukwu commented 2 weeks ago

Hi @talex5

I’ve worked through several iterations with the test files to isolate the root cause of the stack overflow and test hanging issues:

  1. Reduced file Content: I systematically reduced the file content to a block. Even with minimal code, the CI tests still hang at 99% and then fails with output stack overflow

  2. Direct mdx Execution: Running eg. ocaml-mdx trace.md directly, even in this form, results still end in stack overflow.

I think these findings suggest that the issue might be due to a parsing loop or compatibility issue with mdx.

I would appreciate any guidance on further isolating the issue.

Thank you!

RichardChukwu commented 2 weeks ago

Based on these failures, should I go ahead and report the issue at https://github.com/realworldocaml/mdx/issues @talex5 @patricoferris

talex5 commented 2 weeks ago

Running eg. ocaml-mdx trace.md directly, even in this form, results still end in stack overflow.

That's very good! Does the test use anything from Eio? If you can get it to fail just using the OCaml stdlib that would be even better. This avoids any possible confusion about which project (Eio or MDX) is causing the bug.

RichardChukwu commented 2 weeks ago

Running eg. ocaml-mdx trace.md directly, even in this form, results still end in stack overflow.

That's very good! Does the test use anything from Eio? If you can get it to fail just using the OCaml stdlib that would be even better. This avoids any possible confusion about which project (Eio or MDX) is causing the bug.

Hi @talex5

So as suggested, I tested the trace.md file and other markdown files too both with and without Eio dependencies. Here’s what I found:

  1. Without Eio: Running trace.md without any Eio dependencies using ocaml-mdx test trace.md completed successfully without any stack overflow or hanging issues.

  2. With Eio: Including Eio dependencies in the file still results in the stack overflow, as we initially observed.

  3. Testing the Code in the OCaml REPL: I also tested the above code in the OCaml REPL to verify its functionality. The output was as expected, confirming that it runs correctly without any Eio dependencies.

Based on these findings, it seems the bug might indeed be related to Eio rather than mdx.

image

talex5 commented 2 weeks ago

OK, then could you find out which Eio library causes the problem.

e.g. what happens if you just require #require "eio_main" and nothing else? What if you only ask for eio_windows or eio or eio.core?

If it still fails for eio.core, try with eio.core's dependencies (see lib_eio/core/dune).

Keep trying to make the failing and succeeding cases more similar until you find the cause of the difference.