rundel / parsermd

https://rundel.github.io/parsermd/

Preserve special characters when parsing #31

Closed xuansontrinh closed 1 year ago

xuansontrinh commented 1 year ago

Hello, thank you for writing this parser; it is extremely helpful.

However, when using the parser, I am running into an issue that makes a few special characters in my file unreadable.

Here is an example:

This is the text before parsing:

## Which statements are true?
- [ ] The idea of Gradient Descent (GD) is to iteratively go from the current candidate θ[t] in the direction of the positive gradient, with learning rate α to the next θ[t+1].
- [x] Empirical risk minimization (ERM) leads to finding the model with the lowest average loss (in the absence of regularization).
- [ ] A learner outputs the best parameters and hyperparameters.
- [ ] Supervised ML is always about learning to predict, and never about learning to explain.

This is the text after parsing:

## Which statements are true?
- [ ] The idea of Gradient Descent (GD) is to iteratively go from the current candidate θ[t] in the direction of the positive gradient, with learning rate α to the next θ[t+1].
- [x] Empirical risk minimization (ERM) leads to finding the model with the lowest average loss (in the absence of regularization).
- [ ] A learner outputs the best parameters and hyperparameters.
- [ ] Supervised ML is always about learning to predict, and never about learning to explain.

As you can see, the theta symbol that I used turned into something unreadable. Is there a way to mitigate this issue?

Thank you very much!

P.S. I am using R version 4.2.0 (2022-04-22 ucrt) on Windows.

xuansontrinh commented 1 year ago

It seems my problem is with character encoding on Windows rather than with this specific package. Here is a solution that worked for me: https://blog.r-project.org/2020/05/02/utf-8-support-on-windows/

You can also use RStudio and let it take care of this for you.

Hope it helps.
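For anyone landing here with the same symptom, one way to check whether the file itself or the reading step is at fault is to read the text with an explicitly declared encoding and validate it. The snippet below is only a minimal sketch using base R; the temporary file and the sample text are made up for illustration, and it is not taken from the linked blog post or from parsermd.

```r
# Hypothetical sample file, created only so the example is self-contained.
path <- tempfile(fileext = ".md")
txt <- "learning rate \u03b1, current candidate \u03b8[t]"

# Write the UTF-8 bytes as-is, without re-encoding to the native locale.
writeLines(enc2utf8(txt), path, useBytes = TRUE)

# Read the file back, declaring the encoding instead of relying on the locale.
lines <- readLines(path, encoding = "UTF-8", warn = FALSE)

# Check that every line is valid UTF-8 and that theta survived the round trip.
is_valid  <- all(validUTF8(lines))
has_theta <- grepl("\u03b8", lines, fixed = TRUE)
```

If `is_valid` is `TRUE` here but the characters still look wrong after parsing or printing, the native encoding of the R session is the more likely culprit.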

rundel commented 1 year ago

Thanks for the report. It did sound like a UTF-8 encoding issue. There may be something we can do on the package side to ensure that whatever goes through the parsing engine is kept as UTF-8 at a minimum.
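As a rough illustration of what such a safeguard could look like, here is a minimal sketch in base R. The helper name `ensure_utf8` is hypothetical and is not part of parsermd; it only shows the general idea of normalising input text to UTF-8 and failing early when the bytes are not valid.

```r
# Hypothetical helper (not part of parsermd): normalise text to UTF-8
# before it is handed to a parser.
ensure_utf8 <- function(x) {
  stopifnot(is.character(x))
  x <- enc2utf8(x)                      # convert native/latin1-marked strings
  bad <- !is.na(x) & !validUTF8(x)      # flag elements with invalid bytes
  if (any(bad)) {
    stop("Input is not valid UTF-8 at element(s): ",
         paste(which(bad), collapse = ", "))
  }
  x
}

# Example: "cafe" with an accented e, stored as latin1 bytes.
latin1 <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
Encoding(latin1) <- "latin1"

out <- ensure_utf8(latin1)
```

After the call, `out` is marked as UTF-8 and compares equal to `"caf\u00e9"`. Whether a check like this belongs at the R level or inside the parsing engine is a separate design question.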