ocaml / ocaml.org

The official OCaml website.
https://ocaml.org
147 stars, 296 forks

Cookbook Extract Links from HTML #2552

Open ggsmith842 opened 1 week ago

ggsmith842 commented 1 week ago

Adds an example for the extract-links-from-html task using the Re library. A sample HTML string is provided, and the usage example shows how to read the HTML and print the links found.

ggsmith842 commented 1 week ago

@cuihtlauac and @christinerose thank you for the feedback! I simplified the comments section, so hopefully it is now more concise and appropriate for the recipe.

yawaramin commented 1 day ago

I'd advise against trying to teach people to parse HTML with regular expressions :-) https://stackoverflow.com/a/1732454/20371

Maybe we can recommend Anton Bachin's excellent lambdasoup package? https://ocaml.org/p/lambdasoup/

ggsmith842 commented 1 day ago

@yawaramin let me look into this and make some changes. Thank you for the suggestion. I wasn't aware that regex isn't suited for working with HTML.

yawaramin commented 1 day ago

No worries. Btw it's the second example shown on the docs page: https://aantron.github.io/lambdasoup/
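
For reference, a minimal sketch of what a lambdasoup-based version of the recipe might look like. It assumes the lambdasoup package is installed (`opam install lambdasoup`). The sample HTML string and the `extract_links` helper are made up for illustration and are not taken from the PR.

```ocaml
(* Sketch only: requires the lambdasoup package. *)
open Soup

(* Hypothetical sample document; the second anchor has no href. *)
let html = {|
<html><body>
  <a href="https://ocaml.org">OCaml</a>
  <a name="no-link">anchor without href</a>
  <p>See <a href='/p/lambdasoup/'>lambdasoup</a>.</p>
</body></html>
|}

(* Hypothetical helper: parse the document, select every <a> element
   that carries an href attribute, and return the attribute values. *)
let extract_links (html : string) : string list =
  parse html $$ "a[href]"
  |> to_list
  |> List.map (R.attribute "href")

(* Print one link per line. *)
let () = List.iter print_endline (extract_links html)
```

The CSS selector `a[href]` skips anchors without an `href`, so `R.attribute` (which raises when the attribute is missing) is safe here. Because the HTML goes through a real parser, differences in quoting, attribute order, and whitespace are handled without any pattern tweaking.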