mischov / meeseeks_html5ever

Meeseeks-specific NIF binding of html5ever using Rustler.
Apache License 2.0

Unable to get MeeseeksHtml5ever.Native.parse_html/1 to run #33

Closed megalithic closed 4 years ago

megalithic commented 4 years ago

Elixir 1.9 / OTP 21, Meeseeks 0.13.1, meeseeks_html5ever 0.12.1

I consistently get this error when trying to parse HTML with Meeseeks.parse() from my calling module:

(UndefinedFunctionError) function MeeseeksHtml5ever.Native.parse_html/1 is undefined (module MeeseeksHtml5ever.Native is not available)
    (meeseeks_html5ever) MeeseeksHtml5ever.Native.parse_html("<!DOCTYPE html>\n<!-- Created by pdf2htmlEX (https://github.com/coolwanglu/pdf2htmlex) -->\n<html xmlns=\"http://www.w3.org/1999/xhtml\">\n<head>\n<meta charset=\"utf-8\"/>\n<meta name=\"generator\" content=\"pdf2htmlEX\"/>\n<meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge,chrome=1\"/>\n<style type=\"text/css\">\n/*! \n * Base CSS for pdf2htmlEX\n * Copyright 2012,2013 Lu Wang <coolwanglu@gmail.com> \n * https://github.com/coolwanglu/pdf2htmlEX/blob/master/share/LICENSE\n */#sidebar{position:absolute;top:0;left:0;bottom:0;width:250px;padding:0;margin:0;overflow:auto}#page-container{position:absolute;top:0;left:0;margin:0;padding:0;border:0}@media screen{#sidebar.opened+#page-container{left:250px}#page-container{bottom:0;right:0;overflow:auto}.loading-indicator{display:none}.loading-indicator.active{display:block;position:absolute;width:64px;height:64px;top:50%;left:50%;margin-top:-32px;margin-left:-32px}.loading-indicator img{position:absolute;top:0;left:0;bottom:0;right:0}}@media print{@page{margin:0}html{margin:0}body{margin:0;-webkit-print-color-adjust:exact}#sidebar{display:none}#page-container{width:auto;height:auto;overflow:visible;background-color:transparent}.d{display:none}}.pf{position:relative;background-color:white;overflow:hidden;margin:0;border:0}.pc{position:absolute;border:0;padding:0;margin:0;top:0;left:0;width:100%;height:100%;overflow:hidden;display:block;transform-origin:0 0;-ms-transform-origin:0 0;-webkit-transform-origin:0 0}.pc.opened{display:block}.bf{position:absolute;border:0;margin:0;top:0;bottom:0;width:100%;height:100%;-ms-user-select:none;-moz-user-select:none;-webkit-user-select:none;user-select:none}.bi{position:absolute;border:0;margin:0;-ms-user-select:none;-moz-user-select:none;-webkit-user-select:none;user-select:none}@media print{.pf{margin:0;box-shadow:none;page-break-after:always;page-break-inside:avoid}@-moz-document 
url-prefix(){.pf{overflow:visible;border:1px solid #fff}.pc{overflow:visible}}}.c{position:absolute;border:0;padding:0;margin:0;overflow:hidden;display:block}.t{position:absolute;white-space:pre;font-size:1px;transform-origin:0 100%;-ms-transform-origin:0 100%;-webkit-transform-origin:0 100%;unicode-bidi:bidi-override;-moz-font-feature-settings:\"liga\" 0}.t:after{content:''}.t:before{content:'';display:inline-block}.t span{position:relative;unicode-bidi:bidi-override}._{display:inline-block;color:transparent;z-index:-1}::selection{background:rgba(127,255,255,0.4)}::-moz-selection{background:rgba(127,255,255,0.4)}.pi{display:none}.d{position:absolute;transform-origin:0 100%;-ms-transform-origin:0 100%;-webkit-transform-origin:0 100%}.it{border:0;background-color:rgba(255,255,255,0.0)}.ir:hover{cursor:pointer}</style>\n<style type=\"text/css\">\n/*! \n * Fancy styles for pdf2htmlEX\n * Copyright 2012,2013 Lu Wang <coolwanglu@gmail.com> \n * https://github.com/coolwanglu/pdf2htmlEX/blob/master/share/LICENSE\n */@keyframes fadein{from{opacity:0}to{opacity:1}}@-webkit-keyframes fadein{from{opacity:0}to{opacity:1}}@keyframes swing{0{transform:rotate(0)}10%{transform:rotate(0)}90%{transform:rotate(720deg)}100%{transform:rotate(720deg)}}@-webkit-keyframes swing{0{-webkit-transform:rotate(0)}10%{-webkit-transform:rotate(0)}90%{-webkit-transform:rotate(720deg)}100%{-webkit-transform:rotate(720deg)}}@media screen{#sidebar{background-color:#2f3236;background-image:url(\"data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSI0IiBoZWlnaHQ9IjQiPgo8cmVjdCB3aWR0aD0iNCIgaGVpZ2h0PSI0IiBmaWxsPSIjNDAzYzNmIj48L3JlY3Q+CjxwYXRoIGQ9Ik0wIDBMNCA0Wk00IDBMMCA0WiIgc3Ryb2tlLXdpZHRoPSIxIiBzdHJva2U9IiMxZTI5MmQiPjwvcGF0aD4KPC9zdmc+\")}#outline{font-family:Georgia,Times,\"Times New Roman\",serif;font-size:13px;margin:2em 1em}#outline ul{padding:0}#outline li{list-style-type:none;margin:1em 0}#outline li>ul{margin-left:1em}#outline a,#outline a:visited,#outline 
a:hover,#outline a:active{line-height:1.2;color:#e8e8e8;text-overflow:ellipsis;white-space:nowrap;text-decoration:none;display:block;overflow:hidden;outline:0}#outline a:hover{color:#0cf}#page-container{background-color:#9e9e9e;background-image:" <> ...)
    (meeseeks_html5ever) lib/meeseeks_html5ever.ex:12: MeeseeksHtml5ever.parse_html/1
    (meeseeks) lib/meeseeks/parser.ex:15: Meeseeks.Parser.parse/1
    (junior) lib/junior.ex:159: anonymous fn/1 in Junior.parse/2
    (progress_bar) lib/progress_bar/spinner.ex:30: ProgressBar.Spinner.render/2
    (junior) lib/junior.ex:19: Junior.start/1
    (elixir) lib/kernel/cli.ex:121: anonymous fn/3 in Kernel.CLI.exec_fun/2

I was able to parse successfully with Floki + html5ever, but I would prefer to use Meeseeks. Any suggestions/thoughts?

mischov commented 4 years ago

@megalithic That's an odd error. That function should be defined even if the NIF portion wasn't loaded correctly, so it makes me think something in either your build process or your environment is the problem.

Have you tried cleaning your deps and _build and rebuilding?
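
For anyone debugging the same error, one quick check is whether the module can be loaded at all in the runtime that raises it. The sketch below is only an illustration and is not part of Meeseeks or Rustler: `NifCheck` and `report/4` are hypothetical names, and it relies solely on the standard `Code.ensure_loaded/1`, `function_exported?/3`, and `:code.priv_dir/1` functions.

```elixir
defmodule NifCheck do
  @moduledoc false
  # Hypothetical helper for illustration; not part of Meeseeks or Rustler.

  # Reports whether `module` loads, whether it exports `fun/arity`,
  # and where the runtime thinks `app` keeps its priv directory.
  def report(module, fun, arity, app) do
    # {:module, module} on success, {:error, reason} otherwise.
    loaded = Code.ensure_loaded(module)

    # Checked after the load attempt, since function_exported?/3
    # does not load the module by itself.
    exported = function_exported?(module, fun, arity)

    # A path on success, {:error, :bad_name} if the application's
    # directory cannot be found.
    priv_dir = :code.priv_dir(app)

    %{loaded: loaded, exported: exported, priv_dir: priv_dir}
  end
end

NifCheck.report(MeeseeksHtml5ever.Native, :parse_html, 1, :meeseeks_html5ever)
|> IO.inspect(label: "meeseeks_html5ever")
```

Running this in the context that fails (for example at the start of the escript's entry point) and again from `iex -S mix` should show where the two environments differ. A `loaded` value of `{:error, :on_load_failure}` would suggest the module's load hook failed, while `{:error, :nofile}` would mean the compiled module was not found at all.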

megalithic commented 4 years ago

Hey, thanks for chiming in @mischov!

I indeed have. Multiple times. Does my config/config.exs or the applications definition in mix.exs need anything pertaining to Meeseeks (other than having it in my deps definition)?

Yeah, it's very strange. Everything compiles fine. I am using this with escript, if that has any bearing. :/

mischov commented 4 years ago

@megalithic Ah, it does. https://github.com/mischov/meeseeks/issues/23

It's my understanding that Rustler (and NIFs in general) won't work with escript, which should in turn have made it impossible for html5ever to work with Floki, so I am a bit curious how it did. Maybe it was just falling back to the mochiweb parser?

megalithic commented 4 years ago

Well, fooey. Not a big deal, I can run it from iex locally, but I would have liked to distribute this as a quick CLI tool. Any thoughts on how this could be achieved? I don't expect you to come up with a solution, but if you have heard of or seen anyone work around this and continue to use Meeseeks, that'd be fantastic! I'm making things work with Floki, it's just a bit fiddly with their own html_tree struct, and I'd rather just deal with Enums of found HTML nodes. Anyway, thanks again!

mischov commented 4 years ago

I haven't seen any workarounds for Meeseeks in particular, but the usual approach would probably be to create a release and call into it with a shell script. Here is an old Distillery issue describing that. I don't know how that would differ in current Distillery or mix releases, but that's the route I'd go down.
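
As a rough sketch of the release route with `mix release` (available since Elixir 1.9), the project definition might look something like the following. The app name `:junior` and the `Junior.start/1` entry point are taken from the stack trace above; the version numbers, the `[]` argument, and the release options are assumptions for illustration only.

```elixir
# mix.exs (sketch). The app name :junior comes from the stack trace above;
# everything else here is illustrative.
defmodule Junior.MixProject do
  use Mix.Project

  def project do
    [
      app: :junior,
      version: "0.1.0",
      elixir: "~> 1.9",
      deps: deps(),
      releases: [
        junior: [
          # Only generate the Unix launcher script (bin/junior).
          include_executables_for: [:unix]
        ]
      ]
    ]
  end

  defp deps do
    # Version taken from the report above.
    [{:meeseeks, "~> 0.13.1"}]
  end
end

# Build and run (shell commands, shown here as comments):
#   MIX_ENV=prod mix release
#   _build/prod/rel/junior/bin/junior eval "Junior.start([])"
# The [] argument is a placeholder; the real arguments to Junior.start/1
# are not shown in this thread.
```

A thin shell script can then wrap the `eval` call. Note that `eval` does not start the release's applications by default, and that the release bundles the NIF compiled on the build machine, so it needs to be built for the same OS and architecture it will run on.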

Alternatively you could go to the source and do it in Rust based on html5ever and some selection library and just distribute a binary.

mischov commented 4 years ago

@megalithic I'm going to go ahead and close this, but go ahead and re-open if you have more questions, and please feel free to follow up with whatever solution you land on for future people who find themselves with this problem.