Closed thyrymn closed 9 years ago
Saving WebServer to [ /root/.cartero/templates/webserver ]
Cloning URL https://www.blahblah.com
unknown encoding name - text/html
/root/Cartero/lib/cartero/commands/cloner.rb:220:in `force_encoding'
/root/Cartero/lib/cartero/commands/cloner.rb:220:in `create_index'
/root/Cartero/lib/cartero/commands/cloner.rb:146:in `clone'
/root/Cartero/lib/cartero/commands/cloner.rb:108:in `run'
/root/Cartero/lib/cartero/command.rb:82:in `block in method_added'
/root/Cartero/lib/cartero/cli.rb:190:in `block in run'
/root/Cartero/lib/cartero/cli.rb:184:in `each'
/root/Cartero/lib/cartero/cli.rb:184:in `run'
./cartero:52:in `
OK, now I get the problem. I'll fix it ASAP and push a commit. I'm working on a better implementation of how I get the content_type for a specific site.
Want another one? Different site:
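For anyone following along: the first trace is consistent with the whole Content-Type value ("text/html") being handed to force_encoding, which only accepts encoding names. Below is a minimal sketch of one way to pull a charset out of the header and fall back when it is missing or unknown. The helper names charset_from and decode_body are hypothetical and are not taken from cloner.rb, and the UTF-8 default is an assumption.

```ruby
# Hypothetical helper: extract a usable encoding name from a Content-Type
# header value such as "text/html; charset=ISO-8859-1".
# Falls back to `default` when no charset is given or Ruby does not know it.
def charset_from(content_type, default = "UTF-8")
  return default if content_type.nil?

  match = content_type.match(/charset\s*=\s*"?([^\s;"]+)/i)
  name = match ? match[1] : default
  Encoding.find(name).name # raises ArgumentError for unknown names
rescue ArgumentError
  default
end

# Hypothetical helper: tag a response body with the encoding named in the
# header instead of passing the raw header to force_encoding.
def decode_body(body, content_type)
  body.dup.force_encoding(charset_from(content_type))
end
```

With this shape, a bare "text/html" header no longer reaches force_encoding; only a validated encoding name does.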
Cloning URL https://www.blahblah.com
bad URI(is not URI?): Email: myname at domain dot com
/usr/local/rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/uri/rfc3986_parser.rb:66:in `split'
/usr/local/rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/uri/rfc3986_parser.rb:72:in `parse'
/usr/local/rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/uri/common.rb:226:in `parse'
/root/Cartero/lib/cartero/commands/cloner.rb:177:in `block in proccess_urls'
/usr/local/rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/xml/node_set.rb:187:in `block in each'
/usr/local/rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/xml/node_set.rb:186:in `upto'
/usr/local/rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/xml/node_set.rb:186:in `each'
/root/Cartero/lib/cartero/commands/cloner.rb:176:in `proccess_urls'
/root/Cartero/lib/cartero/commands/cloner.rb:194:in `create_index'
/root/Cartero/lib/cartero/commands/cloner.rb:146:in `clone'
/root/Cartero/lib/cartero/commands/cloner.rb:108:in `run'
/root/Cartero/lib/cartero/command.rb:82:in `block in method_added'
/root/Cartero/lib/cartero/cli.rb:190:in `block in run'
/root/Cartero/lib/cartero/cli.rb:184:in `each'
/root/Cartero/lib/cartero/cli.rb:184:in `run'
./cartero:52:in
@thyrymn can you please test it using the latest available commit? Let me know if that fixes your latest
As for the latest comment, are you trying to clone an email?
No, it is my personal domain. No email. Testing now.
I asked because of the "bad URI(is not URI?): Email: myname at domain dot com" message. It looks like an error in one of the underlying gems. Could you check whether your page source has a URL like email:test@test.com instead of mailto:test@test.com? The parser is complaining and bailing out on something the RFC can't handle, and if I can fix it I will happily do so. Cloner needs to handle a crazy number of cases, so the more sites we clone, the more issues I can fix as they show up.
I was at your thotcon talk, fwiw.
Issue 1: The site sort of mirrors. All text, no pictures. It looks like a gopher page. The error is gone.
Issue 2: This is my resume. The thing it is hitting is an href:
<div id="contactDetails" class="quickFade delayFour">
<ul>
<a href="Email: myname at domain dot com" target="_blank">myname at domain dot com</a></li>
Weird. It should be cloning the website without any issues. The commands should look like:
cartero Cloner -U https://www.gmail.com -p /tmp -W gmail
cartero Listener -W /tmp/gmail -p 9090
I will do some research into the second one. That is an interesting issue. I guess my builder is being too smart and tries to edit links too much. I can try to ignore these links when they fail to parse and leave them as they are.
Yeah, I've cloned a bunch of sites that work right. I'm trying to find one that behaves like my first one.
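The "ignore and leave as-is" idea can be sketched roughly like this. It is only an illustration of the approach, not the code in cloner.rb: rewrite_link and the example.com base URL are made up for the example, and it assumes link rewriting goes through Ruby's standard URI library.

```ruby
require "uri"

# Hypothetical helper: resolve an href against the page's base URL.
# If the href is not a parseable URI (for example the
# "Email: myname at domain dot com" value from the resume page),
# return it untouched instead of letting the exception abort the clone.
def rewrite_link(href, base)
  URI.join(base, href).to_s
rescue URI::InvalidURIError
  href
end

base = "https://www.example.com/"
puts rewrite_link("img/logo.png", base)
puts rewrite_link("Email: myname at domain dot com", base)
```

The trade-off is that a broken link stays broken in the clone, but one malformed attribute no longer stops the whole run.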
I moved issue #2 to www.spiritualdictator.com so you can try it.
It looks like sites that make heavy use of JavaScript have problem #1.
Thanks. Weird about the JavaScript. Issue number 1 was just an encoding issue. I am now allowing the underlying gem to determine the encoding. It should not be related to JavaScript, but then again the internet is a weird world. If you have examples of sites that do not work, I will gladly add them to my testing and try to find the root of the issue.
In any case, I am still working on issue #2, but thanks for the feedback. The tool is only as good as the people who use it and report back, so I can make it even more awesome. I really appreciate it.
OK, yet another commit. Please check. I used your site and I got it to render correctly. Also, I am no longer coughing up errors for unknown URIs; I leave them as they are. This is not perfect, and there is still a known case in which it might not work, but as long as you are using Ruby > 2.2.0 you should be OK. I'll eventually be moving to it.
Feel free to reopen the issue or create a new one if you find anything else.
Issue 1 appears fully fixed in the release this morning.
When I try to clone a site, with both the stable and unstable branches, I get an error saying the type "text/html" is not supported.
When I try to use the --wget option, it doesn't work, because to mirror the site with wget I need to use the robots off command so that wget ignores robots.txt. The --wget option ignores the .wgetrc file.
Any ideas?
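One possible workaround, sketched under the assumption that wget is run on its own rather than through Cartero's --wget option: pass the robots setting on the command line with -e, so it does not depend on .wgetrc being read. The helper name wget_mirror_args and the destination directory are hypothetical, and how Cartero itself invokes wget is not shown in this thread.

```ruby
# Hypothetical helper: build the argument list for a standalone wget mirror.
# "-e robots=off" executes the wgetrc-style command inline, so the setting
# applies even when no .wgetrc file is consulted.
def wget_mirror_args(url, dest_dir)
  [
    "wget",
    "-e", "robots=off",
    "--mirror",
    "--convert-links",
    "--page-requisites",
    "--directory-prefix", dest_dir,
    url
  ]
end

# Example invocation (commented out so nothing is downloaded by accident):
# system(*wget_mirror_args("https://www.example.com", "/tmp/mirror"))
```

The mirrored directory could then be served the same way as any other cloned site.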