navnorth / wp-content-mirror

WordPress plugin for mirroring external page content

Error in URL replace, possibly due to multiple identical links #14

Open · joehobson opened this issue 9 years ago

joehobson commented 9 years ago

The plugin seems to have problems processing this page: http://www2.ed.gov/programs/skillssuccess/awards.html, which is mirrored on our test site here: http://oii.wp-test.navnorth.com/what-we-do/innovation/skills-for-success/awards/

We were originally told that "the URLs for the attachments are repeating," but when I looked into it, I found a problem only with the PDFs for IDEA RAISES Student Achievement and Perseverance Process Project, where the link was mangled into something like this: http://www2.ed.govhttp://www2.ed.gov/programs/skillssuccess/2015unfunded/ideaabst.pdf

My guess is that it happens because the same link appears twice on the page. I'm not sure why that would cause a problem, but it might. It's doing the same thing on our test server, so please see what you can do to fix it. Thanks!
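The duplicate-link guess fits the exact shape of the mangled URL. If the mirroring code collects every href and then does a global string replace for each one (as PHP's str_replace does, replacing all occurrences), a link that appears twice gets processed twice: the second pass finds "/programs/..." inside the URL that was already made absolute and prefixes the host again. The sketch below reproduces that hypothesis in Python rather than the plugin's PHP. The function names, the regex, and the sample markup are hypothetical; this is not the plugin's actual code or the fix that was committed. A regex over HTML is a simplification; a real implementation would more likely use an HTML parser such as PHP's DOMDocument.

```python
import re
from urllib.parse import urljoin

# Hypothetical values for illustration; the plugin's real code is PHP and may differ.
BASE_URL = "http://www2.ed.gov/programs/skillssuccess/awards.html"
HREF_RE = re.compile(r'href="([^"]+)"')


def absolutize_naive(html: str, base: str) -> str:
    """Hypothesized buggy pattern: one global string replace per href found."""
    for href in HREF_RE.findall(html):
        # str.replace rewrites EVERY occurrence, including ones already made
        # absolute on an earlier iteration for a duplicate of the same link.
        html = html.replace(href, urljoin(base, href))
    return html


def absolutize_safe(html: str, base: str) -> str:
    """Rewrite each href attribute exactly once, at the position it was matched."""
    def fix(match: re.Match) -> str:
        # urljoin returns already-absolute URLs unchanged, so this is idempotent.
        return 'href="' + urljoin(base, match.group(1)) + '"'
    return HREF_RE.sub(fix, html)


# Hypothetical markup: the same root-relative PDF link appears twice.
page = (
    '<a href="/programs/skillssuccess/2015unfunded/ideaabst.pdf">Abstract</a>\n'
    '<a href="/programs/skillssuccess/2015unfunded/ideaabst.pdf">PDF</a>'
)

print(absolutize_naive(page, BASE_URL))  # both links get the doubled host
print(absolutize_safe(page, BASE_URL))   # both links resolve correctly
```

With a single copy of the link, the naive version happens to work, which would explain why only the duplicated PDFs broke. Deduplicating the href list would avoid this particular case, but substituting per match also avoids corrupting a URL that is a substring of another.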

johnpaulbalagolan commented 9 years ago

Fixed #14 with the latest source committed to GitHub. It took me a while to figure this out, as I had to trace the scraping code and have had intermittent issues testing the scraping locally.