hujinglin / MVA-Downloader

download MVA course videos with subtitle
MIT License
45 stars, 16 forks

Can not parse this page #3

Open ghost opened 8 years ago

ghost commented 8 years ago

For example, I entered this URL:

https://mva.microsoft.com/en-us/training-courses/introducci-n-a-json-con-c--12742?l=xxtX274UB_8805494542

I tried with this one: https://mva.microsoft.com/en-us/training-courses/introducci-n-a-json-con-c--12742

In both cases I get this error: Can not parse this page.

ghost commented 8 years ago

Same for me too when using this link - https://mva.microsoft.com/en-US/training-courses/c-fundamentals-for-absolute-beginners-16169 Did you find any fix?

hujinglin commented 8 years ago

Bug fixed. It was caused by some request headers missing when I crawled the MVA page :)
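
For readers hitting the same error, the idea behind this fix (sending the kind of request headers a browser would send) can be sketched roughly as follows. This is not the project's actual code: the helper name `buildRequestOptions` and every header value below are assumptions chosen for illustration, and the thread does not say which headers were missing.

```javascript
// Hypothetical helper: builds an options object for Node's built-in
// https.get() with browser-like headers. The header values are
// illustrative guesses, not the ones MVA-Downloader actually sends.
const { URL } = require('url');

function buildRequestOptions(pageUrl) {
  const parsed = new URL(pageUrl);
  return {
    hostname: parsed.hostname,
    path: parsed.pathname + parsed.search, // keep the ?l=... query if present
    headers: {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
      'Accept': 'text/html,application/xhtml+xml',
      'Accept-Language': 'en-US,en;q=0.8'
    }
  };
}

// Usage (performs a real network request, so it is left commented out):
// require('https').get(buildRequestOptions(courseUrl), function (res) { /* read res */ });
```

Keeping the header set in one function makes it easy to adjust if the site starts rejecting requests again.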

ghost commented 8 years ago

@hujinglin excellent!

ghost commented 8 years ago

@hujinglin thanks :)

prrandrade commented 8 years ago

Sorry, but the problem reappeared... :S

hujinglin commented 8 years ago

@prrandrade Sorry, I have fixed it. Please run npm install again; I use phantomjs to crawl the MVA page now~

prrandrade commented 8 years ago

Sorry to say that... but it looks like the problem remains :(

I tried with this link: https://mva.microsoft.com/en-us/training-courses/xamarin-for-absolute-beginners-16182

And, after thinking for several seconds, the MVA Downloader page returns with 'can not parse this page'

hujinglin commented 8 years ago

@prrandrade Have you pulled the newest code and installed the node_modules? I tried your link and it works fine~

prrandrade commented 8 years ago

Yeah, you're right... my work Internet is more restricted than I thought... Thanks!

But there are some minor errors with subtitles... for example, <br /> tags are ignored because of .text(). I have no nodejs experience, but this preliminary change is working here:

$('p').each(function (index) {
  var $item = $(this)
  // turn every line-break tag into a real line break before .text() drops it
  var itemHtml = $item.html().replace(/<br\s*\/?>/gi, '\r\n')
  var itemText = $('<div></div>').html(itemHtml).text()
  var begin = $item.attr('begin')
  var end = $item.attr('end')
  srt += index + 1 + '\r\n' + begin + ' --> ' + end + '\r\n' + itemText + '\r\n\r\n'
})
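
The same idea can be tried without a DOM library. The following is a minimal sketch, not the code that ended up in the project: `cuesToSrt` and the `{ begin, end, html }` cue shape are hypothetical names introduced here. Unlike the `.text()` approach, it does not decode HTML entities, and it copies the timestamps through unchanged.

```javascript
// Hypothetical helper: turns cue objects into SRT text.
// Each cue is assumed to look like { begin, end, html }, where html is
// the inner markup of one subtitle <p> element.
function cuesToSrt(cues) {
  return cues.map(function (cue, index) {
    const text = cue.html
      .replace(/<br\s*\/?>/gi, '\r\n') // every line-break tag, not only the first
      .replace(/<[^>]+>/g, '');        // drop any remaining tags
    return (index + 1) + '\r\n' +
      cue.begin + ' --> ' + cue.end + '\r\n' +
      text + '\r\n\r\n';
  }).join('');
}
```

The regular expression with the `g` flag matters here: a plain string passed to `replace` only swaps the first occurrence, so a cue with two line breaks would lose the second one.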

hujinglin commented 8 years ago

@prrandrade Thanks a lot! I have fixed the subtitle bug :)

naivefeng commented 7 years ago

I think we can fetch all course links from the sitemap.
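
A rough sketch of what that could look like, assuming the site publishes a standard sitemap (`<url><loc>…</loc></url>` entries) and that course pages contain `/training-courses/` in their path, as the links in this thread do. The thread does not confirm either point, and `extractCourseLinks` is a hypothetical name; fetching the sitemap itself is left out.

```javascript
// Hypothetical helper: pulls course URLs out of sitemap XML text.
// Assumes standard <loc> entries and a "/training-courses/" path segment.
// XML entities such as &amp; are returned as-is, not decoded.
function extractCourseLinks(sitemapXml) {
  const links = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match;
  while ((match = locPattern.exec(sitemapXml)) !== null) {
    if (match[1].indexOf('/training-courses/') !== -1) {
      links.push(match[1]);
    }
  }
  return links;
}
```

Returning only the links keeps the result small; the course pages themselves would still be parsed one at a time on demand.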

DaviBittencourt commented 7 years ago

I'm new here. Can anyone explain how to run this project on my machine? What do I have to do to make it work? Does it only run in the browser? I get the message "Can not parse this page".

hujinglin commented 7 years ago

@naivefeng no space to save 😄

hujinglin commented 7 years ago

@DaviBittencourt tell me the link you are trying to parse, please

DaviBittencourt commented 7 years ago

@hujinglin Are you using another tool to download the MVA courses, or is this the only one?