Open danfrankj opened 10 years ago
Hm, yeah, I've accepted a lot of pull requests recently and haven't been testing it myself for a while. There are others who have contributed. If you find a bug and a fix, just send a pull request.
On Wed, Sep 25, 2013 at 5:26 PM, danfrankj notifications@github.com wrote:
I'm currently trying to scrape Natural Language Processing and scrape.py fails when it executes re.search near line 105. I'll continue to try to debug but am pretty new at this.
I think it's more than a simple bug. I think something changed with scpd. I'll try to debug...
I'll get to it this weekend
It looks like previously, when reaching the course page, the script searched the WMP links and parsed out a useful part (a link?) that it could simply open. Now, the WMP links are calls to Javascript functions that generate the correct URL. It's easy to construct the correct URL the same way the Javascript does, except for an authentication parameter ("slp") that's passed. If we could figure that out, this issue could be fixed.
Example HTML link:
<a href='javascript:openSL("509d37b5-f858-474c-9876-daa31c7346bb","CS221","cab421d9-1581-4f8c-989a-9cc3fdb2833d","130923","","WA","&wmp=true");'>WMP</a>
openSL (copied from chrome web inspector):
//need to go update openSL to only use one param
function openSL(collGuid, courseName, coGuid, lectureName, lectureDesc, desiredAuthType, playerType) {
    reqObj = 'coll=' + collGuid + '&course=' + courseName + '&co=' + coGuid + '&lecture=' + lectureName;
    // CourseGUIDStr + MyCollection.Name + co.GUID + co.Name + lectureType + desiredAuthType_PARAM
    if (lectureDesc == "problem session")
        reqObj += '&lectureType=ps';
    reqObj += '&authtype=' + desiredAuthType;
    PageMethods.playSLVideo(collGuid, coGuid, desiredAuthType, function (slphash) {
        if (slphash != null) {
            reqObj += '&slp=' + slphash + playerType;
            var win = window.open('http://myvideosv.stanford.edu/' + 'player/slplayer.aspx?' + reqObj);
            win.focus
        } // End if
    } //End PageMethodsParameter
    ); //End PageMethods
} //End OpenSL
The value of slp changes every time you click the (Javascript) link.
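For anyone who wants to try this in scrape.py, the URL-building half of openSL can be sketched in Python roughly as follows. This is only a sketch: the helper names `parse_opensl_args` and `build_player_url` are made up for illustration, the regexes assume the href looks like the example above (seven double-quoted arguments), and the slp hash still has to be obtained separately.

```python
import re

# Player base URL, taken from the window.open call in openSL.
PLAYER_BASE = "http://myvideosv.stanford.edu/player/slplayer.aspx?"

# Assumption: every WMP link has the form javascript:openSL("a","b",...);
OPENSL_RE = re.compile(r"openSL\((.*?)\)")
ARG_RE = re.compile(r'"([^"]*)"')


def parse_opensl_args(href):
    """Return the seven openSL string arguments from an href, or None."""
    match = OPENSL_RE.search(href)
    if match is None:
        return None
    args = ARG_RE.findall(match.group(1))
    return args if len(args) == 7 else None


def build_player_url(args, slp_hash):
    """Concatenate the query string the same way openSL does (no URL-encoding)."""
    (coll_guid, course, co_guid, lecture,
     lecture_desc, auth_type, player_type) = args
    query = ("coll=" + coll_guid + "&course=" + course +
             "&co=" + co_guid + "&lecture=" + lecture)
    if lecture_desc == "problem session":
        query += "&lectureType=ps"
    query += "&authtype=" + auth_type
    # player_type already carries its own leading "&" (e.g. "&wmp=true").
    query += "&slp=" + slp_hash + player_type
    return PLAYER_BASE + query


if __name__ == "__main__":
    href = ('javascript:openSL("509d37b5-f858-474c-9876-daa31c7346bb","CS221",'
            '"cab421d9-1581-4f8c-989a-9cc3fdb2833d","130923","","WA","&wmp=true");')
    # "PLACEHOLDER" stands in for the real slp hash, which is not known here.
    print(build_player_url(parse_opensl_args(href), "PLACEHOLDER"))
```

Since the slp value changes on every click, it would have to be fetched fresh for each video right before building the URL.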
Ah, hm. Yes, that would break it. If you have a fix, I'll merge it in. Seems like this project still gets a good amount of usage.
On Fri, Oct 4, 2013 at 4:57 AM, adotey notifications@github.com wrote:
It looks like previously, when reaching the course page, the script searched the WMP links and parsed out a useful part (a link?) that it could simply open. Now, the WMP links are calls to Javascript functions that generate the correct URL. It's easy to construct the correct URL the same way the Javascript does, except for an authentication parameter ("slp") that's passed. If we could figure that out, this issue could be fixed.
Unfortunately I don't. There's some authentication hash (called "slphash" in the openSL code) being generated and inserted into the URL as a required parameter (slp), but I don't know how it's being generated. Hopefully it's something you or someone else could figure out.
Yeah, I don't use it anymore, but I help merge pull requests since a bunch of people were still using it.
— Jeremy
On Fri, Oct 4, 2013 at 9:04 PM, adotey notifications@github.com wrote:
Unfortunately I don't. There's some authentication hash (called "slphash" in the openSL code) being generated and inserted into the URL as a required parameter (slp), but I don't know how it's being generated. Hopefully it's something you or someone else could figure out.
This looks tricky. We could try another library that allows JS calls, but I think this JS, although complicated, can be picked apart and replaced with Python code.
Someone else has a working (Ruby) script: https://github.com/dennybritz/scpd-downloader
He gets the slp hash by issuing a json request for it.
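I haven't ported the Ruby script, but a JSON request like that might look roughly like this in Python. Everything here is an assumption to be checked against the real site: `PageMethods.playSLVideo` looks like an ASP.NET AJAX page method, which is normally called by POSTing JSON to `<page>.aspx/playSLVideo` and returns the result wrapped in a `"d"` key. The page URL is left as a parameter because it isn't shown in this thread, and the JSON key names are guesses copied from the JavaScript argument names (the server-side names may differ).

```python
import json
from urllib.request import Request


def build_slp_request(page_url, coll_guid, co_guid, auth_type):
    """Build (but do not send) a POST for the playSLVideo page method.

    page_url is the .aspx page hosting the method, without a query string.
    It is hypothetical here; the real URL is not given in this thread.
    """
    # Assumption: key names mirror the JS arguments; the server may expect others.
    payload = json.dumps({
        "collGuid": coll_guid,
        "coGuid": co_guid,
        "desiredAuthType": auth_type,
    }).encode("utf-8")
    return Request(
        page_url.rstrip("/") + "/playSLVideo",
        data=payload,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def extract_slp_hash(response_body):
    """Pull the hash out of an ASP.NET-style {"d": ...} JSON response."""
    data = json.loads(response_body)
    if not isinstance(data, dict):
        return None
    # None here corresponds to the `slphash != null` check in openSL.
    return data.get("d")
```

The request would need to be sent through whatever logged-in session the scraper already holds (same cookies as the course page), and the body of the response passed to `extract_slp_hash`.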
I made a quick bookmarklet to get the video URL if any of you are interested. http://joon-tech.blogspot.com/2013/10/in-order-to-help-with-issues-with-scpd.html Also there is an accompanying Gist https://gist.github.com/djoeman84/7140185
Any update on this? Not much in the way of recent commits, is anyone working on this?