jianqiaomo / NEEA-TOEFL-Testseat-Crawler

TOEFL test seat crawler (NEEA TOEFL Testseat Crawler)
20 stars 0 forks

An idea #1

Open TURX opened 4 years ago

TURX commented 4 years ago

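This is the official site's own code; with it you do not need WebDriver. Do not send requests too frequently, or the server returns a 400 error.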

$.getJSON("testSeat/queryTestSeats", {
    city: $("#centerProvinceCity").val(),
    testDay: $("#testDays").val(),
    qryType: "NewOrder"
}, function (data) {
    if (data.status == true) {
        var tmpl = $.templates("testSeatListTemplate", {
            markup: "#testSeatListTpl",
            helpers: {
                formatCurrency: formatTestFee
            }
        }); // Get compiled template
        var html = tmpl.render(data);
        $("#qrySeatResult").html(html);
    } else {
        layer.msg("未查询到考位信息", {time: 2000, icon: 0, shift: 0});
        $("#qrySeatResult").empty();
    }
});

Also, could you add a feature for specifying the city and the test date? That would cut down on unnecessary operations.
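Since overly frequent requests are answered with a 400 error, a crawler built on this endpoint would probably want to space its queries out. The sketch below shows one way to do that in Python; the `Throttle` class and the interval value are hypothetical, and the thread does not say what request rate the server actually tolerates.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        # min_interval is in seconds; the right value is an assumption
        # and has to be found by experiment.
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` right before each query keeps the requests at least `min_interval` seconds apart. The clock and sleep functions are injectable only so the behavior can be checked without real waiting.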

jianqiaomo commented 4 years ago

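Thanks for the reply! I am already working on the feature for specifying the date and city, but for now it still relies on simulating the browser with WebDriver. I am not familiar with JS or jQuery, so writing it this way may take me some time. If you have code for this approach, feel free to share it here!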

TURX commented 4 years ago

After logging in, the URL is https://toefl.neea.cn/myHome/[NEEA ID]/index, and the browser stores new cookies (for the domains toefl.neea.cn and neea.cn).

Fill in your NEEA ID and build a GET request to https://toefl.neea.cn/myHome/[NEEA ID]/testSeat/queryTestSeats. The request parameters are city, testDay, and qryType=NewOrder. Putting these together gives a request URL like https://toefl.neea.cn/myHome/1234567/testSeat/queryTestSeats?city=BEIJING&testDay=2020-08-19&qryType=NewOrder
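For reference, the URL construction described above can be sketched in Python with the standard library alone. The function name `build_query_url` is hypothetical; the host, path, and parameter names are taken from the example URL in this comment.

```python
from urllib.parse import urlencode

BASE = "https://toefl.neea.cn/myHome"


def build_query_url(neea_id, city, test_day):
    """Return the queryTestSeats URL for one city and one test date.

    neea_id  -- the NEEA ID that appears in the URL after logging in
    city     -- city code as used by the site, e.g. "BEIJING"
    test_day -- date string in YYYY-MM-DD form, e.g. "2020-08-19"
    """
    params = urlencode({
        "city": city,
        "testDay": test_day,
        "qryType": "NewOrder",
    })
    return f"{BASE}/{neea_id}/testSeat/queryTestSeats?{params}"


if __name__ == "__main__":
    print(build_query_url("1234567", "BEIJING", "2020-08-19"))
```

Looping this function over a list of cities and dates would also cover the "specify city and date" feature requested earlier in the thread.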

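Send a request to that URL. The request headers must include the cookies obtained at login. Parse the returned data as JSON; the result looks roughly like this: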
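A rough Python sketch of this request step, using only the standard library, might look as follows. It assumes that the cookie string is copied by hand from a logged-in browser session (the thread does not list the cookie names) and that the response body is UTF-8 encoded JSON; `build_request` and `fetch_json` are hypothetical helper names.

```python
import json
import urllib.request


def build_request(url, cookie_header):
    """Create a GET request that carries the login cookies.

    cookie_header is the raw "name=value; name2=value2" string taken
    from a logged-in browser session (an assumption of this sketch).
    """
    return urllib.request.Request(
        url,
        headers={"Cookie": cookie_header},
        method="GET",
    )


def fetch_json(url, cookie_header, timeout=10):
    """Send the request and decode the body as JSON (assumed UTF-8)."""
    req = build_request(url, cookie_header)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`fetch_json` raises `urllib.error.HTTPError` on a 400 response, so a caller can catch that exception and slow down before trying again.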

{"status":true,"testDate":"2020年8月19日 星期三","testSeats":{"09:00|20200819|08:30":[{"seatId":"xxx","provinceCn":"北京","provinceEn":"BEIJING","cityCn":"北京","cityEn":"BEIJING","centerCode":"STN80120A","centerNameCn":"北京市私立汇佳学校","centerNameEn":"Beijing Huijia Private School","testFee":210000,"lateReg":"N","seatStatus":0,"seatBookStatus":0,"rescheduleDeadline":1597507199000,"cancelDeadline":1597507199000,"testTime":"09:00","lateRegFlag":"N"},{"seatId":"xxx","provinceCn":"北京","provinceEn":"BEIJING","cityCn":"北京","cityEn":"BEIJING","centerCode":"STN80120B","centerNameCn":"北京市私立汇佳学校","centerNameEn":"Beijing Huijia Private School","testFee":210000,"lateReg":"N","seatStatus":0,"seatBookStatus":0,"rescheduleDeadline":1597507199000,"cancelDeadline":1597507199000,"testTime":"09:00","lateRegFlag":"N"},{"seatId":"xxx","provinceCn":"北京","provinceEn":"BEIJING","cityCn":"北京","cityEn":"BEIJING","centerCode":"STN80120C","centerNameCn":"北京市私立汇佳学校","centerNameEn":"Beijing Huijia Private School","testFee":210000,"lateReg":"N","seatStatus":0,"seatBookStatus":0,"rescheduleDeadline":1597507199000,"cancelDeadline":1597507199000,"testTime":"09:00","lateRegFlag":"N"},{"seatId":"xxx","provinceCn":"北京","provinceEn":"BEIJING","cityCn":"北京","cityEn":"BEIJING","centerCode":"STN80120D","centerNameCn":"北京市私立汇佳学校","centerNameEn":"Beijing Huijia Private School","testFee":210000,"lateReg":"N","seatStatus":0,"seatBookStatus":0,"rescheduleDeadline":1597507199000,"cancelDeadline":1597507199000,"testTime":"09:00","lateRegFlag":"N"}]},"lateRegFee":31000}

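status: true when there are test seats (including ones that cannot be booked), false when there are none.

testSeats: the lists of test seats grouped by test time. For each seat, seatStatus is 1 when it can be booked and 0 when it cannot.

The remaining fields are self-explanatory.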
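Based on these field descriptions, picking out the bookable seats from a parsed response could be sketched as below. The function name `bookable_seats` and the choice of returned fields are hypothetical; the keys `status`, `testSeats`, `seatStatus`, `centerCode`, and `centerNameEn` come from the sample response above.

```python
def bookable_seats(payload):
    """Return (time_slot, centerCode, centerNameEn) for each bookable seat.

    payload is the decoded JSON object. A seat counts as bookable when
    its seatStatus equals 1, as described in this thread.
    """
    if not payload.get("status"):
        return []  # status false: no seats at all for this query
    found = []
    for slot, seats in payload.get("testSeats", {}).items():
        for seat in seats:
            if seat.get("seatStatus") == 1:
                found.append(
                    (slot, seat.get("centerCode"), seat.get("centerNameEn"))
                )
    return found
```

For the sample response shown above, every seat has seatStatus 0, so this function would return an empty list.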

jianqiaomo commented 4 years ago

Oh, I see. According to Zhihu user @Michael (https://zhuanlan.zhihu.com/p/69520194):

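If you skip Selenium and send the requests directly with Python's Requests library, you have to attach the cookie values to each request. Working out those cookie values is quite a hassle; here, for example, the site uses one of Google's libraries, and I could not crack it at all :(.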