zc3945 / caipanwenshu

China Judgements Online (裁判文书网) crawler demo, updated 2020-04-23

Question: when fetching the left-hand Tree structure, what parameter is "上传日期" (upload date)? #3

Closed easyfast closed 5 years ago

easyfast commented 5 years ago

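import datetime

import scrapy

# getvjkl5 (derives the vl5x token from the vjkl5 cookie value) and the
# get_tree_list callback are defined elsewhere in the project; not shown here.


class DocSpider(scrapy.Spider):
    name = 'doc'
    allowed_domains = ['gov.cn']
    start_urls = ['http://gov.cn/']

    def start_requests(self):
        # List page for the search keyword 合同 ("contract")
        url = 'http://wenshu.court.gov.cn/list/list/?sorttype=1&conditions=searchWord+%E5%90%88%E5%90%8C+++%E5%85%B3%E9%94%AE%E8%AF%8D:%E5%90%88%E5%90%8C'
        yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        """
        Query the category counts (left-hand tree) by upload date.
        :param response: list-page response that sets the session cookie
        :return: FormRequests to the TreeContent endpoint
        """
        # Scrapy header values are bytes; keep the first "name=value" pair for the Cookie header
        cookie = response.headers.get('Set-Cookie', b'').decode('utf-8').split(';')[0]
        # The part after "=" is the cookie value used to compute vl5x
        vjkl5 = getvjkl5(cookie.split('=', 1)[-1])
        today = datetime.date.today()
        for index in range(1, 2):  # only index=1 runs: start = two days ago, end = yesterday
            end_day = (today - datetime.timedelta(days=index)).isoformat()
            start_day = (today - datetime.timedelta(days=index + 1)).isoformat()
            # "上传日期" = upload date; the range is written as "<start> TO <end>"
            param = '上传日期:{} TO {}'.format(start_day, end_day)
            data = {'Param': param, 'vl5x': vjkl5}
            yield scrapy.FormRequest('http://wenshu.court.gov.cn/List/TreeContent', headers={'Cookie': cookie},
                                     callback=self.get_tree_list, formdata=data,
                                     meta={'cookie': cookie, 'vjkl5': vjkl5, 'Param': param,
                                           # court region, document type, court level,
                                           # trial procedure, judgment year, top-level cause of action
                                           'type_list': ['法院地域', '文书类型', '法院层级', '审判程序', '裁判年份', '一级案由']})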

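As shown above, fetching the left-hand tree structure uses a parameter named "上传日期" (upload date), but I can't find any parameter with that name in my packet captures from the China Judgements Online site. Could you explain?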

zc3945 commented 5 years ago

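This queries by upload date. Because the volume is large and the API returns at most 200 records per query condition, I recommend querying one day at a time.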
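Following the advice to query one day at a time, here is a minimal sketch that generates one `Param` value per upload day in the same `上传日期:<start> TO <end>` format the spider above uses. It assumes the bounds of `A TO B` are inclusive, so a single-day query puts the same date on both sides; the thread does not confirm this. `daily_upload_date_params` is a hypothetical helper name. Each query is still capped at 200 records, so a day with more results than that would need further narrowing (not shown).

```python
import datetime
from typing import Iterator


def daily_upload_date_params(start: datetime.date, end: datetime.date) -> Iterator[str]:
    """Yield one "上传日期" (upload date) Param string per day, start..end inclusive.

    Assumption: "A TO B" is inclusive at both ends, so A == B selects one day.
    """
    day = start
    while day <= end:
        # Same "上传日期:<start> TO <end>" format as the spider's Param field
        yield '上传日期:{} TO {}'.format(day.isoformat(), day.isoformat())
        day += datetime.timedelta(days=1)


if __name__ == '__main__':
    for param in daily_upload_date_params(datetime.date(2020, 4, 20), datetime.date(2020, 4, 22)):
        # Each string would be sent as the 'Param' form field alongside 'vl5x'
        print(param)
```

Generating the strings separately from the spider keeps the date logic testable without any network access; each value can then be fed into a `FormRequest` like the one in `parse`.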