matiskay closed this 9 years ago
Here are the lines of code that use the extract()[0] pattern:
./inpe.py:70: item['full_name'] = fields[0].xpath("text()").extract()[0].strip()
./inpe.py:73: item['id_document'] = fields[1].xpath("text()").extract()[0].strip()
./inpe.py:77: item['id_number'] = fields[2].xpath("text()").extract()[0].strip()
./inpe.py:78: item['entity'] = fields[3].xpath("text()").extract()[0].strip()
./inpe.py:79: item['reason'] = fields[4].xpath("text()").extract()[0].strip()
./inpe.py:80: item['host_name'] = fields[5].xpath("text()").extract()[0].strip()
./inpe.py:81: item['title'] = fields[6].xpath("text()").extract()[0].strip()
./inpe.py:82: item['office'] = fields[7].xpath("text()").extract()[0].strip()
./inpe.py:86: item['time_start'] = times[1].xpath("text()").extract()[0].strip()
./minem.py:96: item['entity'] = re.sub("\s+", " ", fields[3].xpath("text()").extract()[0].strip())
./minem.py:97: item['host_name'] = re.sub("\s+", " ", fields[5].xpath("text()").extract()[0].strip())
./minem.py:98: item['reason'] = re.sub("\s+", " ", fields[4].xpath("text()").extract()[0].strip())
./minem.py:99: item['title'] = re.sub("\s+", " ", fields[6].xpath("text()").extract()[0].strip())
./minem.py:100: item['office'] = re.sub("\s+", " ", fields[7].xpath("text()").extract()[0].strip())
./minem.py:101: item['time_start'] = re.sub("\s+", " ", fields[8].xpath("text()").extract()[0].strip())
./minem.py:104: document_identity = fields[2].xpath("text()").extract()[0].strip()
./minem.py:112: item['time_end'] = re.sub("\s+", " ", fields[9].xpath("text()").extract()[0].strip())
./mtc.py:17: event_validation = response.xpath('//input[@id="__EVENTVALIDATION"]/@value').extract()[0]
./produce.py:61: item['time_start'] = this_record[2].xpath('text()').extract()[0]
./produce.py:66: item['full_name'] = this_record[3].xpath('text()').extract()[0]
./produce.py:71: item['id_document'] = this_record[4].xpath('text()').extract()[0]
./produce.py:76: item['id_number'] = this_record[5].xpath('text()').extract()[0]
./produce.py:81: item['reason'] = this_record[6].xpath('text()').extract()[0]
./produce.py:86: item['host_name'] = this_record[7].xpath('text()').extract()[0]
./produce.py:91: item['office'] = this_record[8].xpath('text()').extract()[0]
./produce.py:96: item['time_end'] = this_record[9].xpath('text()').extract()[0]
./tc.py:58: item['full_name'] = sel.xpath('td')[2].xpath('text()').extract()[0]
./tc.py:63: item['id_document'] = sel.xpath('td')[3].xpath('text()').extract()[0]
./tc.py:68: item['id_number'] = sel.xpath('td')[4].xpath('text()').extract()[0]
./tc.py:73: item['reason'] = sel.xpath('td')[5].xpath('text()').extract()[0]
./tc.py:78: item['host_name'] = sel.xpath('td')[6].xpath('text()').extract()[0]
./tc.py:83: item['time_start'] = sel.xpath('td')[1].xpath('text()').extract()[0]
./tc.py:88: item['time_end'] = sel.xpath('td')[8].xpath('text()').extract()[0]
./tc.py:100: item['full_name'] = sel.xpath('td')[2].xpath('text()').extract()[0]
./tc.py:105: item['id_document'] = sel.xpath('td')[3].xpath('text()').extract()[0]
./tc.py:110: item['id_number'] = sel.xpath('td')[4].xpath('text()').extract()[0]
./tc.py:115: item['reason'] = sel.xpath('td')[5].xpath('text()').extract()[0]
./tc.py:120: item['host_name'] = sel.xpath('td')[6].xpath('text()').extract()[0]
./tc.py:125: item['time_start'] = sel.xpath('td')[1].xpath('text()').extract()[0]
./tc.py:130: item['time_end'] = sel.xpath('td')[7].xpath('text()').extract()[0]
./tc.py:142: item['full_name'] = sel.xpath('td')[1].xpath('text()').extract()[0]
./tc.py:147: item['id_document'], item['id_number'] = utils.get_dni(sel.xpath('td')[2].xpath('text()').extract()[0])
./tc.py:153: item['entity'] = sel.xpath('td')[3].xpath('text()').extract()[0]
./tc.py:158: item['reason'] = sel.xpath('td')[4].xpath('text()').extract()[0]
./tc.py:163: item['host_name'] = sel.xpath('td')[5].xpath('text()').extract()[0]
./tc.py:168: item['office'] = sel.xpath('td')[6].xpath('text()').extract()[0]
./tc.py:173: item['time_start'] = sel.xpath('td')[7].xpath('text()').extract()[0]
./tc.py:178: item['time_end'] = sel.xpath('td')[8].xpath('text()').extract()[0]
and the spiders that use this pattern are:
./inpe.py
./minem.py
./mtc.py
./produce.py
./tc.py
Commands used to find extract()[0]:
grep -nR 'extract()\[0\]' .
grep -nR 'extract()\[0\]' . | cut -d ':' -f 1 | uniq
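For anyone without grep at hand, the two searches above can be approximated in Python. This is only a sketch: the function names `find_matches` and `files_with_matches` are made up for illustration, and the search is a plain substring match rather than a regular expression.

```python
import os

# Literal text to look for; a plain substring match, so no escaping is needed.
PATTERN = "extract()[0]"


def find_matches(root, pattern=PATTERN):
    """Return (path, line_number, line) for every line under root containing pattern.

    Roughly what `grep -nR 'extract()\\[0\\]' .` prints.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if pattern in line:
                            matches.append((path, number, line.rstrip("\n")))
            except OSError:
                # Skip files that cannot be read.
                continue
    return matches


def files_with_matches(root, pattern=PATTERN):
    """Return the sorted, de-duplicated files containing pattern.

    Roughly what the `| cut -d ':' -f 1 | uniq` pipeline prints.
    """
    return sorted({path for path, _number, _line in find_matches(root, pattern)})
```

Calling `files_with_matches(".")` from the spiders directory should list the same files as the second command, assuming the files are readable as text.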
Wow, “There is, brothers, so very much to do”
The only spider remaining is TcSpider. I think we can close this because there is no consistency across the visit pages for "Tribunal Constitucional".
The current pattern in the code to handle first element extraction is the following
This pattern is pretty ugly; the best way to do it is to use the extract_first(default='') method.
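To show why the change matters: indexing an empty result with [0] raises IndexError, while extract_first(default='') returns the default. The sketch below uses a plain-Python stand-in so it runs without Scrapy. `first_or_default`, `old_style`, and `new_style` are hypothetical names, and the lists stand in for what `.extract()` would return.

```python
def first_or_default(values, default=""):
    """Stand-in for extract_first(default=...): first item, or default if empty."""
    for value in values:
        return value
    return default


def old_style(values):
    """Mimics `....extract()[0].strip()`; raises IndexError when nothing matched."""
    return values[0].strip()


def new_style(values):
    """Mimics `....extract_first(default='').strip()`; never raises on empty input."""
    return first_or_default(values, default="").strip()


# Made-up sample data: one cell with text, one cell with no text node.
cell_with_text = ["  Some Visitor  "]
empty_cell = []

assert old_style(cell_with_text) == new_style(cell_with_text) == "Some Visitor"
assert new_style(empty_cell) == ""

try:
    old_style(empty_cell)
except IndexError:
    # This is the failure the extract()[0] pattern forces every spider to guard against.
    pass
```

In a spider, a line such as `fields[0].xpath("text()").extract()[0].strip()` would become `fields[0].xpath("text()").extract_first(default='').strip()`. The outer `fields[0]` index can still fail if the row has fewer cells than expected.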