lensh / vue-qq

🎨 Vue family bucket with socket.io and express/koa2, creating a web version of mobile QQ that supports real-time group chat, real-time private chat, special care, chat blocking, smart IP geolocation, real-time temperature display, and other core QQ functions
MIT License
917 stars 230 forks

Fetching web pages encoded in gbk/gb2312 #18

Open lensh opened 6 years ago

lensh commented 6 years ago

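A crawler sometimes runs into web pages encoded in gbk/gb2312. When the fetched bytes are read as UTF-8 (Node's default when converting a Buffer to a string), all of the Chinese text comes out garbled. The fix is to transcode with iconv-lite. For example, the page http://1212.ip138.com/ic.asp is gb2312-encoded, so the Chinese in the fetched data is garbled. The transcoding steps are as follows: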

import http from 'http'
import iconv from 'iconv-lite' // third-party module

const url = 'http://1212.ip138.com/ic.asp' // the response reports the server's IP address
http.get(url, res => {
  const arrBuf = []
  let bufLength = 0
  res
    .on('data', chunk => {
      // Collect the raw Buffers; decode once, after every chunk has arrived
      arrBuf.push(chunk)
      bufLength += chunk.length
    })
    .on('end', () => {
      const chunkAll = Buffer.concat(arrBuf, bufLength)
      const strJson = iconv.decode(chunkAll, 'gb2312') // Chinese characters are no longer garbled
      const startIndex = strJson.indexOf('省')
      const endIndex = strJson.indexOf('市')
      const city = strJson.substring(startIndex + 1, endIndex) // city name
      console.log(strJson, city) // both print as readable Chinese
    })
})
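
The snippet above hardcodes 'gb2312', which only works when the page's encoding is known in advance. As a rough sketch, the encoding could instead be guessed from the response itself. The `detectCharset` helper below is hypothetical (it is not part of the original post or of iconv-lite), and it assumes the page declares its charset either in the Content-Type header or in a `<meta>` tag within the first 1024 bytes:

```javascript
// Hypothetical helper, not from the original post: guess a page's charset.
// contentType: value of the Content-Type response header (may be undefined)
// body: the raw response Buffer
// fallback: returned when no declaration is found (assumed default: 'utf-8')
function detectCharset(contentType, body, fallback = 'utf-8') {
  const pattern = /charset\s*=\s*["']?([\w-]+)/i

  // 1. Prefer the charset declared in the HTTP header.
  const fromHeader = pattern.exec(contentType || '')
  if (fromHeader) return fromHeader[1].toLowerCase()

  // 2. Otherwise scan the start of the body for a <meta> charset declaration.
  //    latin1 maps each byte to exactly one character, so reading gbk bytes
  //    this way cannot corrupt the ASCII markup being searched.
  const head = body.subarray(0, 1024).toString('latin1')
  const fromMeta = pattern.exec(head)
  if (fromMeta) return fromMeta[1].toLowerCase()

  // 3. Nothing declared: fall back.
  return fallback
}
```

Inside the "end" handler, the result could then replace the hardcoded name, for example `iconv.decode(chunkAll, detectCharset(res.headers['content-type'], chunkAll))`. Because a page may declare a name the decoder does not know, checking it with `iconv.encodingExists(name)` before decoding is a reasonable precaution.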