zsxsoft / my-beancount-scripts

Git repo to save my Beancount scripts

Alipay bills do not include the payment channel #6

Closed: thelittlefox closed this issue 3 years ago

thelittlefox commented 3 years ago

Because Alipay bills do not include the payment channel, every Alipay transaction ends up under the Assets:Company:Alipay:StupidAlipay account.

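In the transaction history of Alipay personal edition, clicking "View details" opens a detail page that shows the funding account: it says Yu'e Bao (余额宝) if the payment came from Yu'e Bao, or shows the last four digits of the card number if it was paid by bank card.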
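As a rough illustration of what could be extracted from that detail page, the funding-account text can be classified into "Yu'e Bao" or "card with last four digits". This is a sketch under assumptions: the helper name `classifyFundTool` is hypothetical, and it assumes Yu'e Bao is labelled 余额宝 and that a card label ends with its last four digits, e.g. `招商银行储蓄卡(1234)`; the real page wording may differ.

```javascript
// Hypothetical helper: classify the funding-account text shown on an
// Alipay transaction detail page. Assumes Yu'e Bao is labelled "余额宝"
// and bank cards end with their last four digits, e.g. "招商银行储蓄卡(1234)".
function classifyFundTool(text) {
  const t = text.trim()
  if (t.includes('余额宝')) return { kind: 'yuebao' }
  // Take the last run of four digits (followed only by non-digits) as the card tail
  const m = t.match(/(\d{4})\D*$/)
  if (m) return { kind: 'card', last4: m[1] }
  // Anything else (balance, Huabei, ...) is left for manual mapping
  return { kind: 'unknown', raw: t }
}
```

Returning `unknown` with the raw text, rather than guessing, keeps unrecognised channels visible so they can be mapped by hand.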

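The transaction history is paginated at 10 records per page. I tried writing a crawler to fetch the detail-page links automatically, and it triggered Alipay's anti-scraping mechanism.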

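I have an idea, not sure whether it's feasible: write a Chrome extension that, when I manually click "next page" in the transaction history, intercepts the response for the transaction list page, extracts the detail-page links from it, automatically visits each detail page, and pulls out the transaction serial number and the last four digits of the card. Matching those against the exported CSV bill by transaction serial number would give a complete bill with card numbers.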
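The matching step could look roughly like this. It is a sketch, not code from this repo: `attachChannels` is a hypothetical name, the scraped input uses the `[serialNumber, [[fundTool, amount], ...]]` shape the userscripts later in this thread collect, and the `serial` field on CSV rows is an assumption about how the CSV has been parsed.

```javascript
// Hypothetical join: attach scraped payment channels to exported CSV rows
// by transaction serial number.
// scraped: [[serialNumber, [[fundTool, amount], ...]], ...]
// csvRows: [{ serial, ...otherColumns }, ...]  (field name assumed)
function attachChannels(csvRows, scraped) {
  // Index the scraped details by trimmed serial number for O(1) lookup
  const bySerial = new Map(scraped.map(([serial, detail]) => [serial.trim(), detail]))
  return csvRows.map(row => {
    const detail = bySerial.get(String(row.serial).trim())
    return {
      ...row,
      // null means no detail page was scraped for this serial number
      channels: detail ? detail.map(([fundTool, amount]) => ({ fundTool, amount })) : null,
    }
  })
}
```

A transaction can be split across several funding sources, so `channels` is kept as a list rather than a single value.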

zsxsoft commented 3 years ago
[screenshot]

Is this the button you mean? Whether it triggers the anti-scraping still needs testing.

zsxsoft commented 3 years ago

I scraped one page and it didn't seem to trigger the anti-scraping. Worth a try.

zsxsoft commented 3 years ago

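Paging triggers the anti-scraping, stupid thing. Anyone who wants to experiment can try this: it scrapes the details for a whole page, but I haven't written the part that connects it to the import script yet. Clicking the next page forces a re-login. Stupid Alipay.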

// ==UserScript==
// @name         Alipay To Beancount
// @namespace    http://tampermonkey.net/
// @version      0.1
// @description  Collect the payment channel of each Alipay transaction from its detail page
// @author       You
// @match        https://consumeprod.alipay.com/*
// @grant   GM_getValue
// @grant   GM_setValue
// @run-at document-start
// ==/UserScript==

(function() {
    // Prefix all GM storage keys so they don't collide with other scripts
    const gmTempKeyPrefix = 'alipay_to_beancount_temp_prefix'
    const setValue = (a, b) => GM_setValue(`${gmTempKeyPrefix}${a}`, b)
    const getValue = (a) => GM_getValue(`${gmTempKeyPrefix}${a}`)
    let saved = getValue('saved') || []

    document.addEventListener('DOMContentLoaded', () => {
        const link = document.createElement('a')
        link.innerText = '尝试获取账单关联' // "Try to fetch bill associations"
        const getDetails = async () => {
            // Fetch every detail page linked from the current record list page
            const pages = await Promise.all(Array.from(
                document.querySelectorAll('.record-icon.icon-detail.icon-detail-trigger')
            ).map(icon => fetch(icon.href)
                  .then(res => res.arrayBuffer())
                  // Detail pages are GBK-encoded, so decode them explicitly
                  .then(buf => new TextDecoder('gbk').decode(buf))))
            const parser = new DOMParser()
            pages.forEach(page => {
                const doc = parser.parseFromString(page, 'text/html')
                // The second .ft-gray element holds the transaction serial number
                const serialNumber = doc.querySelectorAll('.ft-gray')[1].textContent.trim()
                // One .detail-list per funding source: [fund tool, amount]
                const detail = Array.from(doc.querySelectorAll('.detail-list')).map(a => [
                    a.querySelector('.fundTool').textContent.trim(),
                    a.querySelector('.money').textContent.trim(),
                ])
                saved.push([serialNumber, detail])
            })
            // Persist the results in GM storage so they survive a page load
            setValue('saved', saved)
            console.log(saved)
            if (document.querySelector('.page-next')) {
                alert('Please click the next page')
                //document.querySelector('.page-next').click()
            } else {
                alert('Done!')
                console.log(saved)
            }
        }
        link.onclick = () => {
            setValue('saved', [])
            saved = []
            getDetails()
        }
        document.querySelector('.link').appendChild(link)
    })
})()
thelittlefox commented 3 years ago

[screenshot 2020-12-13_142546]

[screenshot]

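Yes, Alipay's anti-scraping is very sensitive. I contacted them earlier to ask whether the exported CSV could include the card number, and never heard back.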

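My current idea: page through the transaction list manually (which doesn't trigger the anti-scraping) and use a browser extension to intercept network requests in the background. The transaction list page is https://consumeprod.alipay.com/record/advanced.htm. Its response contains the detail URLs, such as https://consumeprod.alipay.com/record/detail/simpleDetail.htm?bizType=TRADE&bizInNo=202012&gmtBizCreate=202012. Collect those detail URLs into an array, then visit them one by one, intercept the network requests, and extract the serial number and card number.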
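Collecting the detail URLs from the list page's response could be sketched as below. Assumptions: `extractDetailLinks` is a hypothetical helper, the links appear in double-quoted `href` attributes containing `/record/detail/simpleDetail.htm?`, and the query strings are HTML-escaped (`&amp;`). It works on the raw HTML string, so it can run on an intercepted response body without a DOM.

```javascript
// Hypothetical helper: pull detail-page URLs out of the raw HTML of
// record/advanced.htm. Assumes double-quoted href attributes.
function extractDetailLinks(html, base = 'https://consumeprod.alipay.com') {
  const re = /href="([^"]*\/record\/detail\/simpleDetail\.htm\?[^"]*)"/g
  const urls = new Set() // de-duplicate, since a row may link its detail twice
  for (const m of html.matchAll(re)) {
    // HTML-escaped ampersands must be decoded before use as a URL
    const raw = m[1].replace(/&amp;/g, '&')
    // Resolve relative links against the site origin
    urls.add(new URL(raw, base).href)
  }
  return [...urls]
}
```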

zsxsoft commented 3 years ago

I tried this approach too and it also gets caught by the anti-scraping. Let me tweak it some more.

// ==UserScript==
// @name         Alipay To Beancount
// @namespace    http://tampermonkey.net/
// @version      0.1
// @description  Collect the payment channel of each Alipay transaction from its detail page
// @author       You
// @match        https://consumeprod.alipay.com/*
// @grant        GM_getValue
// @grant        GM_setValue
// @run-at       document-start
// ==/UserScript==

(function() {
    const gmTempKeyPrefix = 'alipay_to_beancount_temp_prefix'
    const setValue = (a, b) => GM_setValue(`${gmTempKeyPrefix}${a}`, b)
    const getValue = (a) => GM_getValue(`${gmTempKeyPrefix}${a}`)
    let saved = getValue('saved') || []
    let links = getValue('links') || []
    const parser = new DOMParser()
    const wait = time => new Promise((resolve) => setTimeout(resolve, time))

    const getDetail = (url) => {
        return fetch(url)
        .then(a => a.blob())
        .then(blob => new Promise(resolve => {
              const reader = new FileReader()
              reader.onload = e => resolve(reader.result)
              reader.readAsText(blob, 'GBK')
          }))
        .then(text => {
            const doc = parser.parseFromString(text, 'text/html')
            const serialNumber = doc.querySelectorAll('.ft-gray')[1].innerText
            const detail = []
            const detailList = Array.from(doc.querySelectorAll('.detail-list'))
            detailList.forEach(a => {
                const f = a.querySelector('.fundTool').innerText.trim()
                const price = a.querySelector('.money').innerText.trim()
                detail.push([f, price])
            })
            return [serialNumber, detail]
        })
    }

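    const getAllDetails = async () => {
        const uniqueLinks = Array.from(new Set(links))
        // Wrap each fetch in a function so nothing starts until its batch runs
        const tasks = uniqueLinks.map(url => () => getDetail(url))
        while (tasks.length) {
            // Run at most 10 requests per batch, staggered 100 ms apart
            const batch = tasks.splice(0, 10)
            const current = await Promise.all(batch.map((task, i) => wait(i * 100).then(task)))
            console.log(current)
            saved.push(...current)
            setValue('saved', saved)
        }
        console.log(saved)
    }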

    document.addEventListener('DOMContentLoaded', () => {
        const link = document.createElement('a')
        link.innerText = '尝试获取账单关联'
        const getDetailLinks = async () => {
            const pages = Array.from(document.querySelectorAll('.record-icon.icon-detail.icon-detail-trigger')).map(a => a.href)
            pages.forEach(a => links.push(a))
            setValue('links', links)
            console.log(links)
            if (document.querySelector('.page-next')) {
                window.scrollTo(0, 10000)
                wait(200).then(() => {
                    // document.querySelector('.page-next').click()
                })
            } else {
                alert('Done!')
                console.log(links)
                getAllDetails()
            }
        }
        link.onclick = () => {
            setValue('saved', [])
            setValue('links', [])
            saved = []
            links = []
            getDetailLinks()
        }
        if (links.length > 0) {
            getDetailLinks()
        }
        document.querySelector('.link').appendChild(link)
    })
})()
zsxsoft commented 3 years ago

Manual paging plus automatically opening the detail pages doesn't work either, even after I added random delays.

I give up.

// ==UserScript==
// @name         Alipay To Beancount
// @namespace    http://tampermonkey.net/
// @version      0.1
// @description  Collect the payment channel of each Alipay transaction from its detail page
// @author       You
// @match        https://consumeprod.alipay.com/*
// @grant        GM_getValue
// @grant        GM_setValue
// @run-at       document-start
// ==/UserScript==

(function() {
    const gmTempKeyPrefix = 'alipay_to_beancount_temp_prefix'
    const setValue = (a, b) => GM_setValue(`${gmTempKeyPrefix}${a}`, b)
    const getValue = (a) => GM_getValue(`${gmTempKeyPrefix}${a}`)
    let saved = getValue('saved') || []
    let links = getValue('links') || []
    const parser = new DOMParser()
    const wait = time => new Promise((resolve) => setTimeout(resolve, time))

    const getDetail = (url) => {
        return fetch(url)
        .then(a => a.blob())
        .then(blob => new Promise(resolve => {
              const reader = new FileReader()
              reader.onload = e => resolve(reader.result)
              reader.readAsText(blob, 'GBK')
          }))
        .then(text => {
            const doc = parser.parseFromString(text, 'text/html')
            const serialNumber = doc.querySelectorAll('.ft-gray')[1].innerText
            const detail = []
            const detailList = Array.from(doc.querySelectorAll('.detail-list'))
            detailList.forEach(a => {
                const f = a.querySelector('.fundTool').innerText.trim()
                const price = a.querySelector('.money').innerText.trim()
                detail.push([f, price])
            })
            return [serialNumber, detail]
        })
    }

    document.addEventListener('DOMContentLoaded', () => {
        const link = document.createElement('a')
        link.innerText = '尝试获取账单关联'
        const getDetailLinks = async () => {
            const pages = Array.from(document.querySelectorAll('.record-icon.icon-detail.icon-detail-trigger')).map(a => a.href)
            // Stagger requests by 100 ms each plus up to 1 s of random delay
            const ret = await Promise.all(pages.map((a, i) => wait(i * 100 + 1000 * Math.random()).then(_ => getDetail(a))))
            saved.push(...ret)
            setValue('saved', saved)
            if (document.querySelector('.page-next')) {
                window.scrollTo(0, 10000)
                wait(200).then(() => {
                    // document.querySelector('.page-next').click()
                })
            } else {
                alert('Done!')
                console.log(saved)
            }
        }
        link.onclick = () => {
            setValue('saved', [])
            setValue('links', [])
            saved = []
            links = []
            getDetailLinks()
        }
        if (links.length > 0) {
            getDetailLinks()
        }
        document.querySelector('.link').appendChild(link)
    })
})()
thelittlefox commented 3 years ago

Let me see whether a Chrome extension can intercept the requests.

thelittlefox commented 3 years ago

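I give up too... Just manually opening a few transaction detail pages at once in the browser triggers the anti-scraping. Nothing left to try.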

zsxsoft commented 3 years ago

Actually, you could copy what the fraud underground does and go through something like Xposed, but the cost is too high…

thelittlefox commented 3 years ago

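Doesn't Xposed require root? Another idea: route the phone's traffic through a proxy running on the PC and capture the Alipay client's HTTPS traffic there, since the Alipay mobile app can show the transactions for a given bank card. That looks like a lot of hassle too, though.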

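So for now the best approach still seems to be the one you've already implemented: matching bank card statements against Alipay records. With both time and amount as constraints, there should be very few ambiguous matches.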
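A minimal sketch of that time-plus-amount matching, assuming records carry a `Date` in `time` and a numeric `amount` in yuan, and bank lines additionally carry a `card`; the field names, the 5-minute window, and `matchByTimeAndAmount` itself are illustrative assumptions, not the repo's implementation.

```javascript
// Hypothetical matcher: assign a bank card to each Alipay record when
// exactly one unclaimed bank statement line has the same amount and a
// timestamp within the window.
function matchByTimeAndAmount(alipayRecords, bankLines, windowMs = 5 * 60 * 1000) {
  const used = new Set() // indexes of bank lines already claimed
  return alipayRecords.map(rec => {
    const candidates = []
    bankLines.forEach((line, idx) => {
      if (used.has(idx)) return
      // Compare amounts in cents to avoid floating-point noise
      const sameAmount = Math.round(line.amount * 100) === Math.round(rec.amount * 100)
      const closeInTime = Math.abs(line.time.getTime() - rec.time.getTime()) <= windowMs
      if (sameAmount && closeInTime) candidates.push(idx)
    })
    // Only accept an unambiguous match; leave duplicates for manual review
    if (candidates.length !== 1) return { ...rec, card: null, ambiguous: candidates.length > 1 }
    used.add(candidates[0])
    return { ...rec, card: bankLines[candidates[0]].card, ambiguous: false }
  })
}
```

Refusing to guess when two bank lines fit keeps the rare duplicates out of the ledger; they surface with `ambiguous: true` instead.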

zsxsoft commented 3 years ago

No need to even think about it: Alipay surely uses certificate pinning, so packet capture won't work.

thelittlefox commented 3 years ago

For certificate pinning, Xposed has the JustTrustMe module, haha.

IShinji commented 3 years ago

Could puppeteer be worth a try? It might just be a bit tedious.

zsxsoft commented 3 years ago

@IShinji Alipay's anti-scraping detects all the common headless browsers, and I have no interest in bypassing it.

IShinji commented 3 years ago

Another idea then... use Auto.js.

Riatre commented 3 years ago

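The person above triggered the anti-scraping just by manually opening a few transaction detail pages, so no tool is going to help.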

broven commented 3 years ago

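I tested it: the bills obtained this way do include the payment channel. You request them in the mobile app, they are emailed to you in CSV format, and each export covers at most 3 months. [screenshot]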

zsxsoft commented 3 years ago

@broven Thx! Importing the payment method is now supported: https://github.com/zsxsoft/my-beancount-scripts/commit/65a2d2bc3b3ae6595ba0e7ccb12e84cf11a6705e
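Once a row carries its payment channel, the funding posting no longer has to go to the catch-all account. The sketch below is not the implementation in the linked commit: the row fields (`date`, `payee`, `narration`, `amount`, `channel`), the channel strings, and the account names in `CHANNEL_ACCOUNTS` are hypothetical, amounts are assumed to be positive expenses, and unknown channels fall back to the Assets:Company:Alipay:StupidAlipay account used in this issue.

```javascript
// Hypothetical channel -> account table; the keys must match whatever
// channel text the emailed CSV actually contains.
const CHANNEL_ACCOUNTS = {
  '余额宝': 'Assets:Company:Alipay:YuEBao',
  '招商银行储蓄卡(1234)': 'Assets:Bank:CMB:1234',
}
const FALLBACK_ACCOUNT = 'Assets:Company:Alipay:StupidAlipay'

// Escape backslashes and double quotes for Beancount string literals
const esc = s => String(s).replace(/\\/g, '\\\\').replace(/"/g, '\\"')

// Render one CSV row as a two-posting Beancount transaction
function toBeancount(row, expenseAccount) {
  const account = CHANNEL_ACCOUNTS[row.channel] || FALLBACK_ACCOUNT
  const amount = Number(row.amount).toFixed(2)
  return [
    `${row.date} * "${esc(row.payee)}" "${esc(row.narration)}"`,
    `  ${expenseAccount}  ${amount} CNY`,
    `  ${account}  -${amount} CNY`,
  ].join('\n')
}
```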