quanttide / quanttide-handbook-of-data-engineering

QuantTide Handbook of Data Engineering


Closed Guo-Zhang closed 3 weeks ago

Guo-Zhang commented 12 months ago

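Candidate options

EasySpider: a visual scraper.

mlscraper: build on this project and extend it with a large language model, or borrow its architecture and plug a large language model into it.

autoscraper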

GPT scraper:

Open Interpreter:

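The barrier to entry is fairly low: a scraping task can be carried out entirely through natural language, and the tool is ready to use once it has been installed successfully on the local machine.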

Guo-Zhang commented 8 months ago

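The main step where this replaces a traditional scraper is the second step. While using GPT to help write scraper code, I found by accident that giving it the page source directly and having it output the data is more accurate than having it generate code.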
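A minimal sketch of the "page source in, data out" approach described above. The model call is deliberately left abstract: `ask_llm` is a hypothetical callable that takes a prompt and returns the model's text reply, and `build_prompt`, `extract_records`, and `fake_llm` are illustrative names, not part of any library mentioned in this thread. The prompt wording is an assumption as well.

```python
import json
from typing import Callable


def build_prompt(html: str, fields: list[str]) -> str:
    """Ask the model for the data itself, not for scraper code."""
    return (
        "Extract every record from the HTML below. "
        "Reply with only a JSON array of objects with the keys: "
        + ", ".join(fields)
        + ".\n\n"
        + html
    )


def extract_records(
    html: str, fields: list[str], ask_llm: Callable[[str], str]
) -> list[dict]:
    """Send the page source to the model and parse its reply into records."""
    reply = ask_llm(build_prompt(html, fields))
    # Replies often wrap the JSON in extra prose, so keep only the
    # outermost [...] span before parsing.
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in the model reply")
    records = json.loads(reply[start : end + 1])
    # Keep only the requested keys so stray fields do not leak through.
    return [{key: record.get(key) for key in fields} for record in records]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption: swap in any client)."""
    return 'Here is the data: [{"title": "Book A", "price": "12.50", "note": "x"}] Done.'


page_source = "<ul><li><b>Book A</b> 12.50</li></ul>"
records = extract_records(page_source, ["title", "price"], fake_llm)
```

Passing the model in as a callable keeps the parsing and validation logic testable without network access, and a real client can be substituted without touching the rest of the code.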

Guo-Zhang commented 7 months ago

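I tried out the AutoScraper and MLScraper libraries today. Code quality aside, the two share roughly the same idea: you provide an example of the data, and the scraper's built-in machine learning algorithm learns on its own what to scrape. Overall they are not very smart, and the results are mediocre when there is little data.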

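The current plan can be updated to: label part of the data manually or with a large language model, then feed it to MLScraper to scrape the full dataset.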
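To make the "label a few values, then scrape everything" idea concrete, here is a toy sketch using only the Python standard library. It is not the algorithm used by AutoScraper or MLScraper and does not call either library; it simply remembers the chain of tags and classes leading to each labeled value, then collects every text node with the same chain. The names `learn`, `scrape`, and `text_paths` are hypothetical, and the labeled values could come from a person or from a large language model.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag and so must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class _TextPaths(HTMLParser):
    """Collect (path, text) pairs; path is the chain of (tag, class) ancestors."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get("class")))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append((tuple(self.stack), text))


def text_paths(html):
    parser = _TextPaths()
    parser.feed(html)
    parser.close()
    return parser.items


def learn(html, labeled_values):
    """Return the structural paths of the text nodes that were labeled."""
    wanted = set(labeled_values)
    return {path for path, text in text_paths(html) if text in wanted}


def scrape(html, learned_paths):
    """Return every text node whose path matches a learned path."""
    return [text for path, text in text_paths(html) if path in learned_paths]


page = (
    '<ul><li class="item"><span class="name">Apple</span>'
    '<span class="price">$1</span></li>'
    '<li class="item"><span class="name">Pear</span>'
    '<span class="price">$2</span></li></ul>'
)
rules = learn(page, ["Apple"])  # one labeled example
names = scrape(page, rules)     # all values sharing its structure
```

A single labeled value is enough here only because the markup is perfectly regular. On real pages, more labeled samples are needed to tell apart fields that share a structure, which is consistent with the observation above that results are mediocre when data is scarce.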

Guo-Zhang commented 6 months ago

New addition to the candidate options:

Guo-Zhang commented 3 months ago

Large language model scrapers