platonai / exotic-amazon

A complete solution to crawl amazon at scale completely and accurately.

Download Progress Check #7

Open wfh1300 opened 1 year ago

wfh1300 commented 1 year ago

How do I know how much longer the download will take?

platonai commented 1 year ago

To estimate how long your task will take to complete, you need two numbers: the total number of tasks and the average collection time per task. The remaining time is roughly the number of tasks not yet collected multiplied by that average. You can check logs/pulsar.log for a summary of the fetch process, for example:

pulsar.log:

vincent@regulus:~/workspace/exotic-amazon-proj$ grep Fetched logs/pulsar.log | tail -n 1
02:49:11.010 [oreMetrics] INFO  ai.platon.pulsar.crawl.CoreMetrics - 🚚 Fetched 980 pages in 2h19m(0.12 pages/s) successfully using 41 proxies | content: 1.23 GiB, 154.94 KiB/s, 1.29 MiB/p
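As a rough sketch of that estimate, the snippet below pulls the page count and elapsed time out of a `Fetched` summary line and extrapolates linearly. The helper names and the total of 5000 tasks are made up for illustration, the sample line is abbreviated from the one above, and the duration format is only assumed to use `d`/`h`/`m`/`s` units as seen in these logs. It also assumes the fetch rate stays roughly constant, which may not hold if proxies get blocked.

```python
import re
from datetime import timedelta

# Matches e.g. "Fetched 980 pages in 2h19m(0.12 pages/s)".
FETCHED_RE = re.compile(r"Fetched (\d+) pages in ([0-9dhms]+)\(")
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_duration(text):
    """Convert a duration such as '2h19m' or '2h10m41s' to seconds."""
    return sum(int(n) * UNIT_SECONDS[u] for n, u in re.findall(r"(\d+)([dhms])", text))


def parse_fetched_line(line):
    """Return (pages_fetched, elapsed_seconds) from a 'Fetched' summary line."""
    m = FETCHED_RE.search(line)
    if m is None:
        raise ValueError("not a 'Fetched' summary line")
    return int(m.group(1)), parse_duration(m.group(2))


def estimate_remaining(total_tasks, fetched, elapsed_seconds):
    """Linear extrapolation: remaining tasks times average seconds per task."""
    if fetched <= 0 or elapsed_seconds <= 0:
        raise ValueError("need at least one fetched page and a positive elapsed time")
    avg_seconds_per_task = elapsed_seconds / fetched
    remaining_tasks = max(total_tasks - fetched, 0)
    return timedelta(seconds=round(remaining_tasks * avg_seconds_per_task))


# Abbreviated copy of the summary line shown above.
line = ("02:49:11.010 [oreMetrics] INFO  ai.platon.pulsar.crawl.CoreMetrics - "
        "Fetched 980 pages in 2h19m(0.12 pages/s) successfully using 41 proxies")
fetched, elapsed = parse_fetched_line(line)

TOTAL_TASKS = 5000  # hypothetical: replace with the size of your own task list
print(estimate_remaining(TOTAL_TASKS, fetched, elapsed))
```

With the hypothetical total of 5000 tasks, 4020 tasks remain at about 8.5 seconds each, so the estimate comes out at roughly nine and a half hours.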

You can also check logs/pulsar.dc.log to see a progress summary for each individual queue.

pulsar.dc.log:

vincent@regulus:~/workspace/exotic-amazon-proj$ tail -n 20 logs/pulsar.dc.log 
2023-02-24 13:40:26,043 --- 
name          | priority   | pName   | collected | cd/s | collect | c/s  | time     | size | estSize | firstCollect | lastCollect | labels
FCC#RealTime  | -214748364 | HIGHEST | 0         | 0.00 | 0       | 0.00 | 0s       | 0    | 0       | 01 08:00:00  | 01 08:00:00 | 
FCC.4         | -214748364 | HIGHEST | 0         | 0.00 | 0       | 0.00 | 0s       | 0    | 0       | 01 08:00:00  | 01 08:00:00 | 
DelayCC#Delay | -5000      | HIGHER5 | 387       | 0.05 | 3002    | 0.38 | 2h10m41s | 0    | 0       | 24 00:31:08  | 24 02:41:50 | 
FCC.5         | -5000      | HIGHER5 | 0         | 0.00 | 0       | 0.00 | 0s       | 0    | 0       | 01 08:00:00  | 01 08:00:00 | 
FCC.6         | -4000      | HIGHER4 | 0         | 0.00 | 0       | 0.00 | 0s       | 0    | 0       | 01 08:00:00  | 01 08:00:00 | 
FCC.7         | -3000      | HIGHER3 | 0         | 0.00 | 0       | 0.00 | 0s       | 0    | 0       | 01 08:00:00  | 01 08:00:00 | 
FCC.8         | -2000      | HIGHER2 | 0         | 0.00 | 0       | 0.00 | 0s       | 0    | 0       | 01 08:00:00  | 01 08:00:00 | 
SCRAPE        | -2000      | HIGHER2 | 0         | 0.00 | 0       | 0.00 | 0s       | 0    | 0       | 01 08:00:00  | 01 08:00:00 | SC*E
FCC.9         | -1000      | HIGHER  | 0         | 0.00 | 0       | 0.00 | 0s       | 0    | 0       | 01 08:00:00  | 01 08:00:00 | 
FCC.10        | 0          | NORMAL  | 1000      | 0.14 | 1000    | 0.14 | 1h55m16s | 0    | 0       | 24 00:30:24  | 24 02:25:41 | 
FCC.11        | 1000       | LOWER   | 0         | 0.00 | 0       | 0.00 | 0s       | 0    | 0       | 01 08:00:00  | 01 08:00:00 | 
FCC.12        | 2000       | LOWER2  | 0         | 0.00 | 0       | 0.00 | 0s       | 0    | 0       | 01 08:00:00  | 01 08:00:00 | 
FCC.13        | 3000       | LOWER3  | 0         | 0.00 | 0       | 0.00 | 0s       | 0    | 0       | 01 08:00:00  | 01 08:00:00 | 
FCC.14        | 4000       | LOWER4  | 0         | 0.00 | 0       | 0.00 | 0s       | 0    | 0       | 01 08:00:00  | 01 08:00:00 | 
FCC.15        | 5000       | LOWER5  | 0         | 0.00 | 0       | 0.00 | 0s       | 0    | 0       | 01 08:00:00  | 01 08:00:00 | 
FCC.16        | 214748364  | LOWEST  | 0         | 0.00 | 0       | 0.00 | 0s       | 0    | 0       | 01 08:00:00  | 01 08:00:00 | 

2023-02-24 13:40:26,045 --- Total collected 1387/0.03/4002/0.08 in 13h10m19s, remaining 0/0, collect time: 2023-02-23T16:30:24.563673Z -> 2023-02-23T18:41:50.141009Z
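If you want to read the per-queue table programmatically, a minimal sketch like the one below may help. It assumes the table stays pipe-delimited with a single header row, as in the excerpt above; the function name is hypothetical and the sample contains only three of the rows shown. The exact meaning of each column is not spelled out here, so the code just treats `collected` and `collect` as integer counters.

```python
def parse_queue_table(lines):
    """Parse pipe-delimited rows into dicts keyed by the header row's column names."""
    header = None
    rows = []
    for line in lines:
        if "|" not in line:
            continue  # skip timestamps, blank lines and the totals line
        cells = [cell.strip() for cell in line.split("|")]
        if header is None:
            header = cells  # the first pipe-delimited line is the header
            continue
        rows.append(dict(zip(header, cells)))
    return rows


# Three rows copied from the excerpt above, plus the header.
SAMPLE = """\
name          | priority   | pName   | collected | cd/s | collect | c/s  | time     | size | estSize | firstCollect | lastCollect | labels
FCC.4         | -214748364 | HIGHEST | 0         | 0.00 | 0       | 0.00 | 0s       | 0    | 0       | 01 08:00:00  | 01 08:00:00 |
DelayCC#Delay | -5000      | HIGHER5 | 387       | 0.05 | 3002    | 0.38 | 2h10m41s | 0    | 0       | 24 00:31:08  | 24 02:41:50 |
FCC.10        | 0          | NORMAL  | 1000      | 0.14 | 1000    | 0.14 | 1h55m16s | 0    | 0       | 24 00:30:24  | 24 02:25:41 |
"""

rows = parse_queue_table(SAMPLE.splitlines())
active = [row for row in rows if int(row["collected"]) > 0]

for row in active:
    print(row["name"], row["collected"], row["collect"], row["time"])

total_collected = sum(int(row["collected"]) for row in rows)
total_collect = sum(int(row["collect"]) for row in rows)
print(total_collected, total_collect)
```

For this excerpt the two sums are 1387 and 4002, which agrees with the first and third figures in the "Total collected 1387/0.03/4002/0.08" line.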