platonai / PulsarRPA

Automate webpages at scale, scrape web data completely and accurately with high performance, distributed AI-RPA.
GNU Affero General Public License v3.0
744 stars 119 forks

[main] INFO ai.platon.pulsar.crawl.component.LoadComponent.Task - 3. 💔 💿 U got 1600 0 <- 0 in , fc:1 ProtoNotFound(1600) #38

Closed: xanren7716 closed this issue 10 months ago

xanren7716 commented 1 year ago

"C:\Program Files\Java\jdk-20\bin\java.exe" "-javaagent:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2023.1.3\lib\idea_rt.jar=3907:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2023.1.3\bin" -Dfile.encoding=UTF-8 -Dsun.stdout.encoding=UTF-8 -Dsun.stderr.encoding=UTF-8 -classpath C:\Users\Administrator\IdeaProjects\PulsarContexts\target\classes;C:\Users\Administrator\.m2\repository\org\jetbrains\kotlin\kotlin-stdlib-jdk8\1.8.21\kotlin-stdlib-jdk8-1.8.21.jar;C:\Users\Administrator\.m2\repository\org\jetbrains\kotlin\kotlin-stdlib\1.8.21\kotlin-stdlib-1.8.21.jar;C:\Users\Administrator\.m2\repository\org\jetbrains\kotlin\kotlin-stdlib-common\1.8.21\kotlin-stdlib-common-1.8.21.jar;C:\Users\Administrator\.m2\repository\org\jetbrains\annotations\13.0\annotations-13.0.jar;C:\Users\Administrator\.m2\repository\org\jetbrains\kotlin\kotlin-stdlib-jdk7\1.8.21\kotlin-stdlib-jdk7-1.8.21.jar;C:\Users\Administrator\.m2\repository\ai\platon\pulsar\pulsar-skeleton\1.10.12\pulsar-skeleton-1.10.12.jar;C:\Users\Administrator\.m2\repository\ai\platon\pulsar\pulsar-common\1.10.12\pulsar-common-1.10.12.jar;C:\Users\Administrator\.m2\repository\org\springframework\spring-core\5.3.17\spring-core-5.3.17.jar;C:\Users\Administrator\.m2\repository\org\springframework\spring-jcl\5.3.17\spring-jcl-5.3.17.jar;C:\Users\Administrator\.m2\repository\xml-apis\xml-apis\1.3.04\xml-apis-1.3.04.jar;C:\Users\Administrator\.m2\repository\org\apache\httpcomponents\httpclient\4.5.13\httpclient-4.5.13.jar;C:\Users\Administrator\.m2\repository\org\apache\httpcomponents\httpcore\4.4.13\httpcore-4.4.13.jar;C:\Users\Administrator\.m2\repository\commons-logging\commons-logging\1.2\commons-logging-1.2.jar;C:\Users\Administrator\.m2\repository\commons-codec\commons-codec\1.11\commons-codec-1.11.jar;C:\Users\Administrator\.m2\repository\com\ibm\icu\icu4j\4.0.1\icu4j-4.0.1.jar;C:\Users\Administrator\.m2\repository\commons-io\commons-io\2.11.0\commons-io-2.11.0.jar;C:\Users\Administrator\.m2\repository\org\apache\commons\commons-lang3\3.12.0\commons-lang3-3.12.0.jar;C:\Users\Administrator\.m2\repository\org\apache\commons\commons-math3\3.3\commons-math3-3.3.jar;C:\Users\Administrator\.m2\repository\org\codehaus\woodstox\stax2-api\4.2.1\stax2-api-4.2.1.jar;C:\Users\Administrator\.m2\repository\com\fasterxml\woodstox\woodstox-core\6.4.0\woodstox-core-6.4.0.jar;C:\Users\Administrator\.m2\repository\com\fasterxml\jackson\module\jackson-module-kotlin\2.13.4\jackson-module-kotlin-2.13.4.jar;C:\Users\Administrator\.m2\repository\com\fasterxml\jackson\core\jackson-databind\2.13.4\jackson-databind-2.13.4.jar;C:\Users\Administrator\.m2\repository\com\fasterxml\jackson\core\jackson-annotations\2.13.4\jackson-annotations-2.13.4.jar;C:\Users\Administrator\.m2\repository\com\fasterxml\jackson\dataformat\jackson-dataformat-properties\2.13.4\jackson-dataformat-properties-2.13.4.jar;C:\Users\Administrator\.m2\repository\com\fasterxml\jackson\core\jackson-core\2.13.4\jackson-core-2.13.4.jar;C:\Users\Administrator\.m2\repository\com\fasterxml\jackson\datatype\jackson-datatype-jsr310\2.13.4\jackson-datatype-jsr310-2.13.4.jar;C:\Users\Administrator\.m2\repository\org\jetbrains\kotlin\kotlin-serialization\1.5.32\kotlin-serialization-1.5.32.jar;C:\Users\Administrator\.m2\repository\org\jetbrains\kotlin\kotlin-gradle-plugin-api\1.5.32\kotlin-gradle-plugin-api-1.5.32.jar;C:\Users\Administrator\.m2\repository\org\jetbrains\kotlin\kotlin-native-utils\1.5.32\kotlin-native-utils-1.5.32.jar;C:\Users\Administrator\.m2\repository\org\jetbrains\kotlin\kotlin-util-io\1.5.32\kotlin-util-io-1.5.32.jar;C:\Users\Administrator\.m2\repository\org\jetbrains\kotlin\kotlin-project-model\1.5.32\kotlin-project-model-1.5.32.jar;C:\Users\Administrator\.m2\repository\org\nibor\autolink\autolink\0.10.0\autolink-0.10.0.jar;C:\Users\Administrator\.m2\repository\ch\qos\logback\logback-classic\1.2.9\logback-classic-1.2.9.jar;C:\Users\Administrator\.m2\repository\ch\qos\logback\logback-core\1.2.9\logback-core-1.2.9.jar;C:\Users\Administrator\.m2\repository\ai\platon\pulsar\pulsar-persist\1.10.12\pulsar-persist-1.10.12.jar;C:\Users\Administrator\.m2\repository\ai\platon\pulsar\gora-shaded-mongodb\0.8\gora-shaded-mongodb-0.8.jar;C:\Users\Administrator\.m2\repository\ai\platon\pulsar\pulsar-jsoup\1.14.3\pulsar-jsoup-1.14.3.jar;C:\Users\Administrator\.m2\repository\org\apache\avro\avro\1.8.1\avro-1.8.1.jar;C:\Users\Administrator\.m2\repository\org\codehaus\jackson\jackson-core-asl\1.9.13\jackson-core-asl-1.9.13.jar;C:\Users\Administrator\.m2\repository\org\codehaus\jackson\jackson-mapper-asl\1.9.13\jackson-mapper-asl-1.9.13.jar;C:\Users\Administrator\.m2\repository\com\thoughtworks\paranamer\paranamer\2.7\paranamer-2.7.jar;C:\Users\Administrator\.m2\repository\org\xerial\snappy\snappy-java\1.1.1.3\snappy-java-1.1.1.3.jar;C:\Users\Administrator\.m2\repository\org\tukaani\xz\1.5\xz-1.5.jar;C:\Users\Administrator\.m2\repository\org\apache\gora\gora-core\0.8\gora-core-0.8.jar;C:\Users\Administrator\.m2\repository\org\apache\cxf\cxf-rt-frontend-jaxrs\2.5.2\cxf-rt-frontend-jaxrs-2.5.2.jar;C:\Users\Administrator\.m2\repository\org\apache\cxf\cxf-common-utilities\2.5.2\cxf-common-utilities-2.5.2.jar;C:\Users\Administrator\.m2\repository\org\apache\ws\xmlschema\xmlschema-core\2.0.1\xmlschema-core-2.0.1.jar;C:\Users\Administrator\.m2\repository\org\codehaus\woodstox\woodstox-core-asl\4.1.1\woodstox-core-asl-4.1.1.jar;C:\Users\Administrator\.m2\repository\org\apache\cxf\cxf-api\2.5.2\cxf-api-2.5.2.jar;C:\Users\Administrator\.m2\repository\org\apache\neethi\neethi\3.0.1\neethi-3.0.1.jar;C:\Users\Administrator\.m2\repository\wsdl4j\wsdl4j\1.6.2\wsdl4j-1.6.2.jar;C:\Users\Administrator\.m2\repository\org\apache\cxf\cxf-rt-core\2.5.2\cxf-rt-core-2.5.2.jar;C:\Users\Administrator\.m2\repository\com\sun\xml\bind\jaxb-impl\2.1.13\jaxb-impl-2.1.13.jar;C:\Users\Administrator\.m2\repository\org\apache\geronimo\specs\geronimo-javamail_1.4_spec\1.7.1\geronimo-javamail_1.4_spec-1.7.1.jar;C:\Users\Administrator\.m2\repository\javax\ws\rs\jsr311-api\1.1.1\jsr311-api-1.1.1.jar;C:\Users\Administrator\.m2\repository\org\apache\cxf\cxf-rt-bindings-xml\2.5.2\cxf-rt-bindings-xml-2.5.2.jar;C:\Users\Administrator\.m2\repository\org\apache\cxf\cxf-rt-transports-http\2.5.2\cxf-rt-transports-http-2.5.2.jar;C:\Users\Administrator\.m2\repository\org\apache\cxf\cxf-rt-transports-common\2.5.2\cxf-rt-transports-common-2.5.2.jar;C:\Users\Administrator\.m2\repository\org\springframework\spring-web\3.0.6.RELEASE\spring-web-3.0.6.RELEASE.jar;C:\Users\Administrator\.m2\repository\aopalliance\aopalliance\1.0\aopalliance-1.0.jar;C:\Users\Administrator\.m2\repository\org\codehaus\jettison\jettison\1.3.1\jettison-1.3.1.jar;C:\Users\Administrator\.m2\repository\org\apache\avro\avro-mapred\1.8.1\avro-mapred-1.8.1.jar;C:\Users\Administrator\.m2\repository\org\apache\avro\avro-ipc\1.8.1\avro-ipc-1.8.1.jar;C:\Users\Administrator\.m2\repository\org\mortbay\jetty\jetty\6.1.26\jetty-6.1.26.jar;C:\Users\Administrator\.m2\repository\org\mortbay\jetty\jetty-util\6.1.26\jetty-util-6.1.26.jar;C:\Users\Administrator\.m2\repository\io\netty\netty\3.5.13.Final\netty-3.5.13.Final.jar;C:\Users\Administrator\.m2\repository\commons-lang\commons-lang\2.6\commons-lang-2.6.jar;C:\Users\Administrator\.m2\repository\org\apache\gora\gora-compiler\0.8\gora-compiler-0.8.jar;C:\Users\Administrator\.m2\repository\org\apache\avro\avro-compiler\1.8.1\avro-compiler-1.8.1.jar;C:\Users\Administrator\.m2\repository\org\apache\velocity\velocity\1.7\velocity-1.7.jar;C:\Users\Administrator\.m2\repository\joda-time\joda-time\2.7\joda-time-2.7.jar;C:\Users\Administrator\.m2\repository\org\jgrapht\jgrapht-core\1.0.0\jgrapht-core-1.0.0.jar;C:\Users\Administrator\.m2\repository\org\jgrapht\jgrapht-ext\1.0.0\jgrapht-ext-1.0.0.jar;C:\Users\Administrator\.m2\repository\org\tinyjee\jgraphx\jgraphx\2.0.0.1\jgraphx-2.0.0.1.jar;C:\Users\Administrator\.m2\repository\jgraph\jgraph\5.13.0.0\jgraph-5.13.0.0.jar;C:\Users\Administrator\.m2\repository\org\antlr\antlr4-runtime\4.5.3\antlr4-runtime-4.5.3.jar;C:\Users\Administrator\.m2\repository\org\springframework\spring-context\5.3.17\spring-context-5.3.17.jar;C:\Users\Administrator\.m2\repository\org\springframework\spring-aop\5.3.17\spring-aop-5.3.17.jar;C:\Users\Administrator\.m2\repository\org\springframework\spring-beans\5.3.17\spring-beans-5.3.17.jar;C:\Users\Administrator\.m2\repository\org\springframework\spring-expression\5.3.17\spring-expression-5.3.17.jar;C:\Users\Administrator\.m2\repository\javax\xml\bind\jaxb-api\2.3.1\jaxb-api-2.3.1.jar;C:\Users\Administrator\.m2\repository\javax\activation\javax.activation-api\1.2.0\javax.activation-api-1.2.0.jar;C:\Users\Administrator\.m2\repository\commons-collections\commons-collections\3.2.2\commons-collections-3.2.2.jar;C:\Users\Administrator\.m2\repository\org\apache\hadoop\hadoop-common\2.7.2\hadoop-common-2.7.2.jar;C:\Users\Administrator\.m2\repository\ai\platon\pulsar\pulsar-dom\1.10.12\pulsar-dom-1.10.12.jar;C:\Users\Administrator\.m2\repository\com\udojava\EvalEx\2.0\EvalEx-2.0.jar;C:\Users\Administrator\.m2\repository\org\perf4j\perf4j\0.9.16\perf4j-0.9.16.jar;C:\Users\Administrator\.m2\repository\ai\platon\pulsar\pulsar-browser\1.10.12\pulsar-browser-1.10.12.jar;C:\Users\Administrator\.m2\repository\io\dropwizard\metrics\metrics-core\4.1.29\metrics-core-4.1.29.jar;C:\Users\Administrator\.m2\repository\javax\websocket\javax.websocket-api\1.1\javax.websocket-api-1.1.jar;C:\Users\Administrator\.m2\repository\org\glassfish\tyrus\tyrus-container-grizzly-client\1.13.1\tyrus-container-grizzly-client-1.13.1.jar;C:\Users\Administrator\.m2\repository\org\glassfish\grizzly\grizzly-framework\2.3.25\grizzly-framework-2.3.25.jar;C:\Users\Administrator\.m2\repository\org\glassfish\grizzly\grizzly-http-server\2.3.25\grizzly-http-server-2.3.25.jar;C:\Users\Administrator\.m2\repository\org\glassfish\grizzly\grizzly-http\2.3.25\grizzly-http-2.3.25.jar;C:\Users\Administrator\.m2\repository\org\glassfish\tyrus\tyrus-client\1.13.1\tyrus-client-1.13.1.jar;C:\Users\Administrator\.m2\repository\org\glassfish\tyrus\tyrus-core\1.13.1\tyrus-core-1.13.1.jar;C:\Users\Administrator\.m2\repository\org\glassfish\tyrus\tyrus-spi\1.13.1\tyrus-spi-1.13.1.jar;C:\Users\Administrator\.m2\repository\com\github\kklisura\cdt\cdt-java-client\4.0.0\cdt-java-client-4.0.0.jar;C:\Users\Administrator\.m2\repository\org\javassist\javassist\3.24.1-GA\javassist-3.24.1-GA.jar;C:\Users\Administrator\.m2\repository\ai\platon\pulsar\pulsar-ql-common\1.10.12\pulsar-ql-common-1.10.12.jar;C:\Users\Administrator\.m2\repository\ai\platon\pulsar\pulsar-h2\1.4.196\pulsar-h2-1.4.196.jar;C:\Users\Administrator\.m2\repository\org\apache\commons\commons-collections4\4.4\commons-collections4-4.4.jar;C:\Users\Administrator\.m2\repository\com\google\code\crawler-commons\crawler-commons\0.5\crawler-commons-0.5.jar;C:\Users\Administrator\.m2\repository\org\apache\tika\tika-core\1.6\tika-core-1.6.jar;C:\Users\Administrator\.m2\repository\org\slf4j\slf4j-api\1.7.7\slf4j-api-1.7.7.jar;C:\Users\Administrator\.m2\repository\com\google\guava\guava\30.1-jre\guava-30.1-jre.jar;C:\Users\Administrator\.m2\repository\com\google\guava\failureaccess\1.0.1\failureaccess-1.0.1.jar;C:\Users\Administrator\.m2\repository\com\google\guava\listenablefuture\9999.0-empty-to-avoid-conflict-with-guava\listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar;C:\Users\Administrator\.m2\repository\com\google\code\findbugs\jsr305\3.0.2\jsr305-3.0.2.jar;C:\Users\Administrator\.m2\repository\org\checkerframework\checker-qual\3.5.0\checker-qual-3.5.0.jar;C:\Users\Administrator\.m2\repository\com\google\errorprone\error_prone_annotations\2.3.4\error_prone_annotations-2.3.4.jar;C:\Users\Administrator\.m2\repository\com\google\j2objc\j2objc-annotations\1.3\j2objc-annotations-1.3.jar;C:\Users\Administrator\.m2\repository\com\google\code\gson\gson\2.10.1\gson-2.10.1.jar;C:\Users\Administrator\.m2\repository\oro\oro\2.0.8\oro-2.0.8.jar;C:\Users\Administrator\.m2\repository\com\beust\jcommander\1.81\jcommander-1.81.jar;C:\Users\Administrator\.m2\repository\com\github\oshi\oshi-core\5.6.1\oshi-core-5.6.1.jar;C:\Users\Administrator\.m2\repository\net\java\dev\jna\jna\5.8.0\jna-5.8.0.jar;C:\Users\Administrator\.m2\repository\net\java\dev\jna\jna-platform\5.8.0\jna-platform-5.8.0.jar;C:\Users\Administrator\.m2\repository\io\dropwizard\metrics\metrics-graphite\4.1.29\metrics-graphite-4.1.29.jar;C:\Users\Administrator\.m2\repository\com\rabbitmq\amqp-client\5.14.0\amqp-client-5.14.0.jar;C:\Users\Administrator\.m2\repository\org\jetbrains\kotlinx\kotlinx-coroutines-jdk8\1.6.4\kotlinx-coroutines-jdk8-1.6.4.jar;C:\Users\Administrator\.m2\repository\org\jetbrains\kotlinx\kotlinx-coroutines-core-jvm\1.6.4\kotlinx-coroutines-core-jvm-1.6.4.jar;C:\Users\Administrator\.m2\repository\org\jetbrains\kotlin\kotlin-reflect\1.5.32\kotlin-reflect-1.5.32.jar;C:\Users\Administrator\.m2\repository\org\jetbrains\kotlinx\kotlinx-coroutines-core\1.6.4\kotlinx-coroutines-core-1.6.4.jar ai.platon.pulsar.examples.sites.topEc.english.amazon.MainKt
16:14:22.745 [main] INFO ai.platon.pulsar.common.config.AbstractConfiguration - Find legacy resource: jar:file:/C:/Users/Administrator/.m2/repository/ai/platon/pulsar/pulsar-skeleton/1.10.12/pulsar-skeleton-1.10.12.jar!/config/legacy/pulsar-default.xml
16:14:22.748 [main] INFO ai.platon.pulsar.common.config.AbstractConfiguration - Find legacy resource: jar:file:/C:/Users/Administrator/.m2/repository/ai/platon/pulsar/pulsar-skeleton/1.10.12/pulsar-skeleton-1.10.12.jar!/config/legacy/pulsar-site.xml
16:14:22.749 [main] INFO ai.platon.pulsar.common.config.AbstractConfiguration - Resource not find: pulsar-task.xml
16:14:22.774 [main] INFO ai.platon.pulsar.common.config.AbstractConfiguration - profile: <> | [pulsar-default.xml, pulsar-site.xml]
16:14:22.792 [main] INFO ai.platon.pulsar.crawl.protocol.ProtocolFactory - Supported protocols:
16:14:22.812 [main] INFO ai.platon.pulsar.crawl.parse.html.PrimerHtmlParser - className: PrimerHtmlParser defaultCharEncoding: utf-8
16:14:22.879 [main] INFO ai.platon.pulsar.crawl.parse.PageParser - maxParseTime: PT1M maxParsedLinks: 200 groupMode: BY_HOST ignoreExternalLinks: false maxUrlLength: 1024
16:14:22.904 [main] INFO ai.platon.pulsar.crawl.impl.StreamingCrawlLoop - Crawl loop is created | @977552154
16:14:22.906 [main] DEBUG org.springframework.context.support.StaticApplicationContext - Refreshing org.springframework.context.support.StaticApplicationContext@58651fd0
16:14:22.953 [main] INFO ai.platon.pulsar.context.PulsarContexts - Active context | ai.platon.pulsar.context.support.StaticPulsarContext#1
16:14:23.985 [main] INFO ai.platon.pulsar.persist.gora.GoraStorage - Backend data store: FileBackendPageStore realSchema: FileBackendPageStore
16:14:24.112 [main] INFO ai.platon.pulsar.persist.AutoDetectStorageProvider - Storage is created: class ai.platon.pulsar.persist.gora.FileBackendPageStore realSchema: FileBackendPageStore
16:14:24.188 [main] INFO ai.platon.pulsar.crawl.component.LoadComponent.Task - 3. 💔 💿 U got 1600 0 <- 0 in , fc:1 ProtoNotFound(1600) | https://www.amazon.com/Best-Sellers/zgbs -outLinkSelector a[href~=/dp/]
16:14:24.188 [main] INFO ai.platon.pulsar.crawl.component.LoadComponent.Task - Log explanation: https://github.com/platonai/pulsarr/blob/master/docs/log-format.adoc
16:14:24.307 [main] INFO ai.platon.pulsar.crawl.impl.StreamingCrawlLoop - Registered 15 link collectors | loop#1 @977552154 []
16:14:24.330 [SpringContextShutdownHook] DEBUG org.springframework.context.support.StaticApplicationContext - Closing org.springframework.context.support.StaticApplicationContext@58651fd0, started on Sun Jun 25 16:14:22 CST 2023
16:14:24.330 [Thread-0] INFO ai.platon.pulsar.context.support.AbstractPulsarContext - Closing context #1/2 | StaticPulsarContext
16:14:24.331 [Thread-0] INFO ai.platon.pulsar.session.AbstractPulsarSession - Session is closed | #1000002
16:14:24.331 [Thread-0] INFO ai.platon.pulsar.session.AbstractPulsarSession - Session is closed | #1000001
16:14:24.331 [DefaultDispatcher-worker-1] INFO ai.platon.pulsar.crawl.impl.StreamingCrawler - Starting StreamingCrawler #1 ...

Process finished with exit code 0
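The failure surfaces in the `LoadComponent.Task` line: it ends with the status token `ProtoNotFound(1600)`, and the same number `1600` appears after `U got`. The sketch below pulls that token out of a log line so failed loads can be filtered from a longer log. It is a hypothetical helper written for illustration (`ProtocolStatusToken` and `extractProtocolStatus` are not PulsarRPA APIs), and it assumes the status is the first `Name(digits)` token on the line, which holds for the line above but is not something the log format is known to guarantee.

```kotlin
// Hypothetical helper for illustration; not part of the PulsarRPA API.
// Holds a protocol-status token such as "ProtoNotFound(1600)".
data class ProtocolStatusToken(val name: String, val code: Int)

// Matches a word immediately followed by a parenthesised integer, e.g. "ProtoNotFound(1600)".
private val statusTokenPattern = Regex("""\b([A-Za-z]+)\((\d+)\)""")

// Returns the first Name(code) token on the line, or null if the line has none.
// Assumption: the first such token is the protocol status, as in the log above.
fun extractProtocolStatus(logLine: String): ProtocolStatusToken? {
    val match = statusTokenPattern.find(logLine) ?: return null
    val (name, code) = match.destructured
    return ProtocolStatusToken(name, code.toInt())
}
```

Applied to the `LoadComponent.Task` line in the log above, it returns `ProtocolStatusToken(name=ProtoNotFound, code=1600)`.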

The log says the protocol was not found. What is the likely cause?
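One detail in the log that may be relevant: `ProtocolFactory - Supported protocols:` is followed by an empty list. The `-classpath` above contains PulsarRPA modules such as `pulsar-skeleton`, `pulsar-persist`, `pulsar-browser` and `pulsar-dom`, but no jar with `protocol` in its name. Whether a missing protocol module is what triggers `ProtoNotFound` is an assumption, not something confirmed in this thread. The sketch below lists the `ai/platon/pulsar` artifacts on a classpath so the dependency set can be compared with that of a project where the example works. `pulsarArtifacts` and `currentPulsarArtifacts` are hypothetical helpers, not PulsarRPA APIs, and the parser assumes the standard Maven repository layout (`.../<artifactId>/<version>/<artifactId>-<version>.jar`).

```kotlin
import java.io.File

// Hypothetical diagnostic helper; not part of the PulsarRPA API.
// Returns the artifact IDs of jars stored under ai/platon/pulsar in a Maven-style
// repository, e.g. "pulsar-skeleton" for .../ai/platon/pulsar/pulsar-skeleton/1.10.12/pulsar-skeleton-1.10.12.jar.
fun pulsarArtifacts(classpath: String, separator: String = File.pathSeparator): List<String> =
    classpath.split(separator)
        .map { it.replace('\\', '/') }                          // normalise Windows path separators
        .filter { "/ai/platon/pulsar/" in it && it.endsWith(".jar") }
        .mapNotNull { path ->
            // Maven layout: .../<artifactId>/<version>/<artifactId>-<version>.jar
            val parts = path.split('/')
            if (parts.size >= 3) parts[parts.size - 3] else null
        }
        .distinct()

// Applies the helper to the classpath of the JVM that is currently running.
fun currentPulsarArtifacts(): List<String> =
    pulsarArtifacts(System.getProperty("java.class.path") ?: "")
```

Fed the `-classpath` value from the command above with `";"` as the separator, it lists `pulsar-skeleton`, `pulsar-common`, `pulsar-persist`, `gora-shaded-mongodb`, `pulsar-jsoup`, `pulsar-dom`, `pulsar-browser`, `pulsar-ql-common` and `pulsar-h2`. Pass the separator explicitly when inspecting a Windows classpath on another OS, because `File.pathSeparator` is `:` there and would split inside `C:`.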

galaxyeye commented 10 months ago

Cannot reproduce.