petabyte-research / redflags

Automagically checks and filters risky public procurements
http://www.redflags.eu/
Apache License 2.0

Is scraping Polish data also checking Hungarian ones? #19

Closed by KrzysztofMadejski 7 years ago

KrzysztofMadejski commented 7 years ago

I've started the script and I see the following output:

2016-09-23 15:10:34,267 [WARN ] [ pool-2-thread-1] h.p.redflags.engine.tedintf.TedInterface > Failed to download 2881077:HU:DATA - INVALID_UDL (canRetryHelp: false, canContinueCrawling: true)
2016-09-23 15:10:34,267 [DEBUG] [ pool-2-thread-1] h.p.redflags.engine.gear.GearTrain       > Notice 2881077 dropped by CountryFilter

Is the engine trying to load every Hungarian tender and then dropping them all later in the gear chain? That seems like a waste of processing.
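To see how widespread this is in a full run, here is a minimal sketch that tallies failed downloads and dropped notices from an engine log. It assumes the message formats of the two lines quoted above; the function name `summarize` and the regular expressions are illustrative and not part of redflags.

```python
import re
from collections import Counter

# Patterns assume the message formats seen in the two log lines quoted above.
FAILED = re.compile(r"TedInterface\s+> Failed to download (\d+):(\w+):(\w+) - (\w+)")
DROPPED = re.compile(r"GearTrain\s+> Notice (\d+) dropped by (\w+)")


def summarize(lines):
    """Count failed downloads per error code and dropped notices per gear."""
    failures = Counter()
    drops = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            failures[match.group(4)] += 1  # group 4 is the error code
            continue
        match = DROPPED.search(line)
        if match:
            drops[match.group(2)] += 1  # group 2 is the gear that dropped it
    return failures, drops


# The two lines from the output above, used as sample input.
sample = [
    "2016-09-23 15:10:34,267 [WARN ] [ pool-2-thread-1] h.p.redflags.engine.tedintf.TedInterface > Failed to download 2881077:HU:DATA - INVALID_UDL (canRetryHelp: false, canContinueCrawling: true)",
    "2016-09-23 15:10:34,267 [DEBUG] [ pool-2-thread-1] h.p.redflags.engine.gear.GearTrain       > Notice 2881077 dropped by CountryFilter",
]
failures, drops = summarize(sample)
print("failed downloads:", dict(failures))
print("dropped notices:", dict(drops))
```

Passing an open log file instead of `sample` works the same way, since `summarize` only iterates over lines. If nearly every processed notice shows up in both counters, the engine is most likely running the TED/HU gear chain rather than one meant for the Polish data.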

dzierzy commented 7 years ago

Did you start it with the epf scope? TedInterface shouldn't be in use at all.

KrzysztofMadejski commented 7 years ago

I've started it with the epf scope: redflags-auto.log

2016-09-24 @ 16:17.51   Starting engine (java -jar /home/redflags/redflags/redflags-engine/target/redflags-engine-1.1.0-SNAPSHOT.jar  --scope=epf --cache=/home/redflags/redflags/cache >/home/redflags/redflags/automation-logs/2016-09-24-16-17.log)
2016-09-24 @ 16:28.27   Backuping users and filters (/home/redflags/redflags/backups/users-filters-20160924-162827.sql)
2016-09-24 @ 16:28.27   Calling SQL script (/home/redflags/redflags/redflags-helper-tables.sql)
2016-09-24 @ 16:28.27   Sending filter emails (wget -qO- http://.../send-filter-emails?secret=...)
2016-09-24 @ 16:28.27   Session ended

And execution log:

2016-09-24 16:17:53,310 [INFO ] [            main] h.p.redflags.engine.RedflagsEngineApp    > *** REDFLAGS ENGINE - Initializing framework
2016-09-24 16:17:54,701 [INFO ] [            main] h.p.redflags.engine.RedflagsEngineApp    > Starting RedflagsEngineApp v1.1.0-SNAPSHOT on tcee with PID 28575 (/home/redflags/redflags/redflags-engine/target/redflags-engine-1.1.0-SNAPSHOT.jar started by redflags in /home/redflags/redflags/redflags-engine)
2016-09-24 16:17:54,701 [DEBUG] [            main] h.p.redflags.engine.RedflagsEngineApp    > Running with Spring Boot v1.3.0.RELEASE, Spring v4.2.3.RELEASE
2016-09-24 16:17:54,701 [INFO ] [            main] h.p.redflags.engine.RedflagsEngineApp    > No profiles are active
2016-09-24 16:17:56,822 [INFO ] [            main] h.p.r.e.gear.archiver.pl.EpfArchiver     > data source connected
2016-09-24 16:17:57,461 [INFO ] [            main] h.p.redflags.engine.boot.GearLoader      > Loading gears
2016-09-24 16:17:57,532 [INFO ] [            main] h.p.redflags.engine.boot.GearLoader      > 60/60 gears loaded
2016-09-24 16:17:57,533 [INFO ] [     rf-engine-0] h.p.r.engine.boot.RedflagsEngineSession  > Starting Redflags engine session with scope hu.petabyte.redflags.engine.epforgpl.EPFScope@342642b0 (HEAP: 164M)
2016-09-24 16:17:57,547 [DEBUG] [     rf-engine-0] h.p.r.engine.gear.parser.MetadataParser  > Parsing language is HU
2016-09-24 16:17:57,547 [DEBUG] [     rf-engine-0] h.p.r.engine.gear.filter.CountryFilter   > Country filter: HU
2016-09-24 16:17:57,547 [DEBUG] [     rf-engine-0] h.p.r.e.g.filter.OriginalLanguageFilter  > Original language filter: HU
2016-09-24 16:17:57,548 [DEBUG] [     rf-engine-0] h.p.r.engine.gear.archiver.Archiver      > Will archive notices in display languages: [HU, EN]
2016-09-24 16:17:57,548 [DEBUG] [     rf-engine-0] h.p.r.e.gear.parser.DocFamilyFetcher     > Parsing language is HU
2016-09-24 16:17:57,549 [DEBUG] [     rf-engine-0] h.p.r.e.g.filter.PublicationDateFilter   > Notices published before 2012-07-01 will be dropped
2016-09-24 16:17:57,549 [DEBUG] [     rf-engine-0] h.p.r.engine.gear.filter.DirectiveFilter > Directive filter: .*2004/18/.*
2016-09-24 16:17:57,549 [DEBUG] [     rf-engine-0] h.p.r.e.g.p.TemplateBasedDocumentParser  > Parsing language is HU
2016-09-24 16:17:57,549 [DEBUG] [     rf-engine-0] h.p.r.engine.gear.parser.RawValueParser  > Parsing language is HU
2016-09-24 16:17:57,581 [DEBUG] [     rf-engine-0] h.p.r.e.g.i.h.ContrDescCartellingIndicator > Loaded 114 expressions for cartelling
2016-09-24 16:17:57,645 [INFO ] [     rf-engine-0] h.p.redflags.engine.boot.GearLoader      > Loading gears
2016-09-24 16:17:57,690 [INFO ] [     rf-engine-0] h.p.redflags.engine.boot.GearLoader      > 60/60 gears loaded
2016-09-24 16:17:57,690 [DEBUG] [     rf-engine-0] h.p.r.engine.gear.export.FlagExporter    > We have 30 indicators
2016-09-24 16:17:57,690 [INFO ] [     rf-engine-0] h.p.redflags.engine.boot.GearLoader      > Loading gears
2016-09-24 16:17:57,691 [INFO ] [     rf-engine-0] h.p.redflags.engine.boot.GearLoader      > 60/60 gears loaded
2016-09-24 16:17:57,691 [DEBUG] [     rf-engine-0] h.p.r.engine.gear.export.FlagExporter    > We have 8 indicators
2016-09-24 16:17:57,691 [DEBUG] [     rf-engine-0] h.p.r.engine.gear.export.MySQLExporter   > MySQL Exporter is off, start app with --db=1 option to turn it on
2016-09-24 16:17:57,691 [INFO ] [     rf-engine-0] h.p.redflags.engine.epforgpl.EPFScope    > loading procurements. Page number: 1
2016-09-24 16:17:57,691 [INFO ] [     rf-engine-0] h.p.redflags.engine.epforgpl.EPFData     > fetching procurement list
2016-09-24 16:17:57,691 [INFO ] [     rf-engine-0] h.p.redflags.engine.epforgpl.Connector   > fetching data from zamowienia_publiczne/?limit=10000&page=1
2016-09-24 16:17:58,967 [INFO ] [            main] h.p.redflags.engine.RedflagsEngineApp    > Started RedflagsEngineApp in 5.423 seconds (JVM running for 7.192)
2016-09-24 16:18:00,328 [INFO ] [     rf-engine-0] h.p.redflags.engine.epforgpl.EPFScope    > processing notice 2884289, 1 procurement in row

[...]
2016-09-24 16:28:25,544 [INFO ] [ pool-2-thread-1] h.p.redflags.engine.gear.GearTrain       > Processing notice 2876125 (HEAP: 193M)
2016-09-24 16:28:27,635 [WARN ] [ pool-2-thread-1] h.p.redflags.engine.tedintf.TedInterface > Failed to download 2876125:HU:DATA - INVALID_UDL (canRetryHelp: false, canContinueCrawling: true)
2016-09-24 16:28:27,635 [DEBUG] [ pool-2-thread-1] h.p.redflags.engine.gear.GearTrain       > Notice 2876125 dropped by CountryFilter
2016-09-24 16:28:27,635 [INFO ] [     rf-engine-0] h.p.r.engine.gear.export.FlagExporter    > Nothing to export
2016-09-24 16:28:27,635 [INFO ] [     rf-engine-0] h.p.r.engine.gear.export.FlagExporter    > Nothing to export
2016-09-24 16:28:27,636 [INFO ] [     rf-engine-0] h.p.r.engine.boot.RedflagsEngineSession  > Finished Redflags engine session with scope hu.petabyte.redflags.engine.epforgpl.EPFScope@342642b0 (HEAP: 193M)
2016-09-24 16:28:27,636 [INFO ] [     rf-engine-0] h.p.r.engine.boot.RedflagsEngineSession  > Processed 300 notices in 10 minutes 30 seconds 88 milliseconds
2016-09-24 16:28:27,640 [INFO ] [    main-destroy] h.p.r.engine.boot.RedflagsEngineBoot     > *** REDFLAGS ENGINE - Stopping

KrzysztofMadejski commented 7 years ago

Maybe redflags.engine.gears should be changed in application.yml?

dzierzy commented 7 years ago

That's right, application.yml was on the ignored files list. It should be working now.

KrzysztofMadejski commented 7 years ago

It works!