magnusmanske / petscan_rs

The repo for the PetScan tool
https://petscan.wmflabs.org/
GNU General Public License v3.0

PetScan often gives no result (issue II) #148

Open JotaCartas opened 9 months ago

JotaCartas commented 9 months ago

This still happens: an intermittent "No result for source categories" error, but without "®exp_filter" being included in a random field.


PetScan gives this error for about 5 minutes, then works again for a few minutes, then gives the same error again on any simple query. See also #144.

the-it commented 9 months ago

I see similar effects. I run a bot that uses PetScan through its API. For some time now I have often seen errors: the request returns HTTP 200 but doesn't contain any results.
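For bots affected by this, a minimal workaround sketch is to treat an empty result as a possibly transient failure and retry after a pause. The helper name `fetch_with_retry` and its parameters are hypothetical, and `fetch` stands for whatever function the bot already uses to call PetScan and extract the list of pages. It assumes the query is expected to return at least one page; a query that is legitimately empty would also be retried.

```python
import time
from typing import Callable, Sequence


def fetch_with_retry(
    fetch: Callable[[], Sequence],
    attempts: int = 5,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Sequence:
    """Call fetch() until it returns a non-empty result or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = fetch()
        if result:
            # Non-empty result: assume the service answered properly.
            return result
        if attempt < attempts:
            # Empty result despite a successful request: wait, then try again.
            sleep(delay_seconds)
    raise RuntimeError(f"no results after {attempts} attempts")
```

Since the outages reported in this thread last several minutes, a delay of a minute or more between attempts is probably more useful than a tight loop.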

JotaCartas commented 9 months ago

Schedule of occurrences, maybe it helps (time zone info for Lisboa: UTC+0, Western European Time (WET)):

2023/12/25 - 08:37 circa - Error
2023/12/25 - 08:45:04 - Running again OK
2023/12/25 - 08:49:10 - Error
2023/12/25 - 09:00:14 - Running again OK
2023/12/25 - 09:50 circa - Error
2023/12/25 - 10:02 circa - Running again OK
2023/12/25 - 10:05:07 - Error
2023/12/25 - 10:14 circa - Running again OK
2023/12/25 - 11:30 circa - Error

kaubu commented 9 months ago

It is happening to me right now, too. It was working fine for 10 minutes or so, then just suddenly stopped working.

Edit: Not sure what happened, but I waited about 10 minutes and it worked, which is actually in line with OP's post. It seems the service intermittently stops working.

ghost commented 9 months ago

At first it didn't work for me either, but if I use the filled-in form URL with automatic execution, it works again:

https://petscan.wmflabs.org/?search_max_results=500&outlinks_no=&namespace_conversion=keep&sortby=none&show_soft_redirects=both&negcats=&sparql=&templates_yes=&interface_language=de&categories=Lied+2000&output_compatability=catscan&minlinks=&depth=3&outlinks_yes=&sortorder=ascending&maxlinks=&max_sitelink_count=&page_image=any&search_wiki=&before=&cb_labels_any_l=1&referrer_name=&ores_type=any&rxp_filter=&combination=subset&language=de&search_filter=&sitelinks_yes=&project=wikipedia&format=html&labels_yes=&links_to_any=&wikidata_prop_item_use=&output_limit=&min_sitelink_count=&sitelinks_no=&cb_labels_yes_l=1&cb_labels_no_l=1&ores_prob_to=&ores_prediction=any&doit=&interface_language=de
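A URL like the one above can also be assembled in code. The sketch below uses only parameters that appear in that URL (`language`, `project`, `categories`, `depth`, `format`, `doit`). The function name `build_petscan_url` is hypothetical, and it is an assumption that the empty `doit` parameter is what triggers automatic execution and that the parameters left out here are optional.

```python
from urllib.parse import urlencode

PETSCAN_BASE = "https://petscan.wmflabs.org/"


def build_petscan_url(
    categories: str,
    language: str,
    project: str,
    depth: int,
    fmt: str = "html",
) -> str:
    """Build a PetScan query URL from a handful of parameters."""
    params = {
        "language": language,
        "project": project,
        "categories": categories,
        "depth": depth,
        "format": fmt,
        # Assumption: an empty "doit" runs the query as soon as the page loads.
        "doit": "",
    }
    # urlencode percent-encodes values and turns spaces into "+".
    return PETSCAN_BASE + "?" + urlencode(params)


print(build_petscan_url("Lied 2000", "de", "wikipedia", 3))
```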

maximmasiutin commented 9 months ago

I have had this error very often, too.

ArztKlein commented 9 months ago

I have the same problem. It works for 5 minutes, stops, then works again with the exact same queries.

maximmasiutin commented 9 months ago

Has anybody managed to install PetScan on another instance, i.e. on their own server? I tried; it ran, but it always returned this error. I did not know how to debug it, for example how to enable full logging of URL requests and replies.

Do you know how to enable request logging?

Do you know how to properly install PetScan on your own server? Not even a database schema was published, so I had to guess the columns.

maximmasiutin commented 9 months ago

@magnusmanske - could you please help by writing short step-by-step instructions on how to install PetScan on one's own server, so we can see what happens? I installed it, but it always gives this error (no result for source categories), whereas the PetScan at https://petscan.wmflabs.org/ gives this error only from time to time.

1-Byte commented 9 months ago

@maximmasiutin The following steps might help you set up a local environment:

Set up a local MySQL database on port 3308

CREATE TABLE `query` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `querystring` longtext DEFAULT NULL,
  `created` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

CREATE TABLE `started_queries` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `querystring` longtext DEFAULT NULL,
  `created` varchar(100) DEFAULT NULL,
  `process_id` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

Create config.json

{
  "host": "127.0.0.1",
  "user": "<localuser>",
  "password": "<localpassword>",
  "schema": "petscan",
  "http_port": 8000,
  "timeout": 30000,
  "restart-code": "",
  "mysql": [
    [
      "<u1111>",
      "<replicapassword>"
    ]
  ]
}
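Before starting the server, a quick sanity check of config.json can catch placeholders that were never filled in. This is only a sketch: the function name `check_config` is hypothetical, and treating every key shown above as required is an assumption, since the thread does not say which keys petscan_rs actually needs.

```python
import json
from typing import List

# Assumption: every key from the example config above is required.
REQUIRED_KEYS = {
    "host", "user", "password", "schema",
    "http_port", "timeout", "restart-code", "mysql",
}


def check_config(text: str) -> List[str]:
    """Return a list of problems found in the config.json text."""
    config = json.loads(text)
    problems = [
        f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(config))
    ]
    # Only the local credentials are checked for leftover <...> placeholders.
    for key in ("user", "password"):
        value = config.get(key, "")
        if isinstance(value, str) and value.startswith("<") and value.endswith(">"):
            problems.append(f"placeholder not replaced: {key}")
    return problems
```

An empty list means nothing obvious is wrong; it does not prove the credentials work.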

Forward replicas

ssh toolforge -L 3306:XXX.analytics.db.svc.wikimedia.cloud:3306 -L 3309:wikidatawiki.analytics.db.svc.wikimedia.cloud:3306

XXX: wiki to be queried (e.g. commonswiki)
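To confirm that the tunnel and the local database are reachable before running the server, a small connectivity check can help. The ports are the ones used in the steps above (3306 and 3309 forwarded, 3308 for local MySQL); the function name `port_is_open` is hypothetical. It only checks that something accepts TCP connections, not that it is a working MySQL server.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    for port in (3306, 3308, 3309):
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print(f"127.0.0.1:{port} is {state}")
```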

Start server

cargo run

ghost commented 8 months ago

Is anyone working on this problem? PetScan is otherwise no longer usable.

magnusmanske commented 8 months ago

I have a lot of things on my plate, but I am switching my attention to PetScan for now. Expect some fiddling and possible temporary breakage. Can someone please confirm whether the "®exp_filter" thing is still happening? I tried to get rid of that recently.

magnusmanske commented 8 months ago

Debugging output revealed that max_user_connections is exhausted because of too many (sub)queries. If you lower the depth in https://petscan.wmflabs.org/?since_rev0=&language=commons&edits%5Bflagged%5D=both&search_max_results=500&cb_labels_no_l=1&edits%5Bbots%5D=both&%C2%AEexp_filter=&ns%5B6%5D=1&langs_labels_no=%C2%AEexp_filter&cb_labels_yes_l=1&cb_labels_any_l=1&edits%5Banons%5D=both&project=wikimedia&interface_language=en&negcats=Diagrams%20of%20road%20signs%0AMonochrome%20photographs&categories=Speed%20bumps%0A&depth=100& from 100 to 10, it works just fine. I will try to limit the number of queries.
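One common way to keep many (sub)queries from exhausting a connection limit is to cap how many run at the same time. The sketch below illustrates that general idea with a semaphore. It is not the petscan_rs implementation (which is written in Rust), and `run_limited`, `worker` and `max_concurrent` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_limited(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    max_concurrent: int,
) -> List[R]:
    """Run worker(item) for every item, never more than max_concurrent at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item: T) -> R:
        # Each task waits here until one of the max_concurrent slots is free.
        async with semaphore:
            return await worker(item)

    # gather returns the results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))
```

With `max_concurrent` set below the database's max_user_connections, a deep category tree would take longer to process but should no longer fail because the limit is hit.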

magnusmanske commented 8 months ago

I have added a better error message, plus it now terminates quickly: https://petscan.wmflabs.org/?output_limit=&links_to_all=&edits%5Bflagged%5D=both&depth=100&ores_prob_from=&sitelinks_yes=&search_filter=&active_tab=tab_categories&sparql=&wikidata_item=no&max_age=&min_redlink_count=1&search_wiki=&cb_labels_any_l=1&interface_language=en&wikidata_source_sites=&cb_labels_no_l=1&labels_no=&outlinks_any=&smaller=&categories=Speed+bumps%0D%0A&langs_labels_no=%C2%AEexp_filter&format=html&wikidata_prop_item_use=&larger=&show_disambiguation_pages=both&cb_labels_yes_l=1&search_max_results=500&sortorder=ascending&templates_any=&language=commons&edits%5Banons%5D=both&edits%5Bbots%5D=both&subpage_filter=either&negcats=Diagrams+of+road+signs%0D%0AMonochrome+photographs&max_sitelink_count=&wikidata_label_language=&outlinks_yes=&links_to_no=&project=wikimedia&ns%5B6%5D=1&common_wiki=auto&doit=

maximmasiutin commented 8 months ago

Thank you! I used the latest version of petscan_rs sources from the repository, and now I'm getting proper error messages, such as:

2024-01-24T08:51:56.772295Z ERROR run:get_wiki_db_connection{wiki="enwiki"}: petscan_rs::app_state: error=Io(Io(Custom { kind: Uncategorized, error: "failed to lookup address information: Name or service not known" }))
2024-01-24T08:51:56.772437Z  INFO run: petscan_rs::platform: error=Io(Io(Custom { kind: Uncategorized, error: "failed to lookup address information: Name or service not known" }))
Platform::get_response: No result

I get this when I run it from the command line, as you suggested:

cargo run -- "language=en&project=wikipedia&dept...

Anyway, now it properly displays error messages, thank you very much for the information that you provided in the README file on how to run it from the command line!

I haven't yet configured the wmflabs account, as you mentioned in the README, so this is probably the cause of the error message.
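The "failed to lookup address information: Name or service not known" text in the log above is a hostname resolution failure, so a quick DNS check can narrow things down. Which hostname petscan_rs tried to resolve for enwiki is not shown in the log; the example name below follows the pattern from the port-forwarding step and is an assumption. The function name `can_resolve` is hypothetical.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # Raised when the name cannot be resolved.
        return False


if __name__ == "__main__":
    # Assumed hostname, following the XXX.analytics.db.svc.wikimedia.cloud pattern.
    host = "enwiki.analytics.db.svc.wikimedia.cloud"
    print(host, "resolves" if can_resolve(host) else "does not resolve")
```

If the name does not resolve on the local machine, the server needs either the forwarded ports described earlier in this thread or a configured account with access to the replicas.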