magnusmanske / listeria_rs

Repo for the Wikimedia Listeria bot
https://listeria.toolforge.org/

Too long Listeria list can truncate the whole page, including the `{{Wikidata list end}}` template #118

Open · lucaswerkmeister opened this issue 6 months ago

lucaswerkmeister commented 6 months ago

While looking through pages with {{Wikidata list}} but not {{Wikidata list end}} (trying to find other occurrences of #108), I noticed that several pages’ source code appears to be truncated:

```console
$ curl -s 'https://www.wikidata.org/wiki/Wikidata:Database_reports/Identified_duplicates?action=raw' | tail; echo
| 6
|-
| [[Q21939515|Mount Bundarbo]]
| [[Q8502|mountain]]<br/>[[Wikidata:Database reports/Identified duplicates|Wikimedia duplicated page]]
| [[Q21915081|Bundarbo Mountain]]
| 
| 7
|-
| [[Q22421141|Mount Cautley]]
| [[Q8502|mountain]]<br/>[[Wikidata:Database reports%
$ curl -s 'https://www.wikidata.org/wiki/Wikidata:TED/TED_speakers?action=raw' | tail; echo
| [[Q30|United States of America]]
| [https://www.ted.com/speakers/alex_steffen alex_steffen]<br/>[https://www.ted.com/speakers/74 74]
| [[Q4717834|Q4717834]]
|-
| [[File:Alex Tabarrok speaking at TED in 2009.jpg|center|128px]]
| [[Q4717865|Alex Tabarrok]]
| Canadian economist
| [[Q188094|economist]]
| 1966<ref name='ref_08349cff5d4d801b9ce21fb416c71762'>[[Q36578|Integrated Authority File]]</ref><ref name='ref_3b0404a6206a3f8611a7f1155e837094'>[[Q11789729|NUKAT]]</ref>
| [[Q6581097|male]]<ref name='ref_ab66662
```
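The search for pages that contain {{Wikidata list}} but not {{Wikidata list end}} can be scripted over the raw wikitext that `?action=raw` returns. This is a minimal sketch, not part of Listeria; the function name `has_unclosed_wikidata_list` is hypothetical, and it assumes both templates are written literally under these names. Redirects, a lowercase first letter, or a bare `{{Wikidata list}}` with no parameters are not detected.

```shell
# Sketch (hypothetical helper): read wikitext on stdin and succeed (exit 0)
# if it opens a {{Wikidata list ...}} template but never closes the list
# with {{Wikidata list end}}; fail (exit 1) otherwise.
# Assumption: the opening template name is followed by "|" or a line break.
has_unclosed_wikidata_list() {
  awk '
    /[{][{][ \t]*Wikidata list[ \t]*[|]/        { opened = 1 }
    /[{][{][ \t]*Wikidata list[ \t]*$/          { opened = 1 }
    /[{][{][ \t]*Wikidata list end[ \t]*[}][}]/ { closed = 1 }
    END { exit !(opened && !closed) }
  '
}

# Usage against a live page (requires network access):
#   curl -s 'https://www.wikidata.org/wiki/Wikidata:TED/TED_speakers?action=raw' |
#     has_unclosed_wikidata_list && echo 'missing {{Wikidata list end}}'
```

Using a single `awk` pass keeps the helper POSIX-shell compatible and reads the whole page, so a closing template near the end is not missed.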

If the full list is too long for a wiki page, Listeria should probably produce some kind of error, or perhaps truncate the number of entries in the table; but the page should not be truncated at the wikitext level, as this results in broken syntax and also destroys content after {{Wikidata list end}}, which the bot shouldn’t touch.
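Truncating by entries rather than by bytes could look roughly like the following. It is only an illustration of the idea in shell/awk, not Listeria's code (listeria_rs is written in Rust). The name `cap_table_rows` is hypothetical, and it assumes a single non-nested table in which every row block starts with a bare `|-` line. A header row introduced by `|-` counts toward the cap.

```shell
# Sketch (hypothetical helper): keep at most MAX row blocks of a wikitable
# read on stdin, dropping whole rows instead of cutting the wikitext mid-row.
# Assumptions: a single, non-nested table; every row block starts with a line
# that is exactly "|-"; the table ends with a line that is exactly "|}".
# Everything after "|}" (e.g. {{Wikidata list end}}) is copied unchanged.
cap_table_rows() {
  awk -v max="$1" '
    in_tail    { print; next }        # after the table: copy verbatim
    $0 == "|-" { rows++ }             # a new row block begins
    $0 == "|}" {                      # end of table: note omissions, close it
      if (rows > max) print "<!-- " (rows - max) " rows omitted -->"
      print
      in_tail = 1
      next
    }
    rows <= max { print }             # table start and rows within the cap
  '
}
```

Because the cut happens only at row boundaries, the table stays syntactically closed and the text after it survives, which is the behaviour the issue asks for.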

(Note: I’m not sure if the truncation happens in Listeria itself or server-side in MediaWiki.)

pere-prlpz commented 2 months ago

You can avoid this by using a query that doesn't yield such a long list. A quick way to shorten the list is to add a LIMIT clause to the query (for example, LIMIT 1500).
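Editing the query by hand is usually simplest, but the same idea can be sketched as a small shell helper that appends `LIMIT n` to a SPARQL query read on stdin unless the query already contains a LIMIT. The name `add_sparql_limit` is hypothetical. It assumes the query ends with its outermost `WHERE { ... }` block or its solution modifiers (such as `ORDER BY`); a query ending in a trailing `VALUES` block, or one whose only LIMIT sits in a subquery, would need editing by hand.

```shell
# Sketch (hypothetical helper): copy a SPARQL query from stdin to stdout and
# append "LIMIT n" on a new line unless some LIMIT clause is already present
# (the check is case-insensitive and also matches a LIMIT inside a subquery).
add_sparql_limit() {
  awk -v n="$1" '
    { print }                                          # copy the query as-is
    toupper($0) ~ /LIMIT[ \t]+[0-9]+/ { has_limit = 1 } # remember an existing LIMIT
    END { if (!has_limit) print "LIMIT " n }
  '
}

# Example: prints the query line followed by a line "LIMIT 1500"
#   printf '%s\n' 'SELECT ?item WHERE { ?item wdt:P31 wd:Q8502 . }' | add_sparql_limit 1500
```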