Closed dagingaa closed 10 months ago
Mostly because regex is utterly unreadable, here's an explanation courtesy of ChatGPT:
Explanation:
<revision>: This matches the start of the <revision> tag.
[\s\S]*?: This matches any character, including new lines ([\s\S]), zero or more times but as few as needed (non-greedy, due to *?). This ensures that the regex searches within the content of the <revision> tag.
<id>: This matches the start of the <id> tag within the <revision> tag.
(\d+): This is a capturing group that matches one or more digits (\d+). This represents the id number.
</id>: This matches the end of the <id> tag.
this is spectacular. Thank you. I've put this on the dev branch, so it can make the next release, which should be in a few days. I've added typescript support for the new method, feel free to document things as you see fit. cheers!
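For anyone who wants to poke at the pattern from the explanation, here is a minimal sketch. The regex is reassembled from the pieces listed in that explanation rather than copied from the PR, and both `getRevisionID` and the sample XML are made up for illustration.

```javascript
// Pattern reassembled from the explained pieces (an assumption, not copied from the PR):
// <revision>, then anything (non-greedy), then <id>digits</id>.
const REVISION_ID = /<revision>[\s\S]*?<id>(\d+)<\/id>/

// Hypothetical helper: returns the revision id as a string, or null if none is found.
function getRevisionID(pageXml) {
  const match = pageXml.match(REVISION_ID)
  return match ? match[1] : null
}

// Made-up page snippet in the general shape of a dump entry.
const samplePage = `
<page>
  <title>Fubar</title>
  <id>42</id>
  <revision>
    <id>372618</id>
    <parentid>372000</parentid>
    <text>...</text>
  </revision>
</page>`

console.log(getRevisionID(samplePage))
```

Because the match only starts at `<revision>`, the page-level `<id>` that appears earlier is skipped, and the non-greedy `*?` stops at the first `<id>` inside the revision.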
just kidding - this is released in 10.2.0
will get to updating dumpster-dip this week. thanks for the help!
hey, could we also grab revisionID from the api when we do a fetch? @MarketingPip - wanna take a crack at it? this is a cool feature. cheers
@spencermountain - sure can.
I don't think this messes anything up but - wanna take a look see?
https://en.wikipedia.org/w/api.php?action=query&prop=revisions%7Cpageprops&rvprop=content|ids&maxlag=5&rvslots=main&origin=*&format=json&redirects=true&titles=Toronto_Raptors
Note: the ids prop is added for future reference. I will make a PR in advance, run some tests and see what else you want to grab. I will get the rev / parent id. And do you want an option to search via rev id as well?
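A rough sketch of what that query and the id lookup could look like. The helper names are hypothetical, and the response shape (`query.pages[...].revisions[0]` with `revid` and `parentid`) is my understanding of the default JSON format, so treat it as an assumption until checked against a live response. The numbers in the sample are invented.

```javascript
// Hypothetical helper: rebuilds the query from the URL above for a given title.
function buildQueryUrl(title) {
  const params = new URLSearchParams({
    action: 'query',
    prop: 'revisions|pageprops',
    rvprop: 'content|ids', // 'ids' is the new part
    maxlag: '5',
    rvslots: 'main',
    origin: '*',
    format: 'json',
    redirects: 'true',
    titles: title,
  })
  return `https://en.wikipedia.org/w/api.php?${params}`
}

// Hypothetical helper: pulls the ids off the first page in a response.
// Assumes pages are keyed by page id and each has a revisions array.
function getRevisionIds(response) {
  const pages = (response.query && response.query.pages) || {}
  const page = Object.values(pages)[0]
  const rev = page && page.revisions && page.revisions[0]
  if (!rev) return null
  return { revisionID: rev.revid, parentID: rev.parentid }
}

// Invented sample in the assumed response shape (not real ids).
const sampleResponse = {
  query: {
    pages: {
      12345: { title: 'Toronto Raptors', revisions: [{ revid: 111, parentid: 110 }] },
    },
  },
}

console.log(buildQueryUrl('Toronto_Raptors'))
console.log(getRevisionIds(sampleResponse))
```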
@spencermountain - I got most of the work done for getting revisionID. I will leave the work of building the query for a specific revision to you (if you decide to support that).
That said - in a junk / play branch, I modified the test / expected results for the Italian and CSGO wikipedia, tho I am afraid this will cause issues in future builds, once a revision changes and the id no longer matches. Let me know how you want me to modify the test & I will submit tomorrow or the next day etc.
ah, perfect. yeah, that's great. Are you thinking of this?
wtf('Fubar', {revisionID: '372618'})
to fetch an older version? never thought of that - that would be cool. As long as it doesn't get really complicated - Go for it!
thanks for your help
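If it helps, here is one way the revisionID option could map onto a query. This is only a sketch: `buildRevisionUrl` is a hypothetical helper, not something in wtf_wikipedia, and it leans on the MediaWiki `revids` parameter to ask for one specific revision instead of a title.

```javascript
// Hypothetical helper: builds a query for one specific revision id.
// Uses 'revids' in place of 'titles'; the domain default is an assumption.
function buildRevisionUrl(revisionID, domain = 'en.wikipedia.org') {
  const params = new URLSearchParams({
    action: 'query',
    prop: 'revisions',
    revids: String(revisionID),
    rvprop: 'content|ids',
    rvslots: 'main',
    origin: '*',
    format: 'json',
  })
  return `https://${domain}/w/api.php?${params}`
}

console.log(buildRevisionUrl('372618'))
```

The rest of the fetch path could stay the same, since the response should still carry the wikitext under the revision entry.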
@spencermountain - I am grabbing current revision ID (but I will see about grabbing a previous version if it doesn't get messy).
This change adds initial support for revisionID as passed in through options. This is useful because one can use this to check for revision changes between two wikipedia dumps, like when using dumpster-dip on a monthly basis to keep a search database up-to-date (for RAG for example).
Mostly I just missed having this, and I plan to submit a follow-up PR to dumpster-dip to have it parse the revisionID and pass it in so I can use it.
Note that this change does not include updating the README and types yet. I will do that, but I wanted to wait for feedback on naming etc. first.
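To make the monthly-dump use case concrete, here is a small sketch of the comparison I have in mind. `diffRevisions` is a hypothetical helper, and it assumes each dump has already been reduced to a Map of page title to revisionID; the titles and ids below are invented.

```javascript
// Hypothetical helper: compares two Maps of title -> revisionID
// and reports which pages need re-indexing.
function diffRevisions(oldRevs, newRevs) {
  const result = { added: [], changed: [], removed: [] }
  for (const [title, id] of newRevs) {
    if (!oldRevs.has(title)) result.added.push(title)
    else if (oldRevs.get(title) !== id) result.changed.push(title)
  }
  for (const title of oldRevs.keys()) {
    if (!newRevs.has(title)) result.removed.push(title)
  }
  return result
}

// Invented data standing in for two consecutive dumps.
const january = new Map([
  ['Toronto Raptors', '1001'],
  ['Fubar', '372618'],
  ['Old page', '7'],
])
const february = new Map([
  ['Toronto Raptors', '1002'],
  ['Fubar', '372618'],
  ['New page', '9'],
])

console.log(diffRevisions(january, february))
```

Only pages in `added` or `changed` would need to be re-parsed and re-embedded, which is the point of carrying the revisionID through.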