notDavidsGit / WikiFirstLinkNetworkVideo

The Wikipedia first-link network for the Not David YouTube video

Please open source the code #1

Open henrywintif opened 1 month ago

henrywintif commented 1 month ago

Hi I just watched your video and I'm interested in researching whether the network phenomena you observed on English Wikipedia apply to other languages.

If you could please open source the code you used to create the video that would be amazing.

Attribution to your work will of course be provided.

notDavidsGit commented 1 month ago

Hi,

hopefully this reaches you, I've never replied to a github message haha.

Unfortunately, because this code is essentially a web scraper, it is too easily open to abuse, even unintentionally. As such, I don't feel comfortable sharing the code (though I shared the network so that people can compare it with Wikipedia to verify).

That being said:

a) If all you want to do is verify whether it works in other languages, that's actually pretty easy to check without the code, just by going to Wikipedia and clicking some random links.

and

b) If you really, really want to use code to do it, writing code like this is actually fairly easy. The hardest part is the logic to make sure you're getting the first link, but that's just some 'if' statements. I really had no experience with this type of programming, so even if you have the slightest bit of Python knowledge (or really any programming language), you should be able to get the gist of the code in a day, and then need another day or two to get the logic down.
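As an illustration of the "just some 'if' statements" point, here is a minimal sketch of first-link logic. It is not the code used for the video. It assumes the usual first-link rules (skip links in italics, skip links inside parentheses, skip namespaced links such as `File:` or `Help:`), and it works on an HTML string you already have, so it makes no network requests. The names `FirstLinkParser` and `first_link` are hypothetical.

```python
from html.parser import HTMLParser


class FirstLinkParser(HTMLParser):
    """Records the first qualifying article link found in paragraph text."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = 0   # > 0 while inside a <p> element
        self.in_italics = 0     # > 0 while inside <i> or <em>
        self.paren_depth = 0    # unclosed "(" seen in paragraph text
        self.first_link = None  # article title once found

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph += 1
        elif tag in ("i", "em"):
            self.in_italics += 1
        elif tag == "a" and self.first_link is None:
            href = dict(attrs).get("href") or ""
            # The 'if' statements: assumed rules, adjust to your own method.
            if (self.in_paragraph > 0
                    and self.in_italics == 0
                    and self.paren_depth == 0
                    and href.startswith("/wiki/")
                    and ":" not in href):
                self.first_link = href[len("/wiki/"):]

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = max(0, self.in_paragraph - 1)
            self.paren_depth = 0  # do not carry parentheses across paragraphs
        elif tag in ("i", "em"):
            self.in_italics = max(0, self.in_italics - 1)

    def handle_data(self, data):
        if self.in_paragraph > 0:
            for ch in data:
                if ch == "(":
                    self.paren_depth += 1
                elif ch == ")" and self.paren_depth > 0:
                    self.paren_depth -= 1


def first_link(html):
    """Return the title of the first qualifying link in html, or None."""
    parser = FirstLinkParser()
    parser.feed(html)
    return parser.first_link
```

Real article HTML has more edge cases (infoboxes, hatnotes, citations), so treat this as a starting point for your own rules.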

In either case, the big question is how valid the interpretation is. Many non-English (and especially non-European) languages have fairly small Wikipedias. Many of those also include pages that are mostly translations of their English equivalents. This introduces a lot of biases that can skew the conclusions (I'm not saying that to discourage you from analyzing it, just keep it in mind when you do).

Thanks for reaching out, Not David

henrywintif commented 1 month ago

Thanks for the response! I understand your concern. Would you be willing to make a private repo so my methodology would follow yours precisely? I'm also wondering if it's possible to download all of Wikipedia. I've seen torrents of the entire thing before. That would make the process quicker, I think, because requests could be served straight from memory.

notDavidsGit commented 1 month ago

I'm not entirely certain what you mean by 'make a private repo so my methodology would follow yours precisely', but if that involves sharing the code then I still don't feel comfortable, given the potential for abuse. Luckily, it's fairly easy to see whether the code you write is working, as you can just check that it's returning the Wikipedia articles you expect.

If you mean the code for the analysis itself and not for gathering the data -- all of that was done in Gephi, a free and open-source network visualization tool whose GUI has built-in functions for calculating things like betweenness centrality and a bunch of other measures. There is essentially no code for me to share for the analysis of the data, since it was all done there. Making the data compatible with Gephi is also really straightforward, because NetworkX, a standard Python network analysis module, has a built-in function for it (write_gexf()).
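For reference, the export step can be sketched as follows. This is a minimal example, not the script used for the video: it assumes the scraped data is held as a dict mapping each article title to the title of its first link, and the function names and sample titles are hypothetical. `nx.DiGraph`, `add_edges_from` and `nx.write_gexf` are real NetworkX calls.

```python
import networkx as nx


def build_first_link_graph(first_links):
    """Build a directed graph with one edge per article -> its first link."""
    graph = nx.DiGraph()
    graph.add_edges_from(first_links.items())
    return graph


def export_for_gephi(first_links, path):
    """Write the first-link network to a GEXF file that Gephi can open."""
    graph = build_first_link_graph(first_links)
    nx.write_gexf(graph, path)
    return graph


if __name__ == "__main__":
    # Hypothetical sample data: article title -> title of its first link.
    sample = {
        "Science": "Knowledge",
        "Knowledge": "Fact",
        "Fact": "Truth",
        "Truth": "Fact",
    }
    export_for_gephi(sample, "first_links.gexf")
```

The resulting `.gexf` file can be opened directly in Gephi, where measures such as betweenness centrality are available from the GUI.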

And yes, it is possible to download all of Wikipedia, and that would certainly make your code much, much faster and safer than the one I wrote. Sadly, I didn't know that before writing my code. If I could do the project again, I'd write the code to work off of that. Unfortunately, I don't know exactly how or where to get those downloads.
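For anyone trying the offline route: Wikimedia publishes database dumps for each language edition at dumps.wikimedia.org, with article text stored as wikitext inside a MediaWiki XML export (for example `enwiki-latest-pages-articles.xml.bz2`). A rough sketch of reading such a file is below. It is not the method used for the video, the function names are hypothetical, and the first-link rules (skip templates, parentheses and namespaced links) are assumptions. It does not handle italics, redirects or HTML comments.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(xml_stream):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export."""
    title, text = None, ""
    for _, elem in ET.iterparse(xml_stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # free memory; full dumps are very large


def first_wikilink(wikitext):
    """Return the first [[link]] target outside templates and parentheses."""
    brace_depth = 0  # nesting depth of {{templates}}
    paren_depth = 0  # nesting depth of (parentheses) outside links
    i = 0
    while i < len(wikitext):
        two = wikitext[i:i + 2]
        if two == "{{":
            brace_depth += 1
            i += 2
        elif two == "}}":
            brace_depth = max(0, brace_depth - 1)
            i += 2
        elif two == "[[":
            end = wikitext.find("]]", i)
            if end == -1:
                return None
            # Drop the display label and any section anchor.
            target = wikitext[i + 2:end].split("|")[0].split("#")[0].strip()
            if (brace_depth == 0 and paren_depth == 0
                    and target and ":" not in target):
                return target
            i = end + 2
        elif wikitext[i] == "(":
            paren_depth += 1
            i += 1
        elif wikitext[i] == ")":
            paren_depth = max(0, paren_depth - 1)
            i += 1
        else:
            i += 1
    return None
```

To run it on a compressed dump, pass `bz2.open(path, "rb")` as `xml_stream` and call `first_wikilink` on each page's text. Because everything is read from a local file, no requests are made to Wikipedia's servers.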
