Closed jwzimmer-zz closed 3 years ago
https://github.com/jwzimmer/tv-tropes/commit/830c4c4df1c6a7f6a2f1b8dd90a6e1ee6831708a
My script seems to be doing what we want, BUT when I tried to run it on every trope html file in trope_list it froze my computer.
Well, https://github.com/jwzimmer/tv-tropes/blob/main/individualtropepage.py seems to work, but as mentioned above, running it on all the tropes kills my computer. So I tried running it just on Z tropes - ok, no problem. Then just on A tropes - computer froze. So I think we have all the Z trope articles' linked tropes, but just some of the As, and an unknown portion of the rest. A better or more efficient strategy may be required. @nguyenhphilip, any ideas?
Example dict: https://github.com/jwzimmer/tv-tropes/blob/main/linked_trope_dict_from_ADayInTheLimelight.json
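One possible way around the freeze, sketched here under assumptions: read one HTML file at a time, append its result to a single output file right away, and skip anything already recorded so an interrupted run can resume. This is not what individualtropepage.py does; `process_all`, `iter_html_files`, the `extract` callback, and the JSON-lines output layout are all hypothetical names and choices made for illustration, and the actual cause of the freeze is not known from this thread.

```python
import json
from pathlib import Path
from typing import Callable, Iterator


def iter_html_files(folder: Path) -> Iterator[Path]:
    """Yield the .html files in a folder one at a time, in sorted order."""
    yield from sorted(p for p in folder.iterdir() if p.suffix == ".html")


def process_all(folder: Path, out_path: Path, extract: Callable[[str], list]) -> int:
    """Append one JSON line per file; return how many files were newly processed.

    `extract` is a hypothetical callback that turns page HTML into a list of links.
    """
    # Collect the names already written, so a rerun resumes instead of starting over.
    done = set()
    if out_path.exists():
        with out_path.open(encoding="utf-8") as f:
            done = {json.loads(line)["file"] for line in f if line.strip()}

    count = 0
    with out_path.open("a", encoding="utf-8") as out:
        for path in iter_html_files(folder):
            if path.name in done:
                continue
            # Only this one file's HTML is held in memory at a time.
            html = path.read_text(encoding="utf-8", errors="replace")
            out.write(json.dumps({"file": path.name, "links": extract(html)}) + "\n")
            count += 1
    return count
```

Because each result is written before the next file is opened, a crash partway through the A tropes would still leave everything processed so far on disk.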
Done! Verified that I stepped through all the files in the trope_list folder:
len(it.alltropes) == it.count returns True
Only 1 error... I will open a case for handling whatever happened there.
Done except for issue with one page: https://github.com/jwzimmer/tv-tropes/issues/11
Should also randomly spot-check some pages to make sure the dicts are correct. I can do that tmrw.
Oh nice you got it to work! I saw the comment from 2 hrs ago and started running a script of my own.. then I just saw that you figured it out haha. Looking at issue 11 it looks similar to an issue I had, which is that some <a> links don't have the 'href' attribute, meaning they don't link to any page. The way I filtered those out was by using if trope.has_attr('href'). Anyways it might be good just to cross reference our lists
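For reference, here is a minimal sketch of the same "skip anchors with no href" idea. It uses only the standard library's `html.parser` rather than the `has_attr` check mentioned above, so it is a stand-in for that filter, not the script either of us ran. `LinkCollector` and `collect_hrefs` are hypothetical names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, skipping anchors that have none."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        href = dict(attrs).get("href")
        if href:  # skips a missing href as well as an empty one
            self.hrefs.append(href)


def collect_hrefs(html: str) -> list:
    """Return the href of every <a> tag in the HTML that actually has one."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.hrefs
```

An anchor such as `<a name="top">` is dropped without raising an error, which is the behaviour the filter is meant to give.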
Yes! That sounds great! : )
Making one list instead of a million files. Good call. I should have done that!
Haha we'll see how it works out when we load things into a data frame! I think it should be fine since I've loaded ~300mb files into python though : )
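If the per-trope JSON files ever need folding into one structure, something like the sketch below might do it. It assumes each file matches the `linked_trope_dict_from_*.json` naming seen in the example dict above and contains valid JSON; `merge_json_dicts` is a hypothetical helper, and keying the result by file stem is just one possible layout.

```python
import json
from pathlib import Path


def merge_json_dicts(folder: Path, pattern: str = "linked_trope_dict_from_*.json") -> dict:
    """Load every matching JSON file and map its file stem to its parsed content."""
    merged = {}
    for path in sorted(folder.glob(pattern)):
        with path.open(encoding="utf-8") as f:
            merged[path.stem] = json.load(f)
    return merged
```

The merged dict can then be written out once with `json.dump`, giving a single file to load later.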
I think what we've done so far counts as "start making some kind of something".
Next big step is to start parsing the individual trope articles for connections.