nh36 closed this issue 2 years ago
Hey Mattis, Nice to (virtually) meet you! My name is Ash and I work on the hanproject. This issue came about because I wrote code to try to recreate what you did in your 2017 paper. The number of nodes I came up with (1809) was different from the number you list in the paper (1845). Later, I compared the results of my code with the results from PoePy. Both came to 1809 nodes, but not exactly the same 1809. They differed by about 3 nodes, and I figured out where the difference came from for 2 of those. I left the last one in the interest of time, since it probably didn't reflect a fundamental misunderstanding on my part of what you were doing in the 2017 paper. I'll be using PoePy going forward. I did this whole thing as an experiment to better understand your work specifically, and the application of network theory to rhymes in general.
Ash
On Sat, 26 Mar 2022 at 12:56, Johann-Mattis List @.***> wrote:
Node numbers, what is the problem here?
Hi Ash, nice to meet you too! There is a problem with PoePy in so far as I need to do a thorough check of the code base. So if you use PoePy for these experiments, may I ask you to share the code with me, so that I can double-check and also fix bugs in PoePy for this purpose? Cases like the one you mention here deserve investigation on my side, as they should not turn up at all, and if there are errors in the code underlying older papers, we need to address and fix them.
Did you use the exact data and compare with the workflow that was published for the 2017 paper, or did you use a new version of Baxter's 1992 data? This may also lead to discrepancies.
I also just gave you write access to PoePy, so we can work on this together. You'd of course be listed as an author if new ideas come up, if we learn about bugs, and if we decide to make the code base better tested and more user-friendly.
Hey Mattis, Yeah, that sounds great! I didn't find any errors in PoePy per se. I could go back and look at the 3 differing characters (for the nodes) sometime next week and report the differences here (it's also likely that it's my code that has the issue). One thing regarding the calculation of edges: PoePy produces a lot of cases (300-something for the entire Shijing) where it reports an edge with the same character on both ends. I looked at the Shijing data for those cases (to answer your earlier question, I'm using your Baxter1992.tsv file), and in some of them I couldn't find a rhyme, though I figured it may have something to do with the "potential" rhymes mentioned in the 2017 paper. Also, I've been working from the 2016 pre-publication version, so some of the discrepancies in node numbers, etc., may stem from that if the two versions aren't exactly the same. I can also send you those pairs of self-rhyming characters sometime. Thanks for the prompt response!
Ash
For starters, if you send me the exact Baxter file you use and your code, I'll have a look next week, when I find time.
I'd then also check my code from 2017, which is in fact outdated and should be redone, also because I now use a new way to measure rhymes, which only considers adjacent rhymes. I think this is much better and should be the new standard. The idea was introduced to me by Aison Bu, although I do not know if this was exactly what Aison meant (see here: http://phylonetworks.blogspot.com/2020/08/constructing-rhyme-networks-from-rhymes.html).
So my suggestion would be to include this new code in PoePy and to add a check for the self-nodes, which are obviously an error, although in a consecutive-rhyme model it is possible that an identical word IS used as a rhyme, even if that is not considered a stylistic masterpiece.
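To make the difference between the two models concrete, here is a minimal sketch of edge construction for a single stanza. It is not PoePy's implementation: the function name `rhyme_edges`, its parameters, and the `(character, rhyme_label)` input format are assumptions made for illustration, and "adjacent" is read here as "consecutive words within the same rhyme group".

```python
from itertools import combinations


def rhyme_edges(stanza, adjacent_only=False, keep_self_loops=False):
    """Build undirected rhyme edges for one stanza (hypothetical helper).

    stanza: list of (character, rhyme_label) in line order;
            rhyme_label is None for lines that do not rhyme.
    """
    # Group rhyme words by their label, preserving line order.
    groups = {}
    for char, label in stanza:
        if label is not None:
            groups.setdefault(label, []).append(char)

    edges = []
    for chars in groups.values():
        if adjacent_only:
            # Consecutive-rhyme model: link each word to the next one only.
            pairs = zip(chars, chars[1:])
        else:
            # All-pairs model: link every word to every other word.
            pairs = combinations(chars, 2)
        for a, b in pairs:
            if a == b and not keep_self_loops:
                continue  # drop edges with the same character on both ends
            edges.append(tuple(sorted((a, b))))
    return edges


# Placeholder stanza: four different words sharing rhyme label "a".
stanza = [("A", "a"), ("B", "a"), ("C", "a"), ("D", "a")]
print(len(rhyme_edges(stanza)))                 # all-pairs model
print(rhyme_edges(stanza, adjacent_only=True))  # consecutive-rhyme model
```

With `keep_self_loops=True`, a word that is repeated as a rhyme within a stanza yields an edge such as `("A", "A")`, which is the case that needs a deliberate decision in the consecutive-rhyme model.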
Hey Mattis, I've pushed the file soas_rnetwork_test.py to hanproj/hanproject/. If you run the function create_network_for_baxter1992_data_4(), it calculates the nodes and edges using my code and PoePy. It also lists out the various groups of rhyme pairs mentioned below. Below, I describe my analysis of the differences in output between our respective code.
As far as the input data, my code uses the Baxter1992.tsv file that is on hanproj/hanproject/ . For PoePy, I can't use that same file. It complains that the code is expecting 11 columns and that there are only 10. I've attached a copy of the exact file I'm feeding to PoePy to this email.
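A column-count mismatch like this can be located before the file is handed to either program. The sketch below uses only the standard library; `check_column_counts` is a hypothetical helper, not part of PoePy, and the expected width is a parameter rather than an assumption about the real file.

```python
import csv
import io


def check_column_counts(handle, expected):
    """Return (line_number, n_columns) for every row whose width differs
    from `expected` in a tab-separated file (hypothetical helper)."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        (line_no, len(row))
        for line_no, row in enumerate(reader, start=1)
        if len(row) != expected
    ]


# Tiny in-memory example with made-up content: the third line is one column short.
sample = "id\tchar\trhyme\n1\tx\ta\n2\ty\n"
print(check_column_counts(io.StringIO(sample), expected=3))
```

For a real file, the same function can be called on `open(path, encoding="utf-8", newline="")`.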
Also, I haven't had a chance yet to look at the PoePy code. If you don't have the time, I can look into it and see if I can track down the anomalies.
Hope you are doing well!
Ash
The results: NODES: Our nodes are off by one. PoePy gives 1809. Mine is missing 慍 because my code isn't handling stanza 237.8 correctly.
EDGES: 5112 unique edges for PoePy and 5112 unique edges for my code.
However, my code skips (求, 休) from 9.1 because the rhyme is marked with '?', and PoePy includes it. So the two sets aren't exactly the same.
The detailed, complicated version: As for edges, PoePy has 5423 unique elements, while my code has 5968 elements (but only 5097 unique elements). As to the differences between my code and PoePy:
There are 331 elements in PoePy's edges that my code doesn't have.
a. Of these 331, 307 are edges with the same character on each end. If you run my code it prints the full list to the screen, but here are 3 examples: (丁, 丁), (世, 世), (丸, 丸). These are probably doing no harm (given that every character rhymes with itself), but they may be inflating stats somewhere.
b. Of the remaining 24, the following are actual rhymes that my code is not handling correctly (15, plus one skipped due to '?'):
(休, 舟) from 176.4
(休, 首) from 262.6
(公, 鍾) and (鍾, 逢) from 242.5
(螽, 戎) and (忡, 戎) from 168.5
(憂, 滺) from 59.4
(禮, 濟) from 290.1
(問, 慍) from 237.8
(福, 載) and (福, 備) from 239.4
(盈, 旌), (鳴, 旌), (鳴, 驚), (旌, 驚) from 179.7
(求, 休), which my code skips because it is marked '?'. PoePy handles this one according to Baxter 1992 (休, 觩), but I have some doubts about Baxter's handling of this particular stanza.
I suspect PoePy is not handling these 5 pairs below correctly (so potentially the most interesting part of this exercise for you). I wrote code to collect a list of stanzas for each character, then compare stanzas. If the code said no stanzas matched, then I had it print out all of the stanzas that both characters in the pair appear in and checked via eyeball. For these 5 pairs, I couldn't find any matches: (遷, 安), (安, 山), (安, 丸), (安, 虔), (安, 梴)
c. Additionally, there are 7 pairs in my code's output that aren't in the PoePy results: (及, 濕), (憂, 休), (憂, 休), (憂, 休), (游, 休), (舟, 休), (首, 休).
(及, 濕) is an actual rhyme in stanza 69.3, but doesn't appear in the PoePy results. The other 6 are all real rhymes, but they aren't really different: my (舟, 休) corresponds to your (休, 舟), for instance.
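Differences that come only from pair order, such as (舟, 休) versus (休, 舟), disappear if both edge lists are compared as undirected edges. The following is a rough sketch of such a comparison, together with the stanza co-occurrence check described above. The function names and the shape of `stanzas_by_char` are assumptions for illustration, and the stanza IDs in the example are made up.

```python
def normalize(edges):
    """Treat each pair as an undirected edge, so (a, b) equals (b, a)."""
    return {frozenset(edge) for edge in edges}


def compare_edges(mine, theirs):
    """Split the differences between two edge lists into three groups."""
    a, b = normalize(mine), normalize(theirs)
    only_theirs = b - a
    return {
        "only_mine": a - b,
        # A self-loop such as (丁, 丁) collapses to a one-element frozenset.
        "only_theirs_self_loops": {e for e in only_theirs if len(e) == 1},
        "only_theirs_other": {e for e in only_theirs if len(e) == 2},
    }


def shared_stanzas(pair, stanzas_by_char):
    """Return the stanza IDs in which both characters of a pair occur."""
    first, second = pair
    return stanzas_by_char.get(first, set()) & stanzas_by_char.get(second, set())


# Pairs taken from the lists above; stanza IDs "S1" and "S2" are placeholders.
mine = [("舟", "休"), ("及", "濕")]
theirs = [("休", "舟"), ("丁", "丁"), ("安", "山")]
for name, edges in compare_edges(mine, theirs).items():
    print(name, [tuple(sorted(e)) for e in edges])
print(shared_stanzas(("安", "山"), {"安": {"S1"}, "山": {"S2"}}))
```

An empty result from `shared_stanzas` flags a pair for which no common stanza exists in the input, which is the situation reported for the five pairs involving 安.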
Okay, attaching files does not work when sending data through GitHub, so I have not received the file. But the problems we face here tell me that we are running into different-file issues, not necessarily different-code issues.
What we need to do now is to create a single fixed format for Baxter 1992 (which I should do) in the best version we can get. We can then correct any errors we find there, and from THERE we can test whether our code handles things badly.
Hey Mattis, Yeah, I attached the file to the email in order to keep it out of the git repo. Having two versions of the same data file in the repo would not be good. But, I wanted you to have a copy of that file so you could test with it. I think having one unified version of the file as you suggest is the best solution. The only change I made to the file was for Ode 63.9, line 6, I changed 'b' to 'a' to match Baxter 1992. Just let me know when you have a unified version of the file and I'll re-run everything.
Ash
Hey Mattis, At Nathan's request, I'm going to close this issue, but open up another one with a more appropriate name (since the number of nodes issue has been resolved). It'll be called "Troubleshooting discrepancies between PoePy and Ash's code" (if there is a name you would prefer over this one, just let me know).
Ash
Closing this issue since the number of nodes has been resolved. Will open another issue called "Troubleshooting discrepancies between PoePy and Ash's code" to continue the discussion of the edges.
Hey Mattis, Sorry, I may have misunderstood your email. Are you saying you weren't able to get the file because this email is going through GitHub? If that's the case, can you give me another email address to send it to, or any other way I can get the data to you besides adding the file to the repo?
Thanks, Ash
You have my email address from our email conversation, right? Otherwise it is also online.