Closed by drammock 4 years ago
@drammock, sorry, I didn't even see this until I made PR #249. Note that once we merge this, I'll go back and regenerate the data for that PR.
I ran this PR on my fork: the CSV file doesn't change, but the Rdata object does, perhaps due to serialization.
See comment on the ordering.
My only other concern about the Rdata object is the rownames, which are now ascending, but not by increments of one, e.g.:
      InventoryID Glottocode ISO6393
1               1   kore1280     kor
3282            1   kore1280     kor
6976            1   kore1280     kor
7265            1   kore1280     kor
7760            1   kore1280     kor
9328            1   kore1280     kor
11681           1   kore1280     kor
13988           1   kore1280     kor
16540           1   kore1280     kor
17738           1   kore1280     kor
18256           1   kore1280     kor
Should we reset the rownames before dumping to Rdata?
Yeah, we should do that.
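For reference, a minimal sketch of what resetting the rownames could look like. The toy data frame and the names `df` and `df_reset` are made up for illustration (only the Korean identifiers come from the example above), and this is not necessarily how the PR does it:

```r
# Toy stand-in for the aggregated table, with the gappy rownames
# left over from subsetting/sorting a larger data frame.
df <- data.frame(
  InventoryID = c(1, 1, 1),
  Glottocode = "kore1280",
  ISO6393 = "kor",
  row.names = c("1", "3282", "6976"),
  stringsAsFactors = FALSE
)

# Assigning NULL drops the custom rownames; R falls back to 1..n.
df_reset <- df
rownames(df_reset) <- NULL

print(rownames(df))        # "1" "3282" "6976"
print(rownames(df_reset))  # "1" "2" "3"
```

Doing this right before `save()` should keep the rownames from differing between otherwise identical regenerations.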
OK @bambooforest, if you're cool with sorting by Phoneme instead of GlyphID, this should be ready to go. If not, LMK how you think the sort should be handled.
(note in case we need it: R functions strtoi() and as.hexmode())
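A quick sketch of what those two functions do, in case it saves a lookup later. It assumes the values of interest are hexadecimal code point strings such as "0070"; the thread doesn't confirm that format, so treat the inputs as hypothetical:

```r
# strtoi() parses a string as an integer in the given base.
code_point <- strtoi("0070", base = 16L)
print(code_point)               # 112

# intToUtf8() turns the code point back into the character.
print(intToUtf8(code_point))    # "p"

# as.hexmode() goes the other way: integer to hex representation.
# format() with a width zero-pads the result.
hex_string <- format(as.hexmode(code_point), width = 4)
print(hex_string)               # "0070"
```

Converting to integers first would give a numeric sort rather than a string sort on the hex text.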
@drammock, thanks! I'm OK with just sorting on phonemes for diff's sake. I think ideally (or perhaps just linguistically) we could sort on "the IPA chart order" (e.g. p, b, t, k, ...), but there's no straightforward way (off the top of my head) to do this. I do like your consonants < tones | vowels though.
We could approximate IPA order by sorting on feature columns. But to me that seems like overkill; it would require a lot of nested sorting. I'm content with just getting it to be deterministic, which it should be now.
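To make "deterministic" concrete, here is a rough sketch of a multi-key sort that gives the same row order no matter how the input was ordered. The helper `sort_rows` and the toy data are hypothetical, and the choice of `InventoryID` then `Phoneme` as keys is an assumption based on this thread, not a copy of the PR's code:

```r
# Hypothetical helper: sort by InventoryID, then Phoneme.
# method = "radix" orders character keys in the C locale, so the
# result does not depend on the collation settings of the machine.
sort_rows <- function(df) {
  sorted <- df[order(df$InventoryID, df$Phoneme, method = "radix"), ]
  rownames(sorted) <- NULL  # drop the leftover original row indices
  sorted
}

toy <- data.frame(
  InventoryID = c(2, 1, 1, 2),
  Phoneme = c("t", "p", "b", "k"),
  stringsAsFactors = FALSE
)

# Same rows in a different starting order.
shuffled <- toy[c(3, 1, 4, 2), ]

print(sort_rows(toy))
print(identical(sort_rows(toy), sort_rows(shuffled)))  # TRUE
```

Sorting on feature columns to approximate IPA order would just mean adding more keys to the `order()` call, which is where the nesting would start to pile up.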
This should (hopefully) eliminate spurious git diffs caused by different row order, when regenerating aggregate data in each pull request.