phoible / dev

PHOIBLE data and development.
https://phoible.org/
GNU General Public License v3.0

sort before saving #247

Closed drammock closed 4 years ago

drammock commented 5 years ago

This should (hopefully) eliminate spurious git diffs caused by differing row order when the aggregate data is regenerated in each pull request.
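A minimal sketch of the idea, not the PR's actual code: the column names (`InventoryID`, `Phoneme`), the toy data, and the helper names `sort_rows()` and `csv_lines()` are all assumptions made for illustration. It shows that sorting on a fixed set of key columns before writing makes the CSV output independent of the incoming row order.

```r
# Hypothetical aggregate data; column names are assumptions for illustration.
a <- data.frame(InventoryID = c(2L, 1L, 1L),
                Phoneme = c("a", "t", "k"),
                stringsAsFactors = FALSE)
# Same rows in a different order, as might happen between regeneration runs.
b <- a[c(3, 1, 2), ]

# Sort on fixed key columns. method = "radix" compares strings in C-locale
# order, so the result does not depend on the machine's collation settings.
sort_rows <- function(df) {
  df[order(df$InventoryID, df$Phoneme, method = "radix"), , drop = FALSE]
}

# Capture the CSV text that would be written for a sorted data frame.
csv_lines <- function(df) {
  capture.output(write.csv(sort_rows(df), row.names = FALSE))
}

identical(csv_lines(a), csv_lines(b))  # TRUE: same text regardless of input order
```

Using `method = "radix"` is one way to avoid locale-dependent string ordering; whether that matters here depends on how varied the contributors' environments are.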

bambooforest commented 4 years ago

@drammock -- sorry didn't even see this until I made PR #249. Note once we merge this, I'll go back and regen the data for that PR.

I ran this PR on my fork and the CSV file doesn't change, but the Rdata object does -- perhaps due to serialization.

See comment on the ordering.

My only other concern about the Rdata object is the rownames, which are now ascending, but not by increments of one, e.g.:

      InventoryID Glottocode ISO6393
1               1   kore1280     kor
3282            1   kore1280     kor
6976            1   kore1280     kor
7265            1   kore1280     kor
7760            1   kore1280     kor
9328            1   kore1280     kor
11681           1   kore1280     kor
13988           1   kore1280     kor
16540           1   kore1280     kor
17738           1   kore1280     kor
18256           1   kore1280     kor

Should we reset the rownames before dumping to Rdata?

drammock commented 4 years ago

> Should we reset the rownames before dumping to Rdata?

Yeah, we should do that.
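For reference, a small sketch of what resetting the rownames looks like in base R. The data frame is a toy example (the second Glottocode is a made-up placeholder), and this is not necessarily how the PR ends up doing it.

```r
# Toy data; "xxxx0000" is a placeholder, not a real Glottocode.
df <- data.frame(InventoryID = c(1L, 2L, 1L),
                 Glottocode = c("kore1280", "xxxx0000", "kore1280"),
                 stringsAsFactors = FALSE)

# Subsetting with order() carries the original row positions along as rownames.
sorted <- df[order(df$InventoryID), ]
rownames(sorted)          # "1" "3" "2"

# Assigning NULL restores the default sequential rownames 1..n.
rownames(sorted) <- NULL
rownames(sorted)          # "1" "2" "3"
```

Doing this immediately before `save()` would keep the serialized object free of leftover row labels from the pre-sort order.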

drammock commented 4 years ago

OK @bambooforest if you're cool with sorting by Phoneme instead of GlyphID, this should be ready to go. If not, LMK how you think sort should be handled.

drammock commented 4 years ago

(note in case we need it: R functions strtoi() and as.hexmode())
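In case it is useful later, a short sketch of how those two functions could be used to sort on hexadecimal code points. The `glyph_ids` values are assumed to be single four-digit hex strings for the sake of the example; the real GlyphID column may well look different (e.g. multi-code-point IDs would need splitting first).

```r
# Hypothetical four-digit hexadecimal code point strings.
glyph_ids <- c("0070", "0062", "0074", "006B")

# strtoi() with base = 16L parses hex strings into integers.
code_points <- strtoi(glyph_ids, base = 16L)
code_points                                   # 112 98 116 107

# Sort the IDs by numeric code point rather than as strings.
glyph_ids[order(code_points)]                 # "0062" "006B" "0070" "0074"

# as.hexmode() goes the other way; format() zero-pads to a fixed width.
format(as.hexmode(code_points), width = 4, upper.case = TRUE)
```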

bambooforest commented 4 years ago

@drammock - thanks! I'm ok with just sorting on phonemes for diff's sake. I think ideally (or perhaps just linguistically) we could sort on "the IPA chart order" (e.g. p, b, t, k, ...), but there's no straightforward way (off the top of my head) to do this. I do like your consonants < tones | vowels though.

drammock commented 4 years ago

We could approximate IPA order by sorting on feature columns. But to me that seems like overkill; it would require a lot of nested sorting. I'm content with just getting it to be deterministic, which it should be now.
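To give a sense of what the nested sorting would involve, here is a rough sketch with two made-up feature columns (`place`, `voiced`). These are not the actual PHOIBLE feature columns, and a real version would need many more keys, which is where the overkill comes in.

```r
# Hypothetical segments with two illustrative feature columns.
segs <- data.frame(
  Phoneme = c("k", "b", "t", "p", "d", "g"),
  place   = c("velar", "bilabial", "alveolar", "bilabial", "alveolar", "velar"),
  voiced  = c(FALSE, TRUE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Rank places front-to-back, roughly following the IPA chart's column order.
place_rank <- match(segs$place, c("bilabial", "alveolar", "velar"))

# Nested sort: first by place, then voiceless before voiced within each place.
ipa_like <- segs[order(place_rank, segs$voiced), ]
rownames(ipa_like) <- NULL
ipa_like$Phoneme   # "p" "b" "t" "d" "k" "g"
```

Every additional dimension of the chart (manner, secondary articulations, and so on) would add another key to `order()`, plus an explicit ranking for its values.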

On Feb 21, 2020, 09:35, Steven Moran wrote:

> @drammock - thanks! I'm ok with just sorting on phonemes for diff's sake. I think ideally (or perhaps just linguistically we could sort on "the IPA chart order", e.g. p, b, t, k, ... ), but there's no straightforward way (off the top of my head) to do this. I do like your consonants < tones | vowels though.