phoible / dev

PHOIBLE data and development.
https://phoible.org/
GNU General Public License v3.0
115 stars 30 forks source link

add Donno So Dogon #295

Closed bnk7 closed 4 years ago

bnk7 commented 4 years ago

I just noticed that the voiceless stops in Jamsay are aspirated as well. Should I make a new pull request to fix that once this one is merged?

drammock commented 4 years ago

I just noticed that the voiceless stops in Jamsay are aspirated as well. Should I make a new pull request to fix that once this one is merged?

Best practice is to do separate PRs for separate bugs or features ("feature" in the software sense) because it makes reviewing the PRs a lot easier. But in reality, many projects are tolerant of PRs with unrelated "I noticed this other problem along the way" changes getting lumped into open PRs because it's easier, as long as those unrelated changes are fairly small and easy to tell apart from the main changeset. So I'd say in this case it's OK to mix in those changes, since the inventories are on completely separate sets of lines in the CSV file (and therefore it's easy enough for the reviewer to tell which changes are which).

bambooforest commented 4 years ago

@bnk7 -- looks good, many thanks!

@drammock -- since the raw data were updated for raw-data/UW/uw_inventories.tsv, we need to regenerate the CSV file. i thought i was doing this directly to this pull request, but ends up i committed to @bnk7 's fork:

https://github.com/bnk7/dev/pull/1

what's best practice in this situation and how do you suggest we move forward (in case it happens again)? i figured committing the regen'd data here would be best and then squash and merge (instead of merging this PR and creating a new one just to regen the data).

thanks.

drammock commented 4 years ago

Option 1: revert the "add aspiration to Jamsay" commit and do that change + data regen yourself in a separate PR.

option 2: explain how to regen the data and get @bnk7 to do it locally and push a commit here (probably won't work if @bnk7 is using windows, if I recall).

Option 3: push your "regen data" commit directly to @bnk7's ds_branch. Should be possible if "allow edits from maintainers" was enabled on this PR

Option 4: PR into her ds_branch and wait for her to merge; when merged that commit will show up here. I can't tell for sure from the GitHub phone interface, but it looks like your earlier PR goes into her master branch instead of her ds_branch.

Regardless of which option you go with, you should explain what "regen the data" means and why it's necessary.

drammock commented 4 years ago

Also I think you know this and just made a typo, but it's the fact that the GM inventories file changed (not the UW one) that triggers the need to regen. The UW ones won't get incorporated yet since they don't have IDs assigned.

bambooforest commented 4 years ago

@bnk7 -- when we currently update (non UW) sources in the raw-data folder, we need to run two commands to regenerate the aggregated phoible.csv file. an explanation is here:

https://github.com/phoible/dev/tree/master/scripts

in short, from the command line i run (in the scripts folder):

to aggregate the raw data sources into a single file. and then:

to add the phonological features to that aggregated data.

the link above shows how to do it in an R program like RStudio.

@drammock mentioned this might not work on a Windows machine (if so, we should probably add why to the README above).

@bnk7 -- i will go ahead and merge this PR and then regenerate the data in a new PR. moving forward, if you can/want to regenerate the data, if you happen to change existing non-UW sources, please feel free to as part of the PR. if not or it's too much trouble, i'll do it.

since we don't aggregate in yet UW sources that you're adding (we need to update the aggregation script to include them), we don't need to regenerate data when adding UW sources.

thanks!

bnk7 commented 4 years ago

@bnk7 -- when we currently update (non UW) sources in the raw-data folder, we need to run two commands to regenerate the aggregated phoible.csv file. an explanation is here:

https://github.com/phoible/dev/tree/master/scripts

in short, from the command line i run (in the scripts folder):

  • Rscript aggregate-raw-data.R

to aggregate the raw data sources into a single file. and then:

  • Rscript add-features.R

to add the phonological features to that aggregated data.

the link above shows how to do it in an R program like RStudio.

@drammock mentioned this might not work on a Windows machine (if so, we should probably add why to the README above).

@bnk7 -- i will go ahead and merge this PR and then regenerate the data in a new PR. moving forward, if you can/want to regenerate the data, if you happen to change existing non-UW sources, please feel free to as part of the PR. if not or it's too much trouble, i'll do it.

since we don't aggregate in yet UW sources that you're adding (we need to update the aggregation script to include them), we don't need to regenerate data when adding UW sources.

thanks!

Okay, thanks for taking care of it this time!