phoible / dev

PHOIBLE data and development.
https://phoible.org/
GNU General Public License v3.0

Feature table -- Outdated -- missing 1000 segments #373

Open tang-kevin opened 5 months ago

tang-kevin commented 5 months ago

Dear Daniel,

I am looking for the feature table that covers all the segments on https://phoible.org/parameters (3,183 segments)

But the one available for download on GitHub, https://github.com/phoible/dev/blob/master/raw-data/FEATURES/phoible-segments-features.tsv, has only 2161 segments and is missing basic segments such as "ɚ".

Perhaps it was not updated properly, as the other files in the same folder were all updated one year ago.

Many thanks, Kevin

drammock commented 4 months ago

I thought it should be possible on https://phoible.org/parameters to simply press a "download" button to get that table, but looking at the site now I don't see any download button (cc @xrotwang - am I just mistaken that there should be a download button for the full segments table?)

As for why the table in https://github.com/phoible/dev/blob/master/raw-data/FEATURES/phoible-segments-features.tsv has fewer segments than the table at https://phoible.org/parameters --- I'm not sure off the top of my head. The website should reflect the state of this repo as of 862bec9 (the 2.0 release tag) but if I look at that file from the 2.0 release commit it has 2163 lines, not 3183 like on the website. Maybe @bambooforest or @xrotwang have ideas? The files in https://github.com/clld/phoible/tree/master/phoible/static/data have suspicious-looking filenames/dates, making me wonder if the live data is in fact out of date?
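One way to narrow down a discrepancy like this is to compare the segment columns of the two tables directly. The following is only a minimal sketch: the column names `segment` and `Name` are assumptions about the two files, and the inline data is a toy stand-in rather than real PHOIBLE content.

```python
import csv
import io


def segment_set(lines, column, delimiter=","):
    """Collect the distinct non-empty values of one column from delimited text."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    return {row[column] for row in reader if row.get(column)}


def missing_segments(full, partial):
    """Return the segments present in `full` but absent from `partial`, sorted."""
    return sorted(full - partial)


# Toy stand-ins for the two tables; column names are assumed, not confirmed.
features_tsv = io.StringIO("segment\tsyllabic\na\t+\nə\t+\n")
parameters_csv = io.StringIO("ID,Name\n1,a\n2,ə\n3,ɚ\n")

in_features = segment_set(features_tsv, "segment", delimiter="\t")
in_parameters = segment_set(parameters_csv, "Name")

print(missing_segments(in_parameters, in_features))  # ['ɚ']
```

With local copies of the real files, the same functions can be fed file objects opened with `encoding="utf-8"` and `newline=""`.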

xrotwang commented 4 months ago

The process of feeding PHOIBLE 2.0 into the web app wasn't particularly streamlined :) This should be a lot simpler for PHOIBLE 3.0, I'd hope.

So, the data from https://github.com/phoible/dev/ was converted to a CLDF dataset using scripts in https://github.com/bambooforest/phoible-scripts . The process is described in https://github.com/bambooforest/phoible-scripts/blob/master/to_cldf/to_cldf.md , and the 3,183 segments already show up there. That CLDF data then served as input for https://github.com/cldf-datasets/phoible/ , which basically copies the CLDF data but adds metadata, and which was eventually loaded into the web app database.

As far as I can tell, the primary data source in the phoible/dev repo is the RData object https://github.com/phoible/dev/blob/master/data/phoible.RData , but @bambooforest might know more about this.

xrotwang commented 4 months ago

> (cc @xrotwang - am I just mistaken that there should be a download button for the full segments table?)

I did away with the per-table download buttons when I moved to the new paradigm that clld apps only serve data from released CLDF datasets. Thus, rather than download (filtered or sorted or otherwise manipulated) individual collections of rows (without any provenance information), users are encouraged to work from the full CLDF dataset, which includes metadata regarding provenance, etc.

I realize that the PHOIBLE app still advertises the per-table download feature, though. Should be changed (see https://github.com/clld/phoible/issues/32).

xrotwang commented 4 months ago

@tang-kevin Looking at your particular example, maybe phoible-segments-features.tsv isn't supposed to be the full list of segments appearing in any inventories? Just grepping for the segment reveals that it appears elsewhere:

$ grep "ɚ" raw-data/*/*
raw-data/FEATURES/component-feature-table.csv:ɚ,025A,0,-,+,-,-,-,+,+,0,+,-,-,-,-,-,0,0,+,+,+,+,+,-,-,-,-,-,-,-,+,-,-,-,0,-,-,0
raw-data/UZ/UZ_inventories.tsv:             "ɚː"        "ɚː"    """vowels are lowered and centralised before [ɹ] and many contrasts are lost""" 
raw-data/UZ/UZ_inventories.tsv:             "ɚ"     "ɚ" 

tang-kevin commented 4 months ago

@xrotwang @drammock Thank you for the prompt response.

My use of phoible-segments-features.tsv is NOT to get the full list of segments of a particular language but to get a comprehensive feature chart that covers all types of sounds. It seems like a valuable chart to have, since there isn't anything as comprehensive as this out there.

Does that mean I need to generate this chart from all the individual inventory entries? I would like to avoid this if possible.

xrotwang commented 4 months ago

@tang-kevin As far as I can tell, https://github.com/cldf-datasets/phoible/blob/v2.0.1/cldf/parameters.csv is exactly the complete list of all sounds encountered in any of the inventories covered in PHOIBLE.
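For anyone wanting to check this locally, a short sketch of indexing that file by segment might look like the following. It assumes the segment itself sits in a column called `Name` (the usual CLDF convention, but not verified here); the IDs and the `SegmentClass` values in the sample are hypothetical placeholders, not real rows.

```python
import csv
import io


def load_parameters(lines, name_column="Name"):
    """Index parameter rows by segment; `name_column` is an assumed column name."""
    return {row[name_column]: row for row in csv.DictReader(lines)}


# Hypothetical sample rows standing in for cldf/parameters.csv.
sample = io.StringIO(
    "ID,Name,SegmentClass\n"
    "X1,ɚ,vowel\n"
    "X2,ɚː,vowel\n"
    "X3,p,consonant\n"
)

segments = load_parameters(sample)
print(len(segments), "ɚ" in segments)  # 3 True
```

Against a downloaded copy of the real file, passing `open(path, encoding="utf-8", newline="")` in place of `sample` should give the number of distinct segments and allow lookups such as `segments["ɚ"]`.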

tang-kevin commented 4 months ago

@xrotwang Thank you. It does appear to have all 3183 sounds! It solves my personal problem for sure. I would suggest that the PHOIBLE website direct readers to this file instead of advertising a download button.