python-organizers / conferences

List of Python Conferences around the World

RFC: Remove country name from Location field #224

Open jonafato opened 2 months ago

jonafato commented 2 months ago

I'd like to remove the country name from the Location field. That information is both redundant with the more specific Country field and inconsistently used (e.g. sometimes omitted, sometimes with spelling variations like "USA" vs. "United States of America"). I'm opening this issue first to:

  1. get feedback in case there are cases where the country information cannot be expressed with the three-letter country code
  2. identify any tooling that uses this repository that would not handle this change gracefully or would need additional updates in order to do so
  3. see if this would actually be a net-negative change, e.g. because ISO 3166-1 alpha-3 country codes are machine-readable but are not necessarily obvious to people reading the CSV files directly

invisibleroads commented 2 months ago

Hmm, this seems like a good idea because it would improve data consistency, but I will wait to see if anyone else voices an opinion.

invisibleroads commented 2 months ago

I think my rationale in the past for being more specific about location in certain cases was to reduce ambiguity.

I agree that removing the country from the location might be a good idea since we already have the three-letter code.

I just want to make the point that the location should be specific enough to remove ambiguity, for example, if a country has two cities with the same name like Lexington, Kentucky vs Lexington, Massachusetts.

jonafato commented 2 months ago

> I just want to make the point that the location should be specific enough to remove ambiguity, for example, if a country has two cities with the same name like Lexington, Kentucky vs Lexington, Massachusetts.

I agree with this, and I'm not suggesting that we remove state / province / etc. kind of details, just the country information that's already stored in a dedicated field.
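
As a rough sketch of what I mean (not an actual migration script), something like the following would drop only a trailing country name and keep the state / province details. The alias table and both helper functions are made up for illustration, and the `Location` and `Country` column names are assumed to match the CSV headers.

```python
import csv
import io

# Hypothetical alias table: spellings of each country that may appear at
# the end of a Location value, keyed by ISO 3166-1 alpha-3 code.
COUNTRY_ALIASES = {
    "USA": {"usa", "united states", "united states of america"},
    "DEU": {"germany"},
}


def strip_country(location: str, country_code: str) -> str:
    """Drop a trailing country name from location, keeping city/state."""
    parts = [part.strip() for part in location.split(",")]
    aliases = COUNTRY_ALIASES.get(country_code, set())
    # Only remove the last component, and only if something else remains,
    # so a location that is just a country name is left untouched.
    if len(parts) > 1 and parts[-1].lower() in aliases:
        parts = parts[:-1]
    return ", ".join(parts)


def rewrite_rows(csv_text: str) -> str:
    """Apply strip_country to every row of a CSV with Location/Country columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["Location"] = strip_country(row["Location"], row["Country"])
        writer.writerow(row)
    return out.getvalue()
```

With this, "Lexington, Kentucky, USA" and "Lexington, Massachusetts, United States of America" both keep their state, so the ambiguity concern above is still covered.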

invisibleroads commented 2 months ago

Your third point is valid. Some of the three-letter codes are not immediately clear. It would take away from the readability of the CSV, especially if some people refer directly to the GitHub repository and not a third-party calendar or website.

JesperDramsch commented 2 months ago

As a downstream user of the CSVs, this would be a minor inconvenience: I already have to do a bunch of data cleaning anyway, so it would just mean adjusting my scripts.

I agree with most points, but to add something of substance:

On the positive side, this would also circumvent "data problems" around the self-determination of countries, such as Turkiye asking not to be called Turkey and Czechia asking not to be called the Czech Republic.

On the negative side, PyCon DE, with the 3-letter code DEU, would be thoroughly confusing for most people who don't already know.

So, I have to say, as long as the data is consistent across the data set, it'd probably be okay. But if it changes halfway through the 2024-file, I'd probably struggle slightly downstream.

jonafato commented 2 months ago

> On the positive side, this would also circumvent "data problems" around the self-determination of countries, such as Turkiye asking not to be called Turkey and Czechia asking not to be called the Czech Republic.

This would be another benefit of this change. A repository covering a set of global conferences is already going to encounter language and translation issues, so this would remove one point of confusion.

> On the negative side, PyCon DE, with the 3-letter code DEU, would be thoroughly confusing for most people who don't already know.

This is mostly a question of how end-users are consuming this data. Automated tooling can perform lookups (e.g. we use https://pypi.org/project/iso3166/ for some CI here), and I would imagine most conference participants are either familiar with their local events or fine with clicking through to the conference website. This is good feedback, though, and the reason that I'm opening this issue up for discussion.
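
To make the lookup idea concrete, here is a minimal sketch of how a downstream tool could show a readable name next to the code. The small table is a hand-written stand-in for a real ISO 3166-1 data source (such as the package linked above), and `display_country` is a hypothetical helper, not something in this repository.

```python
# Hand-written stand-in for a full ISO 3166-1 alpha-3 table (assumption:
# a real tool would load this from a maintained data source instead).
ALPHA3_TO_NAME = {
    "DEU": "Germany",
    "USA": "United States of America",
    "CZE": "Czechia",
    "TUR": "Türkiye",
}


def display_country(alpha3: str) -> str:
    """Return 'Name (CODE)' for known codes, or the bare code otherwise."""
    code = alpha3.strip().upper()
    name = ALPHA3_TO_NAME.get(code)
    return f"{name} ({code})" if name else code
```

A calendar or website built on the CSVs could then render DEU as "Germany (DEU)" without the name ever being stored in the Location field.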

> So, I have to say, as long as the data is consistent across the data set, it'd probably be okay. But if it changes halfway through the 2024-file, I'd probably struggle slightly downstream.

Any change implemented here would be a global update in a single commit. As long as tools are able to deal with a new version of the data set, they shouldn't need to worry about supporting mixed formats.