summerofcode / gsoc-orgs

Google Summer of Code organisation listing
https://soc-orgs.netlify.com/

Categorize Organization #11

Open bekicot opened 6 years ago

bekicot commented 6 years ago

Organizations basically have specific categories, like

But the category isn't explicitly mentioned in the org details. We can categorize organizations by looking at the description and matching keywords. E.g. Healthcare would match medicine, hospital, etc.

related to #3

ceefour commented 6 years ago

Google categorizes organizations into 11 categories.

However, some organizations may fall into multiple categories. E.g. Apache, JBoss.

Some organizations have, e.g., Android or web UI projects but are not themselves focused on that.

For example, PostgreSQL goes into Database, but if someone works on pgAdmin 4, that will be web UI.

bekicot commented 6 years ago

@ceefour For that reason, tags would probably be a more suitable name than category.

bekicot commented 6 years ago

The main point is to provide a way for users to see which organizations fall under a specified keyword.

E.g. Education will include Processing, an organization that also has a project for its JavaScript library, p5.js, which is not about education. It is a drawing library.

ceefour commented 6 years ago

👍

ceefour commented 6 years ago

The next step would be tagging projects in addition to organizations. For 200+ organizations this may be a lot of effort, but for 25 GCI orgs it is probably still doable.

bekicot commented 6 years ago

We can actually automate this.

  1. Specify the keywords.
  2. Using this data, search for the specified keywords in each description.
  3. If a keyword matches, tag the organization with the tags associated with that keyword.

bekicot commented 6 years ago

It is fairly easy; the manual part would be specifying the keywords. E.g. Healthcare keywords: hospital, disease, healthcare, medicine, ebola
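
A minimal sketch of the keyword matching described here, assuming each organization record is a dict with a `description` field. The function name `tags_for`, the `TAG_KEYWORDS` map, and the sample records are hypothetical; only the Healthcare keywords come from the example in this comment.

```python
import re

# Hypothetical keyword map; the Healthcare entries follow the example above.
TAG_KEYWORDS = {
    "Healthcare": ["hospital", "disease", "healthcare", "medicine", "ebola"],
}


def tags_for(description, tag_keywords=TAG_KEYWORDS):
    """Return the sorted tags whose keywords appear as whole words in the description."""
    # Lowercase and split into alphanumeric words so matching is case-insensitive.
    words = set(re.findall(r"[a-z0-9]+", description.lower()))
    return sorted(
        tag
        for tag, keywords in tag_keywords.items()
        if any(keyword.lower() in words for keyword in keywords)
    )


# Hypothetical organization records, for illustration only.
orgs = [
    {"name": "Example Health Org", "description": "Open source software for hospital records."},
    {"name": "Example DB Org", "description": "A relational database system."},
]

for org in orgs:
    org["tags"] = tags_for(org["description"])
```

Whole-word matching is a deliberate simplification: it avoids false hits inside longer words, but it would miss plurals such as "hospitals" unless those are listed as keywords too.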

ceefour commented 6 years ago

You can assign someone to specify the keywords, e.g. via a CSV, JSON, or YAML file.
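
As an illustration of the JSON option, here is a sketch that assumes the file is a single JSON object mapping each tag name to a list of keywords. The layout, the `load_tag_keywords` helper, and the sample keywords are assumptions, not something the repository defines.

```python
import json

# Hypothetical file contents; in practice this text would be read from a
# file maintained by whoever is assigned to specify the keywords.
KEYWORDS_JSON = """
{
  "Healthcare": ["Hospital", "medicine"],
  "Database": ["database", "SQL"]
}
"""


def load_tag_keywords(text):
    """Parse a JSON object of tag -> keyword list, lowercasing every keyword."""
    raw = json.loads(text)
    return {
        tag: [str(keyword).lower() for keyword in keywords]
        for tag, keywords in raw.items()
    }


tag_keywords = load_tag_keywords(KEYWORDS_JSON)
```

Keeping the keywords in a data file rather than in code means contributors can extend them without touching the matching logic.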

jayvdb commented 6 years ago

It would be good to enhance the plugin to map the JSON category field to the front matter categories or tags field. See https://github.com/avillafiorita/jekyll-datapage_gen/issues/46#issuecomment-371082051
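
To make the idea concrete, here is a standalone sketch of copying a record's category into a front matter tags field. This is not the plugin's code or API: the `front_matter` helper and the `name` field are hypothetical, and it assumes each JSON record carries a single `category` string.

```python
import json


def front_matter(org):
    """Build a Jekyll-style YAML front matter block with the category copied into tags."""
    category = org.get("category")
    tags = [category] if category else []
    # JSON strings and arrays are also valid YAML, so json.dumps handles quoting.
    return "\n".join(
        [
            "---",
            f"title: {json.dumps(org['name'])}",
            f"tags: {json.dumps(tags)}",
            "---",
            "",
        ]
    )


# Hypothetical record, for illustration only.
page = front_matter({"name": "Example Org", "category": "Database"})
```

A record without a category simply gets an empty tags list, so downstream tag pages can skip it.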

There should be no manual part. There are copious amounts of Open Data about most of these orgs that can be used to automate categorisation, mining text if necessary. The smaller orgs should build Open Data about themselves rather than adding metadata to this single-purpose repo.

jayvdb commented 6 years ago

https://github.com/pattex/jekyll-tagging might be an interesting addition if we can generate lots of tag data.