tig12 / g5

Merge Gauquelin and related data in a postgres database

CSID & ERID missing in ogdb-[all|time].csv ? #3

Open smallufo opened 2 years ago

smallufo commented 2 years ago

First, I want to say this is fantastic work!!! I noticed Gauquelin's data a long time ago and did some parsing, but I never merged it with Müller's and Ertel's sources. I hope I can make some contribution (but I am from the Java/Kotlin world).

Something I wonder about: I looked into the dumps ogdb-time.csv and ogdb-all.csv, and I can only see GQID and MUID, but I didn't find the CSID and ERID columns.

Are these 2 columns missing?

Thanks.

tig12 commented 2 years ago

Yes, that's because CSID and ERID concern only sportspersons. These columns could be added to the files containing sportspersons, along with the columns CFID (CFEPP) and CPID (Comité Para). I'll add this to the todo list.
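
For illustration, a row of such a sportspersons file could be modeled roughly like this (a Kotlin sketch; the column names are the ones mentioned in this thread, the actual export layout may differ):

// Sketch only: the ids a person may have in the different source catalogs.
// Fields are nullable because a given person does not appear in every source.
data class SportspersonRow(
    val name: String,
    val gqid: String?,  // Gauquelin id
    val muid: String?,  // Müller id
    val csid: String?,  // CSID (sportspersons only)
    val erid: String?,  // Ertel id (sportspersons only)
    val cfid: String?,  // CFEPP id
    val cpid: String?   // Comité Para id
)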

tig12 commented 2 years ago

I hope I can make some contribution (but I am from the Java/Kotlin world).

There's plenty of room for contribution: programming, building reference documentation about the historical data, or imagining solutions for questions raised by this program.

The current state of the program is the end of its first step, the gathering of historical data in a single database. Besides g5 (PHP), which builds the database, there is https://github.com/tig12/openg (deployed at https://opengauquelin.org), a frontend to present, browse and download the data. It's written in Go, HTML, JS, CSS and uses PostgREST to query the database. openg is quite immature; I'd like to reach a convenient user interface, including precise documentation of the historical groups.
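
Since the API layer is plain PostgREST, querying the database from client code could look roughly like this (a Kotlin sketch; the base URL, table name and columns here are placeholders, not the actual openg schema):

import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Sketch only: fetch rows through PostgREST's standard filter syntax
// (/table?column=eq.value&select=columns). Table and column names are invented.
fun fetchPerson(baseUrl: String, slug: String): String {
    val request = HttpRequest.newBuilder()
        .uri(URI.create("$baseUrl/person?slug=eq.$slug&select=slug,name,birth"))
        .header("Accept", "application/json")
        .GET()
        .build()
    return HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body()
}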

Maybe you can also help to prepare the other steps: check historical data, build larger groups and build eminence rankings.

So you see, if you want to contribute, there is plenty of choice; it depends on which aspects of the work interest you. Can I ask why you are interested in Gauquelin 'n' co?

smallufo commented 2 years ago

Hi. I think I can batch-query persons' names against Wikidata. The query I was using is as follows:

SELECT distinct ?item ?itemLabel ?itemDescription ?gender ?coord ?placeLabel
  (SAMPLE(?birthday) as ?birthday) 
  (SAMPLE(?place) as ?place)
  (SAMPLE(?RIP) as ?RIP) 
  (SAMPLE(?image) as ?image)
  (SAMPLE(?article) as ?article)
WHERE {
  ?item wdt:P31 wd:Q5.
  ?item ?label "Barack Obama"@en.  
  ?article schema:about ?item .
  ?article schema:inLanguage "en" .
  ?article schema:isPartOf <https://en.wikipedia.org/>.
  ?item wdt:P21 ?gender .              # P21 : gender
  OPTIONAL{?item wdt:P569 ?birthday .} # P569 : Date of birth
  OPTIONAL{?item wdt:P19 ?place .}     # P19 : Place
  OPTIONAL{?place wdt:P625 ?coord .}   # P625 : Coordinate
  OPTIONAL{?item wdt:P570 ?RIP .}      # P570 : Date of death
  OPTIONAL{?item wdt:P18 ?image .}     # P18 : image  

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }    
}
GROUP BY ?item ?itemLabel ?itemDescription ?gender ?coord ?placeLabel

SPARQL seems very difficult; I copied it from somewhere else.

But I am not sure why it cannot find John Scanes, even though he has a wiki item (the query times out: https://bit.ly/3PyWSZ1).

If you have a SPARQL script that lists persons (and their data) by name, I may write some code to convert and link all the names to possible wiki items (Qxxx).
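
A minimal sketch of what that code could look like (Kotlin, standard HTTP client only; the query is a simplified variant of the one above that matches rdfs:label exactly, which is only an assumption, not a verified fix for the timeout):

import java.net.URI
import java.net.URLEncoder
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.nio.charset.StandardCharsets

// Sketch only: look up Wikidata items (humans) whose English label equals a name.
// The wd:/wdt:/wikibase:/bd: prefixes are predefined by the Wikidata Query Service.
fun queryWikidataByName(name: String): String {
    val sparql = """
        SELECT ?item ?itemLabel WHERE {
          ?item wdt:P31 wd:Q5 ;
                rdfs:label "$name"@en .
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
        }
    """.trimIndent()
    val url = "https://query.wikidata.org/sparql?format=json&query=" +
        URLEncoder.encode(sparql, StandardCharsets.UTF_8)
    val request = HttpRequest.newBuilder()
        .uri(URI.create(url))
        .header("Accept", "application/sparql-results+json")
        .header("User-Agent", "g5-wikidata-linking-sketch/0.1")
        .GET()
        .build()
    // Returns the raw SPARQL JSON result; extracting the Qxxx ids and writing them
    // next to GQID / MUID would be the actual linking step.
    return HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body()
}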

smallufo commented 2 years ago

Can I ask why you are interested in Gauquelin 'n' co?

I think I can do some data mining on this data :smiley:
In 2008, I did some mining with the WEKA library: https://destiny.to/app/mining (Chinese content), https://destiny.to/app/mining/en (English). But that was premature work; I want to enhance it.