tig12 / g5

Merge Gauquelin and related data in a postgres database

CSID & ERID missing in ogdb-[all|time].csv ? #3

Open smallufo opened 2 years ago

smallufo commented 2 years ago

First, I want to say this is fantastic work!!! I noticed Gauquelin's data a long time ago and did some parsing, but I never merged it with Müller's and Ertel's sources. I hope I can make some contribution (but I am from the Java/Kotlin world).

Something I wonder about: I looked into the dumps ogdb-time.csv and ogdb-all.csv, and I can only see GQID and MUID, but I didn't find the CSID and ERID columns.

Are these 2 columns missing?

Thanks.

tig12 commented 2 years ago

Yes, that's because CSID and ERID concern only sportspersons. These columns could be added to the files containing sportspersons, along with the columns CFID (CFEPP) and CPID (Comité Para). I'll add this to the todo list.
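
For illustration, a row of such a sportspersons file could be modeled roughly like this (a Kotlin sketch; the column names are the ones mentioned in this thread, the actual export layout may differ):

// Sketch only: the ids a person may have in the different source catalogs.
// Fields are nullable because a given person does not appear in every source.
data class SportspersonRow(
    val name: String,
    val gqid: String?,  // Gauquelin id
    val muid: String?,  // Müller id
    val csid: String?,  // CSID (sportspersons only)
    val erid: String?,  // Ertel id (sportspersons only)
    val cfid: String?,  // CFEPP id
    val cpid: String?   // Comité Para id
)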

tig12 commented 2 years ago

I hope I can make some contribution (but I am from the Java/Kotlin world).

There's plenty of room for contribution: programming, building reference documentation about the historical data, or imagining solutions for questions raised by this program.

The current state of the program is the end of its first step, the gathering of historical data in a single database. Besides g5 (PHP), which builds the database, there is https://github.com/tig12/openg (deployed at https://opengauquelin.org), a frontend to present, browse and download the data. It's written in Go, HTML, JS, CSS and uses PostgREST to query the database. openg is quite immature; I'd like to reach a convenient user interface, including precise documentation of the historical groups.
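
Since the API layer is plain PostgREST, querying the database from client code could look roughly like this (a Kotlin sketch; the base URL, table name and columns here are placeholders, not the actual openg schema):

import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Sketch only: fetch rows through PostgREST's standard filter syntax
// (/table?column=eq.value&select=columns). Table and column names are invented.
fun fetchPerson(baseUrl: String, slug: String): String {
    val request = HttpRequest.newBuilder()
        .uri(URI.create("$baseUrl/person?slug=eq.$slug&select=slug,name,birth"))
        .header("Accept", "application/json")
        .GET()
        .build()
    return HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body()
}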

Maybe you can also help to prepare the other steps: check historical data, build larger groups and build eminence rankings.

So you see, if you want to contribute, there is plenty of choice; it depends on which aspects of the work interest you. Can I ask why you are interested in Gauquelin 'n' co?

smallufo commented 2 years ago

Hi. I think I can batch-query persons' names against Wikidata. The query I was using is as follows:

SELECT distinct ?item ?itemLabel ?itemDescription ?gender ?coord ?placeLabel
  (SAMPLE(?birthday) as ?birthday) 
  (SAMPLE(?place) as ?place)
  (SAMPLE(?RIP) as ?RIP) 
  (SAMPLE(?image) as ?image)
  (SAMPLE(?article) as ?article)
WHERE {
  ?item wdt:P31 wd:Q5.
  ?item ?label "Barack Obama"@en.  
  ?article schema:about ?item .
  ?article schema:inLanguage "en" .
  ?article schema:isPartOf <https://en.wikipedia.org/>.
  ?item wdt:P21 ?gender .              # P21 : gender
  OPTIONAL{?item wdt:P569 ?birthday .} # P569 : Date of birth
  OPTIONAL{?item wdt:P19 ?place .}     # P19 : Place
  OPTIONAL{?place wdt:P625 ?coord .}   # P625 : Coordinate
  OPTIONAL{?item wdt:P570 ?RIP .}      # P570 : Date of death
  OPTIONAL{?item wdt:P18 ?image .}     # P18 : image  

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }    
}
GROUP BY ?item ?itemLabel ?itemDescription ?gender ?coord ?placeLabel

SPARQL seems very difficult; I copied it from somewhere else.

But I am not sure why it cannot find John Scanes, even though he has a wiki item (the query times out: https://bit.ly/3PyWSZ1).

If you have a SPARQL script that lists persons (and their data) by name, I may write some code to convert and link all the names to possible wiki items (Qxxx).
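
A minimal sketch of what that code could look like (Kotlin, standard HTTP client only; the query is a simplified variant of the one above that matches rdfs:label exactly, which is only an assumption, not a verified fix for the timeout):

import java.net.URI
import java.net.URLEncoder
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.nio.charset.StandardCharsets

// Sketch only: look up Wikidata items (humans) whose English label equals a name.
// The wd:/wdt:/wikibase:/bd: prefixes are predefined by the Wikidata Query Service.
fun queryWikidataByName(name: String): String {
    val sparql = """
        SELECT ?item ?itemLabel WHERE {
          ?item wdt:P31 wd:Q5 ;
                rdfs:label "$name"@en .
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
        }
    """.trimIndent()
    val url = "https://query.wikidata.org/sparql?format=json&query=" +
        URLEncoder.encode(sparql, StandardCharsets.UTF_8)
    val request = HttpRequest.newBuilder()
        .uri(URI.create(url))
        .header("Accept", "application/sparql-results+json")
        .header("User-Agent", "g5-wikidata-linking-sketch/0.1")
        .GET()
        .build()
    // Returns the raw SPARQL JSON result; extracting the Qxxx ids and writing them
    // next to GQID / MUID would be the actual linking step.
    return HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body()
}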

smallufo commented 2 years ago

Can I ask why you are interested in Gauquelin 'n' co?

I think I can do some data mining on this data :smiley:
In 2008, I did some mining with the WEKA library: https://destiny.to/app/mining (Chinese content), https://destiny.to/app/mining/en (English). But that was premature work; I want to enhance it.