Open davelab6 opened 6 years ago
I'm on this right now. I think it's probably best to use:

- `GoogleFontsAPI/production` source: the `family` key (== family name of the gf-api family data), which is the status quo.
- `GitHub-GoogleFonts/pulls` and `GitHub-GoogleFonts/master` sources: the `name` key of the `METADATA.pb` `FamilyProto` message (we currently try to figure this out via the font file names).
- `CSVSpreadsheet/upstream` source: the value of the `family` cell. (I tried to use the font file names, but that's really messy here sometimes, especially because we'll need to go over the CSV file and the check-reports it produces and fix e.g. some of the `fontfiles prefix` entries etc.)

It seems to me that using these places for the family name is the best compromise between control, usability and readability. Still, when a family name is wrong we'll get wrong rows and we'll have to fix that in the sources and in the database. But eventually that shouldn't be so much cleanup work anymore, once everything is set up.
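To make the proposal concrete, here is a minimal sketch of the lookup. The function name `familyNameFor` and the shape of the `data` argument (`data.family`, `data.metadata.name`, `data.row.family`) are assumptions for illustration only, not the dashboard's actual code.

```javascript
// Sketch only: `familyNameFor` and the shape of `data` are hypothetical.
// It picks the family name from the place proposed for each source.
function familyNameFor(sourceId, data) {
  switch (sourceId) {
    case "GoogleFontsAPI/production":
      // `family` key of the gf-api family data (the status quo)
      return data.family;
    case "GitHub-GoogleFonts/pulls":
    case "GitHub-GoogleFonts/master":
      // `name` key of the METADATA.pb FamilyProto message
      return data.metadata.name;
    case "CSVSpreadsheet/upstream":
      // value of the `family` cell of the spreadsheet row
      return data.row.family;
    default:
      throw new Error(`Unknown source: ${sourceId}`);
  }
}
```

For example, `familyNameFor("CSVSpreadsheet/upstream", {row: {family: "Ek Mukta"}})` returns `"Ek Mukta"`, no matter what the font files are called.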
Ok, the `CSVSpreadsheet/upstream` source now creates its `family_name` using the `family` cell of the spreadsheet. To change the already created db entries into what that source would create now, I ran the following query in the RethinkDB admin interface (putting it here as documentation, as this was just ad hoc).
// you run $ kubectl proxy
// then go to: http://localhost:8001/api/v1/namespaces/default/services/rethinkdb-admin/proxy/#dataexplorer
// and run:
r.db('fontbakery')
.table('collectiontests')
.getAll('CSVSpreadsheet/upstream',{index: 'collection_id'})
.filter(row=>{return row('family_name').ne(row('metadata')('sourceDetails')('name'));})
.update({family_name: r.row('metadata')('sourceDetails')('name')})
// resulted in:
{
"deleted": 0 ,
"errors": 0 ,
"inserted": 0 ,
"replaced": 105 ,
"skipped": 0 ,
"unchanged": 0
}
The git based sources follow next.
I'm just going through the dashboard rows to find duplicate rows from the Git-based sources, to rename them to the names that are in the `METADATA.pb` files. @davelab6 there are some inconsistencies (not only in the Git sources; mostly in the Spreadsheet/CSV-file upstream source):
- The API uses `BioRhyme` and `BioRhyme Expanded`, but the `METADATA.pb` files use `Bio Rhyme` and `Bio Rhyme Expanded`. Are these bugs in the `METADATA.pb` files?
- The GitHub master `METADATA.pb` uses `Ek Mukta`, but the Spreadsheet/CSV uses `Ek-Mukta` (also `Ek-Mukta Mukta Devanagari`, `Ek-Mukta MuktaMalar Tamil`, `Ek-Mukta MuktaVaani Gujarati`). My guess is this should be fixed in the CSV file by removing the hyphen from the name.
- The Spreadsheet/CSV-file should remove the hyphen from `Encode Sans Semi-Condensed` and `Encode Sans Semi-Expanded`.
- The Spreadsheet/CSV-file should add spaces to: `FiraCode`, `FiraSansCondensedHairline`, `FiraSansExtraCondensedHairline`, `FiraSansHairline`, `FiraSansUltra`.
- The Spreadsheet/CSV-file should (probably) rename "Pangolin Sans" to "Pangolin".
- The Spreadsheet/CSV-file should rename `PostNoBills Colombo` and `PostNoBills Jaffna` to include spaces.
- The Spreadsheet/CSV-file defines 3 rows: `Slabo 13px`, `Slabo 27px` and `Slabo`.
  - The `Slabo` row should be removed.
  - The `Slabo 13px` row should have a "fontfiles prefix" of `TTFs/Slabo13px`.
  - The `Slabo 27px` row should have a "fontfiles prefix" of `TTFs/Slabo27px`.
- The Spreadsheet/CSV-file should rename `Varela Round Hebrew` to `Varela Round`.
- Jomolhari registers as `alpha 3c` and has neither a good file name nor a `METADATA.pb`. The filename is "Jomolhari-alpha3c-0605331.ttf" and the regex we usually use extracts the `alpha3c` part. This family is not in production.
- There was a PR that put Computer Modern into a wrong slot because of badly chosen file names (improved in commit 1d6f3520f9a256703e6bb831b1d832c0e49cdac4) and a missing `METADATA.pb`: https://github.com/google/fonts/pull/1129 `ofl/computermodern/cmunbbx.ttf`. While this is not really an issue, it can always happen to the dashboard, because anyone can open a PR. Just mentioning. I'm not sure whether the naming problems are resolved in the newer revision of the PR. I'm not going to change the `cmunbbx` entry now, but eventually we'll want to get rid of it, I think.
- There was a commit "Remove cwTeX fonts" (from master). We still have these rows:
  - `cw Te X Fang Song`
  - `cw Te X Hei`
  - `cw Te X Kai`
  - `cw Te X Ming`
  - `cw Te X Yen`

  Should I delete them?
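Most of the mismatches listed above differ only in spaces, hyphens or letter case, so candidates for duplicate rows can be found mechanically. The following is a rough sketch, not what the dashboard does: `normalizeFamilyName` and `findLikelyDuplicates` are hypothetical helpers, and the normalization (lowercase, then drop whitespace and hyphens) is an assumption that will miss real renames such as "Pangolin Sans" vs. "Pangolin".

```javascript
// Hypothetical helper: collapse a family name into a comparison key by
// lowercasing it and dropping whitespace and hyphens.
function normalizeFamilyName(name) {
  return name.toLowerCase().replace(/[\s-]+/g, "");
}

// Hypothetical helper: group names by their key and return only the
// groups that contain more than one distinct spelling.
function findLikelyDuplicates(names) {
  const groups = new Map();
  for (const name of names) {
    const key = normalizeFamilyName(name);
    if (!groups.has(key)) {
      groups.set(key, new Set());
    }
    groups.get(key).add(name);
  }
  return [...groups.values()]
    .filter((spellings) => spellings.size > 1)
    .map((spellings) => [...spellings]);
}
```

Feeding it the distinct `family_name` values of the dashboard rows would return groups like `["BioRhyme", "Bio Rhyme"]` or `["Ek-Mukta", "Ek Mukta"]`. Which spelling is the right one still has to be decided by hand.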
Here's the RethinkDB query that updated existing rows to match what the sources will do in the future, based on using `METADATA.pb` when present. The list is a good overview of where our CamelCase to names-with-spaces rules break :-D
// you run $ kubectl proxy
// then go to: http://localhost:8001/api/v1/namespaces/default/services/rethinkdb-admin/proxy/#dataexplorer
// and run:
var rename = r.expr({
"A Bee Zee": "ABeeZee"
, "Bench Nine": "BenchNine"
, "Dawningofa New Day": "Dawning of a New Day"
, "Frederickathe Great" :"Fredericka the Great"
, "Gen Bas B": "Gentium Basic"
, "Gen Bk Bas B": "Gentium Book Basic"
, "IM Fe D Pit 28 P": "IM Fell Double Pica"
, "IM Fe D Psc 28 P": "IM Fell Double Pica SC"
, "IM Fe E Nit 28 P": "IM Fell English"
, "IM Fe E Nsc 28 P": "IM Fell English SC"
, "IM Fe F Cit 28 P": "IM Fell French Canon"
, "IM Fe F Csc 28 P": "IM Fell French Canon SC"
, "IM Fe G Pit 28 P": "IM Fell Great Primer"
, "IM Fe G Psc 28 P": "IM Fell Great Primer SC"
, "IM Fe P Iit 28 P": "IM Fell DW Pica"
, "IM Fe P Isc 28 P": "IM Fell DW Pica SC"
, "Josefin Sans Std": "Josefin Sans Std Light"
, "Lateef Reg OT": "Lateef"
, "Lovedbythe King": "Loved by the King"
, "Mc Laren": "McLaren"
, "Medieval Sharp": "MedievalSharp"
, "Mountainsof Christmas": "Mountains of Christmas"
, "OFL Goudy St MTT": "OFL Sorts Mill Goudy TT"
, "Old Standard": "Old Standard TT"
, "PTM55 FT": "PT Mono"
, "Press Start 2 P": "Press Start 2P"
, "Swankyand Moo Moo": "Swanky and Moo Moo"
, "Unifraktur Cook": "UnifrakturCook"
, "Unifraktur Maguntia": "UnifrakturMaguntia"
, "Waitingforthe Sunrise": "Waiting for the Sunrise"
, "Web": "PT Serif Caption"
, "js Math": "jsMath cmbx10"
});
r.db('fontbakery')
.table('collectiontests')
.filter(function(row) {
return rename.keys().contains(row('family_name'));
})
.update(function(row){
return {"family_name": rename(row('family_name'))}
});
// resulted in:
{
"deleted": 0 ,
"errors": 0 ,
"inserted": 0 ,
"replaced": 221 ,
"skipped": 0 ,
"unchanged": 0
}
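The keys of the rename map above look like the output of a simple CamelCase splitter applied to font file names. As an illustration only, here is a sketch of such a splitter. The rule set is my guess, reverse-engineered from those keys, and is not taken from the dashboard source; `splitCamelCase` is a hypothetical name.

```javascript
// Assumed split rules (not the dashboard's actual code). Insert a space:
//   1. between a lowercase letter and an uppercase letter or digit,
//   2. between a digit and an uppercase letter,
//   3. before the last capital of an uppercase run when a lowercase
//      letter follows it (so "IMFe" becomes "IM Fe").
const SPLIT_POINTS =
  /(?<=[a-z])(?=[A-Z0-9])|(?<=[0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/g;

// Hypothetical helper: turn a CamelCase file-name part into a spaced name.
function splitCamelCase(name) {
  return name.replace(SPLIT_POINTS, " ");
}
```

Under these rules `splitCamelCase("ABeeZee")` gives `"A Bee Zee"` and `splitCamelCase("PressStart2P")` gives `"Press Start 2 P"`, which is exactly the kind of result the rename map has to correct. A purely mechanical rule can't know that "ABeeZee" is meant to stay one word, which is why the `name` from `METADATA.pb` is the better source when it exists.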
I would expect these maintenance tasks to appear more often in the future, so I'm not sure if we should close this issue or keep it open. The queries posted in here are useful to have around though.
Also, the things mentioned in https://github.com/googlefonts/fontbakery-dashboard/issues/73#issuecomment-401902735 will need more changes to already existing rows once they are resolved. Further, some "early access" fonts don't have `METADATA.pb` files, and I expect they are likely to change their family names, or at least how they register in the dashboard.
I went over the current http://35.225.170.228/dashboard and found these rows in which the same family appears as 2 rows: