microbit-matt-hillsdon closed this 2 years ago
I'll look at this one on Monday. The sync doesn't delete files so we should carefully review and see if anything needs deleting.
Source ids from the backups:
find . -name '*.json' | grep -v _ | cut -f3 -d/ | cut -f1 -d. | sort -u > source-ids.txt
I'll download the source folder from Crowdin and compare.
The diff just shows intentionally non-translated content (e.g. the Python module - Reference mapping metadata). I'll revisit the Crowdin report.
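A minimal sketch of that comparison, assuming both sides have been reduced to one ID per line. The function name `compare_ids` and the `only-in-...` labels are made up for illustration; only `sort`, `comm` and `sed` are real tools here.

```shell
# Sketch: report IDs that appear in only one of two ID lists.
compare_ids() (
  # $1: IDs from the CMS backup, $2: IDs from the Crowdin download
  # (one ID per line, any order). Runs in a subshell so nothing leaks out.
  export LC_ALL=C            # same collation for sort and comm
  a=$(mktemp)
  b=$(mktemp)
  sort -u "$1" > "$a"        # comm requires sorted input
  sort -u "$2" > "$b"
  comm -23 "$a" "$b" | sed 's/^/only-in-backup /'    # lines unique to $1
  comm -13 "$a" "$b" | sed 's/^/only-in-crowdin /'   # lines unique to $2
  rm -f "$a" "$b"
)
```

For example, `compare_ids source-ids.txt crowdin-ids.txt`, where `crowdin-ids.txt` is a hypothetical list built from the Crowdin download. No output means the two lists match.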
The file is "Decreasing variables". It's still in the CMS but it's unreferenced, so it's not part of the toolkit in practice. We need to review for unreferenced toolkit topic entries and similar documents and delete them. I'll prep a list for Giles to review. Then we'll have the problem I thought we had in the first place, which is that the sync won't delete them. But the first step is to remove them from the CMS.
*[ _type == "toolkitTopicEntry" && language == "en"]
{_id, "refs": count(*[ references(^._id) ])}
[ refs == 0 ]
._id
This query against the apps Sanity dataset shows 43 documents with no references. These documents cannot be part of the Reference docs, as the Reference docs are a tree of references. Giles and I reviewed a sample of them and he confirmed that they are documents left over from modifications made to the Reference documentation based on past feedback.
There are no unreferenced topics, just topic entries.
I'll work to remove them from the CMS and Crowdin today.
Notes on deletion:
The sync task won't run again before tomorrow morning so no need to worry about it interfering.
[x] Take a CMS backup first.
[x] Check for no references to the IDs in English documents:
while IFS= read -r id; do grep -RF "$id" documents; done < ~/ids-to-delete.txt | grep -Ev '_ja|_zh|_es|_fr|_ko|_crowdin'
Note that we also exclude the crowdinSourceDetails document.
351 documents, not 352, because one is missing its crowdinSourceDetails document (presumably just down to the timing of the change). 352 were expected because 44 × 8 = 352 (en + 6 languages + source details for each document).
[x] Find and delete all variants of the documents, including translations and crowdinSourceDetails:
while IFS= read -r id; do find . -name "$id"'*'; done < ~/ids-to-delete.txt | while IFS= read -r n; do rm "$n"; done
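A rough sketch of a dry run for the delete step, assuming each document is a file whose name starts with its ID, as the commands above imply. The function names and `EXPECTED_PER_ID` are made up for illustration; the value 8 comes from en + 6 languages + source details.

```shell
# Sketch: review what would be deleted before running rm.
EXPECTED_PER_ID=8   # en + 6 languages + crowdinSourceDetails (assumption from the notes)

list_variants() (
  # $1: file of IDs, one per line; $2: directory to search.
  # Prints every file whose name starts with one of the IDs.
  # Caveat: an ID that is a prefix of another ID would match both.
  while IFS= read -r id; do
    [ -n "$id" ] || continue
    find "$2" -type f -name "${id}*"
  done < "$1"
)

report_counts() (
  # Prints "<id> <count>" for each ID whose file count is not EXPECTED_PER_ID.
  while IFS= read -r id; do
    [ -n "$id" ] || continue
    n=$(find "$2" -type f -name "${id}*" | wc -l | tr -d ' ')
    [ "$n" -eq "$EXPECTED_PER_ID" ] || echo "$id $n"
  done < "$1"
)
```

Running `list_variants ~/ids-to-delete.txt .` shows the full list to review, and `report_counts ~/ids-to-delete.txt .` points at any ID with a missing variant, which is one way to track down a 351 vs 352 mismatch.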
🤔 For unclear reasons, deleting the _crowdin documents fails with a claim that they are referenced from documents that clearly do not reference them. I'm not sure we need to resolve this now, but it would be good to tidy up.
I think this is due to a bug in the translation system. It has come about because documents have been duplicated, so they get a new ID but still reference the old source ID. The impact of this is unclear and needs separate investigation. I've raised an internal issue.
0a0ad2f706e7909480da4fd2848cfc228c36ba98 (private).
Confirmed it's only the odd _crowdin files that are left.
Check:
Then think about Crowdin removal.
Worked through deleting these from Crowdin manually with Rob. Document count in Crowdin now matches the CMS at 106 topic entries.
I think we're all done here. I'll check again after the Crowdin sync in the morning.
See this thread: https://crowdin.com/translate/microbitorg/6686/en-ja#333360