microbit-foundation / python-editor-v3

Micro:bit Educational Foundation Python Editor V3
https://python.microbit.org
MIT License

Old content in Crowdin due to combination of sections? #929

Closed microbit-matt-hillsdon closed 2 years ago

microbit-matt-hillsdon commented 2 years ago

See this thread: https://crowdin.com/translate/microbitorg/6686/en-ja#333360

microbit-matt-hillsdon commented 2 years ago

I'll look at this one on Monday. The sync doesn't delete files so we should carefully review and see if anything needs deleting.

microbit-matt-hillsdon commented 2 years ago

Source ids from the backups:

find . -name '*.json' | grep -v _ | cut -f3 -d/ | cut -f1 -d. | sort -u > source-ids.txt

I'll download the source folder from Crowdin and compare.
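
The comparison could be sketched roughly as below, assuming the IDs from the Crowdin download have been written one per line to a second file. The function name `ids_only_in_second` and the file name `crowdin-ids.txt` are illustrative and do not come from the notes above.

```shell
# Print the IDs that appear in the second list but not in the first.
# Both inputs are plain text files with one ID per line.
ids_only_in_second() {
  first_sorted=$(mktemp)
  second_sorted=$(mktemp)
  # comm needs sorted input; sort -u also drops duplicate IDs.
  sort -u "$1" > "$first_sorted"
  sort -u "$2" > "$second_sorted"
  # -1 hides lines unique to the first file, -3 hides lines common to both.
  comm -13 "$first_sorted" "$second_sorted"
  rm -f "$first_sorted" "$second_sorted"
}

# Example (crowdin-ids.txt is a hypothetical file name):
#   ids_only_in_second source-ids.txt crowdin-ids.txt
```

Any ID this prints would exist in Crowdin but not in the backups, which is the kind of leftover content the Crowdin thread reports.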

microbit-matt-hillsdon commented 2 years ago

The diff just shows intentionally non-translated content (e.g. the Python module - Reference mapping metadata). I'll revisit the Crowdin report.

microbit-matt-hillsdon commented 2 years ago

The file is "Decreasing variables". It's still in the CMS but nothing references it, so in practice it's not part of the toolkit. We need to review for unreferenced toolkit topic entries and similar documents and delete them. I'll prep a list for Giles to review. Then we'll have the problem I thought we had in the first place, which is that the sync won't delete them. But the first step is to remove them from the CMS.

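microbit-matt-hillsdon commented 2 years ago

// Select the English toolkit topic entries,
*[_type == "toolkitTopicEntry" && language == "en"]
  // pair each ID with the number of documents that reference it,
  {_id, "refs": count(*[references(^._id)])}
  // keep only the entries that nothing references,
  [refs == 0]
  // and return just their IDs.
  ._id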

This query against the apps Sanity dataset shows 43 documents with no references. These documents cannot be part of the Reference docs, because the Reference docs are a tree of references. Giles and I reviewed a sample of them and he confirmed that they are documents left over from modifications made to the Reference documentation based on past feedback.

There are no unreferenced topics, just topic entries.

I'll work to remove them from the CMS and Crowdin today.

microbit-matt-hillsdon commented 2 years ago

Notes on deletion:

ids-to-delete.txt

The sync task won't run again before tomorrow morning so no need to worry about it interfering.

while read -r id; do find . -name "${id}*"; done < ~/ids-to-delete.txt | while read -r n; do rm "$n"; done
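
Before running a delete like this, it may be worth previewing what would match. The sketch below is a dry run under the same assumptions as the command above (file names begin with the document ID). The function name `list_files_for_ids` is hypothetical. It also skips blank lines in the ID file, since an empty ID would turn the pattern into `*` and match every file.

```shell
# Print every regular file under ROOT whose name starts with an ID
# listed in IDS_FILE (one ID per line). Nothing is deleted.
list_files_for_ids() {
  ids_file=$1
  root=${2:-.}
  while IFS= read -r id; do
    # Skip blank lines so an empty ID cannot match every file.
    [ -n "$id" ] || continue
    find "$root" -type f -name "${id}*"
  done < "$ids_file"
}

# Example: preview the matches, then compare the count with expectations.
#   list_files_for_ids ~/ids-to-delete.txt . | wc -l
```

Once the preview looks right, the same output can be fed to the `rm` loop.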

🤔 For unclear reasons, deleting the _crowdin documents fails, with the CMS claiming they are referenced from documents that clearly do not reference them. Not sure we need to resolve this now but it would be good to tidy up.

I think this is due to a bug in the translation system that arises when documents are duplicated: the copy gets a new ID but still references the old source ID. The impact of this is unclear and needs separate investigation. I've raised an internal issue.

0a0ad2f706e7909480da4fd2848cfc228c36ba98 (private).

Confirmed it's only the odd _crowdin files that are left.

Check:

[image]

Then think about Crowdin removal.

microbit-matt-hillsdon commented 2 years ago

Worked through deleting these from Crowdin manually with Rob. Document count in Crowdin now matches the CMS at 106 topic entries.

I think we're all done here. I'll check again after the Crowdin sync in the morning.