openzim / openedx

Open edX (to zim) scraper
GNU General Public License v3.0

Too much duplicate data in ZIM #104

Closed rgaudin closed 4 years ago

rgaudin commented 4 years ago

zimcheck reports thousands of articles with duplicate content in the latest phzh ZIM.

[WARNING] Redundant data found:
  course/core-english-01/topic-5-my-feelings-and-myself/unit-1-dealing-with-different-emotions/self-evaluation/05-01-04-intro-evaluation/00-eval-icon.svg (idx 1129) and course/core-english-01/topic-5-my-feelings-and-myself/unit-2-what-makes-me-happy-what-makes-you-sad/self-evaluation/05-02-04-intro-evaluation/00-eval-icon.svg (idx 1170)
  course/core-english-01/topic-5-my-feelings-and-myself/unit-1-dealing-with-different-emotions/self-evaluation/05-01-04-intro-evaluation/00-eval-icon.svg (idx 1129) and course/core-english-01/topic-5-my-feelings-and-myself/unit-3-expressing-my-worries-and-locking-up-my-fears/self-evaluation/05-03-04-intro-evaluation/00-eval-icon.svg (idx 1206)
  course/core-english-01/topic-5-my-feelings-and-myself/unit-1-dealing-with-different-emotions/self-evaluation/05-01-04-intro-evaluation/00-eval-icon.svg (idx 1129) and course/core-english-01/topic-7-how-i-live-together-with-others/unit-1-who-can-help/self-evaluation/07-01-04-intro-evaluation/00-eval-icon.svg (idx 1428)
  course/core-english-01/topic-5-my-feelings-and-myself/unit-1-dealing-with-different-emotions/self-evaluation/05-01-04-intro-evaluation/00-eval-icon.svg (idx 1129) and course/core-english-01/topic-7-how-i-live-together-with-others/unit-2-how-i-can-solve-conflicts/self-evaluation/07-02-04-intro-evaluation/00-eval-icon.svg (idx 1461)
  course/core-english-01/topic-5-my-feelings-and-myself/unit-1-dealing-with-different-emotions/self-evaluation/05-01-04-intro-evaluation/00-eval-icon.svg (idx 1129) and course/core-english-01/topic-7-how-i-live-together-with-others/unit-3-rules-help-us-to-live-together/self-evaluation/07-03-04-intro-evaluation/00-eval-icon.svg (idx 1504)

satyamtg commented 4 years ago

This is probably because content for each xblock is stored in its own folder, so any asset shared between xblocks gets copied into every one of them. We can fix this by storing assets in a single place, say in instance_assets, instead of in per-xblock folders. What do you suggest?
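
To make the idea concrete, here is a minimal sketch of what single-place storage could look like. It is not the scraper's actual code: `store_shared_asset` is a hypothetical helper, and naming files by their SHA-256 digest is just one assumed way to make identical files (like the repeated `00-eval-icon.svg`) land on the same path.

```python
import hashlib
import shutil
from pathlib import Path


def store_shared_asset(src: Path, assets_dir: Path) -> Path:
    """Copy src into assets_dir under a content-derived name, at most once.

    Hypothetical helper: files with identical bytes get the same name,
    so a second xblock using the same asset reuses the existing copy.
    """
    # Hash the file content so the name depends on the bytes, not the xblock.
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    target = assets_dir / f"{digest}{src.suffix}"
    if not target.exists():
        assets_dir.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, target)
    return target
```

Each xblock page would then link to the returned path (for example something under `instance_assets/`) rather than keeping its own copy, so zimcheck should no longer see the same bytes under several entries.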

rgaudin commented 4 years ago

Sounds good