openzim / freecodecamp

FreeCodeCamp.org scraper (to ZIM)
GNU General Public License v3.0

Draft: First pass at content #1

Closed mdp closed 1 year ago

mdp commented 1 year ago

First pass at adding content from FCC to the project. This is a pared-down version which only has the JS courses (~85 MB).

The content root is located at: https://github.com/openzim/freecodecamp/tree/mdp/first_content/content/learn/javascript-algorithms-and-data-structures

kelson42 commented 1 year ago

@mdp Great! I'm not sure if your PR is ready, but since it is not in draft, I assumed "yes" and have added @rgaudin as a reviewer.

In general, put PRs in draft if not ready to review. Otherwise request review from @rgaudin.

kelson42 commented 1 year ago

@mdp Is there no way to retrieve the content dynamically from the upstream git repository and then apply the treatment? Is there any strong reason to duplicate the content in our repository? We don't usually host any content here; in the worst case it goes in a tarball stored externally.

rgaudin commented 1 year ago

Hi @mdp, thanks for the first PR.

Can you fill me in on what you're trying to achieve? It's not clear to me at this point. I understand the ultimate goal being to have a script/tool to generate a ZIM file for FCC though! 😅

As @kelson42 said, we only host original code (or content) in our repo unless we absolutely have to. I see that the code resembles the markdown files at https://github.com/freeCodeCamp/freeCodeCamp/tree/main/curriculum/challenges/english/02-javascript-algorithms-and-data-structures

Pulling from upstream would have the additional benefit of being somewhat updatable, which a copy checked in here is not.

Let me know what you think and if you have questions.

Few links for the following step:

Also, without having looked at this content, I stumbled upon a cats.json file that references online content. No idea how/if this is gonna be used but of course this would not work offline.

"imageLink": "https://s3.amazonaws.com/freecodecamp/funny-cat.jpg",
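
One way to spot such online references before packaging is to walk the parsed JSON and report every string value that is an http(s) URL. This is only a minimal sketch: the function name `find_online_references` is made up for illustration, and apart from the `imageLink` entry quoted above the sample data is hypothetical (the real layout of cats.json is not assumed).

```python
import json
from urllib.parse import urlparse


def find_online_references(node, path="$"):
    """Yield (json_path, url) for every string value that is an http(s) URL."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_online_references(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_online_references(value, f"{path}[{index}]")
    elif isinstance(node, str):
        # Only absolute http(s) URLs are flagged; relative paths are left alone.
        if urlparse(node).scheme in ("http", "https"):
            yield path, node


# Hypothetical sample: only the imageLink value comes from the thread.
sample = json.loads(
    '{"imageLink": "https://s3.amazonaws.com/freecodecamp/funny-cat.jpg",'
    ' "altText": "A cat"}'
)
hits = list(find_online_references(sample))
```

Each hit would then need to be either downloaded into the ZIM or rewritten, which is outside the scope of this sketch.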
kelson42 commented 1 year ago

@rgaudin I have explicitly recommended that @mdp not use zimwriterfs.

mdp commented 1 year ago

Yeah, sorry for the big dump of files. Yes, ideally I'd like to have the following happen in some type of CI system.

Some concerns:

For this first pass, I'm going to create a fcc2zim script to pull the data from a specified folder of content, and package this up into a zim format for testing. Instead of checking the course content into this repo, I'll use a local copy which I can provide for others wanting to test it. I've removed the content from this repository.
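
As a rough illustration of the "pull the data from a specified folder" step, the sketch below lists the markdown files under a local content root. It is not the actual fcc2zim code: the function name `collect_challenges` and the `.md` filter are assumptions, and the packaging into ZIM is deliberately left out.

```python
from pathlib import Path


def collect_challenges(content_root):
    """Return sorted POSIX-style paths of markdown files, relative to content_root."""
    root = Path(content_root)
    if not root.is_dir():
        raise NotADirectoryError(f"content folder not found: {root}")
    # Assumption: challenges are stored as .md files somewhere under the root.
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*.md")
        if path.is_file()
    )
```

Calling it on the local copy of the course content would give the list of files a later packaging step has to process.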

rgaudin commented 1 year ago

OK ; we usually don't create ZIM files in CI because it adds too many constraints: resource, readability, etc.

What we do for similar projects would be to download an archive of the content to convert from our shared drive. Then scripts stored in this repo would download it and build the ZIM file.
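
The download step of such a script could look roughly like the sketch below. Everything specific here is an assumption: the function name `fetch_and_unpack`, the zip format, and the local file names; the real archive location on the drive is not known at this point and has to be passed in.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_and_unpack(archive_url, work_dir):
    """Download a zip archive of the content and unpack it under work_dir/content."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    # Assumption: the archive is a zip; the extension tells unpack_archive the format.
    archive_path = work / "content.zip"
    with urllib.request.urlopen(archive_url) as response, open(archive_path, "wb") as out:
        shutil.copyfileobj(response, out)
    content_dir = work / "content"
    shutil.unpack_archive(str(archive_path), str(content_dir))
    return content_dir
```

The returned folder would then be the input for the build step that produces the ZIM file.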

Then we have two scenarios:

I believe we're talking about the latter ; in this case, one or more scripts with proper instructions in the README are probably the way to go. See lilote for instance.

Let me know once you have an archive of the content so I can upload it to the drive.

kelson42 commented 1 year ago

> OK ; we usually don't create ZIM files in CI because it adds too many constraints: resource, readability, etc.

We have our own toolchain for that, look at https://farm.openzim.org

Typical workflow is:

mdp commented 1 year ago

Closing in favor of #5