raineorshine / memrise-export

Export all words from a Memrise course to a CSV file

Extract stops at grammar lessons #4

Closed tophersturn closed 3 years ago

tophersturn commented 3 years ago

When an export is performed, it stops when it reaches a grammar lesson and does not pull the rest of the words. E.g., https://app.memrise.com/course/2141046/spanish-spain-1/ stops at lesson 2 (grammar).

raineorshine commented 3 years ago

Thanks for reporting! I'll try to take a closer look at this when I have some free time.

raineorshine commented 3 years ago

Fixed in the codebase; should start working once the Chrome Web Store is updated!

MagTun commented 3 years ago

@raineorshine: The problem with tophersturn's Spanish course is that it contains multimedia levels (levels without a list of words) that are not marked as multimedia on the home page. Scraping the home page therefore cannot tell us which levels to exclude, which is how your script (with the help of cheerio) handles the levels that are marked as multimedia.

The workaround I suggest (which could probably be improved) is to simply request the .json for each level, no matter whether it is a multimedia level or a regular one; if the status is 400 or above, the script simply goes to the next level. To prevent the script from scraping forever, I get the number of levels in the course from the home page and stop scraping once the last level is reached. So basically, the fix consists of changing what the script does when res.status > 400 is true and passing number_level_in_course to the getWords function.

With this fix, cheerio (and thus node.js) is no longer required. I didn't remove the cheerio parts of the code, but I have commented them out so that you can see that it works (for the multimediaLevels variable, I have simply declared it as an empty list). That way, if you want to keep using cheerio (maybe for future improvements), it's still easy for you to revert the changes. In the meantime, the manual install can be kept easy.
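The skip-and-continue workaround can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: `fetchLevel` is a hypothetical callback standing in for the per-level .json request, the `{ status, words }` response shape is assumed, and the check uses `>= 400` to match the "400 or above" wording.

```javascript
// Hypothetical sketch: collect words from every level, skipping levels whose
// request fails (for example unmarked multimedia or grammar levels).
// `fetchLevel(level)` is an assumed callback resolving to { status, words };
// it is not the extension's real API.
async function getWords(fetchLevel, numberLevelInCourse) {
  const words = [];
  // Stop after the last level so the loop cannot run forever.
  for (let level = 1; level <= numberLevelInCourse; level++) {
    const res = await fetchLevel(level);
    if (res.status >= 400) {
      // No word list for this level: move on instead of ending the export.
      continue;
    }
    words.push(...res.words);
  }
  return words;
}

// Usage with a stubbed fetcher: level 2 stands in for a grammar level.
const fakeCourse = {
  1: { status: 200, words: ['hola', 'adiós'] },
  2: { status: 400, words: [] },
  3: { status: 200, words: ['gracias'] },
};
const fakeFetchLevel = async (level) => fakeCourse[level];

getWords(fakeFetchLevel, 3).then((words) => console.log(words));
```

Passing the level count in as a parameter keeps the stopping condition explicit, so a failed request in the middle of the course no longer doubles as the end-of-course signal.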

@tophersturn, you have two options to export your course. You can either: 1) wait until raineorshine has the time to validate my fixes (or find a better solution) and for the Chrome Web Store to update the extension (which, I think, takes a while), or 2) manually install the extension, which means you can export your Memrise course now.

If you choose the second option, you can install the extension more simply than described in the "Build from Source" instructions, because the node.js package is no longer required. You just have to do this:

raineorshine commented 3 years ago

Thanks for your contribution!

> The workaround I suggest (which could probably be improved) is to simply request the .json for each level, no matter whether it is a multimedia level or a regular one; if the status is 400 or above, the script simply goes to the next level. To prevent the script from scraping forever, I get the number of levels in the course from the home page and stop scraping once the last level is reached.

Yup, that's exactly what I did. Great minds think alike!

> With this fix, cheerio (and thus node.js) is no longer required. I didn't remove the cheerio parts of the code, but I have commented them out so that you can see that it works (for the multimediaLevels variable, I have simply declared it as an empty list). That way, if you want to keep using cheerio (maybe for future improvements), it's still easy for you to revert the changes. In the meantime, the manual install can be kept easy.

I would love to get the cheerio removal in a separate PR! Then it would be easier for others to install manually.

raineorshine commented 3 years ago

Published in v1.2.0