Open satyamtg opened 4 years ago
Agreed. Thanks for your experiments with multiprocessing.
This is very similar to other scrapers in that we have concurrent usages:
That's a lot of requirements, which calls for flexibility. Also, we definitely want to assess our S3 performance before getting into this, as we need to know where the bottlenecks are and which methods deliver best for those download/upload use cases.
All of this makes the problem quite complex, which is why I think we should attempt to solve it on a less fragile scraper (youtube?) first, then document the approach and replicate it onto the others.
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
We can easily support multithreading here by running the download method of the xblock_extractor objects in multiple threads. However, we do have videos from youtube_dl, which need to go in a separate queue (as that's throttled). So I think we need to handle that carefully here, as multithreading drastically improves the performance of this very scraper. Maybe we can have a main multithreaded process (because it makes many HTTP requests) and handle youtube separately.
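To make the idea concrete, here is a rough sketch of the split described above: a wide thread pool for ordinary HTTP downloads and a single-worker pool that serializes the throttled youtube_dl items. It is only an illustration under assumptions. `download_all`, `is_throttled` and `http_workers` are hypothetical names, and the only thing assumed about the extractor objects is that they expose a `download()` method.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(extractors, is_throttled, http_workers=8):
    """Call download() on every extractor, keeping throttled ones serialized.

    `extractors` are objects exposing a download() method (assumption).
    `is_throttled` is a hypothetical predicate that flags the items
    backed by youtube_dl, which should not run in parallel.
    """
    # Wide pool for plain HTTP downloads, one-worker pool for throttled items.
    with ThreadPoolExecutor(max_workers=http_workers) as http_pool, \
            ThreadPoolExecutor(max_workers=1) as throttled_pool:
        futures = []
        for extractor in extractors:
            pool = throttled_pool if is_throttled(extractor) else http_pool
            futures.append(pool.submit(extractor.download))
        # Results come back in submission order; result() re-raises
        # any exception raised inside a download.
        return [future.result() for future in futures]
```

The single-worker pool acts as the separate queue: youtube items still run in the background while HTTP downloads proceed, but never more than one at a time. Whether one worker is the right limit for youtube_dl is something we would have to measure.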