sliceofcake closed this 8 years ago
I believe that case halted because it ran into a gif. I've now added code to explicitly skip gifs. There was also an issue with iframes not being released as well as they could be. Now, whenever the script advances a page, ~all~ outstanding iframes are removed.
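As a rough illustration of the two fixes described above, here is a minimal sketch. It is not the actual script: the helper names `isGif`, `scannableUrls`, and `releaseIframes` are hypothetical, and it assumes gifs can be recognised by their URL extension, which may not match how the real code detects them.

```javascript
// Hypothetical helper: treat a URL as a gif if its path ends in ".gif".
// Assumption: the extension is a reliable signal; the real script may differ.
function isGif(url) {
  // Drop any query string or fragment before checking the extension.
  const path = url.split(/[?#]/)[0];
  return /\.gif$/i.test(path);
}

// Hypothetical helper: keep only the URLs the scanner should load.
function scannableUrls(urls) {
  return urls.filter((url) => !isGif(url));
}

// Hypothetical helper: detach every iframe under `root` when advancing a page.
// In a browser, `root` would typically be `document`.
function releaseIframes(root) {
  const frames = Array.from(root.querySelectorAll('iframe'));
  for (const frame of frames) {
    frame.remove();
  }
  return frames.length; // number of iframes that were removed
}
```

Calling `releaseIframes(document)` at each page advance is one way to make sure no iframe from the previous page stays attached.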
The JavaScript script now takes about 5 minutes to complete.
The JavaScript script will allow page load bursts, but only per page now. So an entire page can try to scan at once, but there are gates between each page [and each artist].
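The per-page gating could look roughly like the sketch below. This is an assumption about the shape of the logic, not the script itself: `scanPagesWithGates` and `scanItem` are hypothetical names, and each page is modelled as a plain array of items to scan.

```javascript
// Hypothetical sketch: burst within a page, gate between pages.
// `pages` is an array of pages, each page an array of items to scan.
// `scanItem` is a caller-supplied async function (an assumption here).
async function scanPagesWithGates(pages, scanItem) {
  const results = [];
  for (const page of pages) {
    // Burst: start every scan on this page at once.
    const pageResults = await Promise.all(page.map((item) => scanItem(item)));
    // Gate: the loop only reaches the next page once this page has finished.
    results.push(pageResults);
  }
  return results;
}
```

The same loop could be nested once more to put a gate between artists. Note that a burst is still as large as the page, so a page with many items will have that many loads outstanding at once.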
I'm brushing aside the possibility of an entirely command-line algorithm for now because the logic is fairly complex and I neither know the extent of command-line tools nor, obviously therefore, how to use them if they exist.
The resulting file is approximately 6.4MB.
• I ran the JavaScript script on a list of 46 artists and it didn't complete. I timed it at 2 hours and 45 minutes, after which I stopped the timer because my browser stopped making progress [see the note about RAM usage].
• Firefox pegged my RAM usage at 10.5GB+ before it ran out of free RAM and seemingly stopped functioning.
• It frequently seemed to have many more than 4 outstanding pages, something like 17 in some bursts.
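For reference, a hard cap on outstanding pages can be sketched as below. This is only an illustration of one common approach (a fixed pool of workers pulling from a shared queue); `runWithLimit` is a hypothetical name, and the limit of 4 is taken from the number mentioned above rather than from the script's code.

```javascript
// Hypothetical sketch: run async tasks with at most `limit` in flight.
// `tasks` is an array of functions that each return a promise.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task that has not been started yet

  // Each worker repeatedly takes the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  // Start `limit` workers (or fewer if there are fewer tasks).
  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With `runWithLimit(pageLoads, 4)`, a fifth page load only starts when one of the first four has finished, so bursts of 17 would not occur.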
I'd like the JavaScript script to run much faster. Think 10x faster. Maybe look into sending cookies along with cURL requests and do this scanning process on the command-line, if it's possible to transfer the scanning logic over.
--or, in the meantime, maybe there's an issue with iframes not being properly released? Unless the text file is somehow becoming erroneously enormous, Firefox should not be demanding the 10.5GB+ of RAM that it was.