prmr closed 10 years ago
@enewe101 : Managed to get the json for product_id's and their states. Crunching at about 10,000 links per hour.
Sweet! I'll adapt JsonLoaderService to look at that .json file when it builds products. I'll do it on this branch.
I was thinking about where the .json file for the url status codes should live. Why don't we actually commit it to the project? I think this should be ok, since the status code returned for a web page is public info. I'm not sure where the right place to put it is though. Maybe under src/main/resources, or just in the root?
I think storing it in the local data path would be good. The whole category tree and the products are built from external json, and the return-state info also comes from an external json, so keeping it all in one place would make things easy to group together. If not, the root or any other place should be just fine.
Mailed the code + files.
Ok, I've incorporated a function in JsonLoaderService that reads your file and, while building products, checks if the link is dead. Right now that's pushed in branch Issue83. Appears to be working.
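For anyone following along, here is a rough sketch of what a check like this could look like. It is not the actual JsonLoaderService code: the class name `DeadLinkFilter`, the `productId -> status code` map, and the rule that a status of 400 or above means "dead" are all assumptions made for illustration. The empty-string replacement follows the issue description.

```java
import java.util.Map;

// Hypothetical sketch only; names and data shape are assumptions, not the project's code.
class DeadLinkFilter {
    // productId -> HTTP status code recorded by the crawl (assumed shape of the data).
    private final Map<String, Integer> statusById;

    DeadLinkFilter(Map<String, Integer> statusById) {
        this.statusById = statusById;
    }

    // Assumption: a link counts as dead when its recorded status is 400 or above.
    // Products with no recorded status are treated as live.
    boolean isDead(String productId) {
        Integer status = statusById.get(productId);
        return status != null && status >= 400;
    }

    // Returns the URL unchanged for live or unknown links, and "" for dead ones.
    String urlOrEmpty(String productId, String url) {
        return isDead(productId) ? "" : url;
    }
}
```

While building a product, the loader would call something like `filter.urlOrEmpty(id, url)` and store the result as the product's link.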
I'm going to write tests, then push to master.
Ok merging to master...
Ah, wait, no one else has the dead_links.json file. So this would break master for everyone else!
Ok, I'll post that file in the class forum, and I'll add a condition to the code -- if it can't find the dead_links.json file, then it will skip the dead links check.
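A minimal sketch of that fallback, under stated assumptions: the real layout of dead_links.json is not shown in this thread, so the code below assumes a flat JSON object such as `{"id1": 404, "id2": 200}` and pulls entries out with a regex instead of a JSON library. `DeadLinksFile` and `loadOrEmpty` are made-up names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch only; the file layout and all names here are assumptions.
class DeadLinksFile {
    // Matches entries of the assumed form "someId": 404
    private static final Pattern ENTRY = Pattern.compile("\"([^\"]+)\"\\s*:\\s*(\\d+)");

    // Returns productId -> status code, or an empty map when the file is absent.
    // With an empty map no link is ever reported dead, so the check is effectively skipped.
    static Map<String, Integer> loadOrEmpty(Path file) {
        if (!Files.isRegularFile(file)) {
            return Collections.emptyMap();
        }
        String json;
        try {
            json = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Integer> statusById = new HashMap<>();
        Matcher m = ENTRY.matcher(json);
        while (m.find()) {
            statusById.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return statusById;
    }
}
```

Returning an empty map rather than throwing keeps the product-building code free of a separate "file missing" branch.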
Ok, all good. I am pushing to master.
This will not break master; however, you will have one failing test until you download the dead_links.json file, which I have posted under "Technical Stuff" in the class forum.
Sorry quick question. Can we just create an empty dead_links.json and then run it and the algorithm will fill it up or do we need the formatted file?
You need the file. Filling it up is a several-hour process, so it is completely separate from running the application. Nishanth ran it once and for all.
That being said, I guess you could run Nishanth's code to fill it yourself if you really wanted.
Come to think of it, where is that code @nishanth1991? I can't find it.
Ok, thanks, I didn't realise it took so long. I will just get the file from the discussion board. Do we put it under src/main/resources or under root?
@enewe101 : Oh I just pushed it now. You can find it in ca.mcgill.cs.creco.data. The file name is CRDeadlinks. When you run it make sure you rename the method from main1 to main.
Please see my post on the class forum in Technical Stuff.
Please clean the code and remove dead code such as main1 (this should be done before merging into master).
Cleaned the code in CRDeadlinks. No dead code left in CRDeadlinks.java.
The JUnit test fails for this for some reason. Dead links don't seem to be empty strings.
I guess this is not a detector error. But I didn't see any more relevant issues.
NVM, I don't think the file was in the right place. Sorry.
Automatically detect invalid product pages at load time and convert them to an empty string in the product objects.