prmr / Creco

Recommendation System for Consumer Products
Apache License 2.0

Detector for product dead links #83

Closed: prmr closed this issue 10 years ago

prmr commented 10 years ago

Automatically detect invalid product pages at load time and replace their links with an empty string in the product objects.
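A minimal sketch of the rule described above. It assumes a page counts as invalid when its recorded HTTP status falls outside the 200-399 range; the issue does not say which codes were treated as dead. `DeadLinkRule`, `isDead`, and `sanitizeUrl` are hypothetical names, not Creco code.

```java
/**
 * Hypothetical helper illustrating the issue's rule: a product link whose
 * page is invalid is replaced by an empty string at load time.
 * Assumption: "invalid" means an HTTP status outside 200-399.
 */
public final class DeadLinkRule {

    private DeadLinkRule() {}

    /** Returns true if the recorded HTTP status marks the page as dead (assumed rule). */
    public static boolean isDead(int status) {
        return status < 200 || status >= 400;
    }

    /** Returns the URL unchanged if the page is alive, or "" if it is dead. */
    public static String sanitizeUrl(String url, int status) {
        return isDead(status) ? "" : url;
    }
}
```

With this rule, a redirect (3xx) keeps its link, while a 404 or 500 becomes "".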

nishanthtgwda commented 10 years ago

@enewe101: I managed to get the JSON of product IDs and their link states. It's crunching at about 10,000 links per hour.
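The status-gathering step runs offline, one request per link. The following is a hypothetical sketch of such a checker, not the actual CRDeadlinks code. The class name, the HEAD-request choice, the 5-second timeouts, and the `-1` sentinel are all assumptions.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

/**
 * Hypothetical offline link checker: fetch the HTTP status of a product URL
 * once, so results can be saved to a file and reused at load time.
 */
public final class LinkStatusFetcher {

    /** Sentinel for URLs that could not be checked (malformed, non-HTTP, unreachable). */
    public static final int UNREACHABLE = -1;

    private LinkStatusFetcher() {}

    /** Issues a HEAD request and returns the status code, or UNREACHABLE on failure. */
    public static int fetchStatus(String url) {
        HttpURLConnection conn = null;
        try {
            URLConnection raw = URI.create(url).toURL().openConnection(); // does not connect yet
            if (!(raw instanceof HttpURLConnection)) {
                return UNREACHABLE; // not an http(s) link
            }
            conn = (HttpURLConnection) raw;
            conn.setRequestMethod("HEAD");          // headers only, skip the page body
            conn.setInstanceFollowRedirects(false); // record 3xx codes as-is
            conn.setConnectTimeout(5_000);
            conn.setReadTimeout(5_000);
            return conn.getResponseCode();
        } catch (IOException | IllegalArgumentException e) {
            return UNREACHABLE; // bad syntax, unknown protocol, timeout, etc.
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

HEAD avoids downloading each page, but some servers reject it (for example with 405), so a real checker may need a GET fallback. At the quoted rate, each link takes roughly 0.36 s.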

enewe101 commented 10 years ago

Sweet! I'll adapt JsonLoaderService to look at that .json file when it builds products. I'll do it on this branch.

I was thinking about where the .json file with the URL status codes should live. Why don't we actually commit it to the project? I think this should be OK, since the status code returned for a web page is public information. I'm not sure where the right place to put it is, though. Maybe under src/main/resources, or just in the root?

nishanthtgwda commented 10 years ago

I think storing it in the local data path would be good. The whole category tree and all the products are built from external JSON, and the return-state information also comes from an external JSON file, so keeping all of it in one place groups things together nicely. Otherwise, the root or any other place should be fine.

nishanthtgwda commented 10 years ago

Mailed the code + files.

enewe101 commented 10 years ago

Ok, I've incorporated a function into JsonLoaderService that reads your file and, while building products, checks whether each link is dead. Right now that's pushed to branch Issue83. It appears to be working.
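The check done while building products can be sketched as follows. This is not the JsonLoaderService code. The sketch assumes the file has already been read into a map from product ID to HTTP status. `DeadLinkApplier`, `Product`, and `blankDeadLinks` are hypothetical stand-ins, and the 200-399 "alive" range is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public final class DeadLinkApplier {

    /** Hypothetical, minimal stand-in for a Creco product. */
    public record Product(String id, String url) {}

    private DeadLinkApplier() {}

    /**
     * Returns copies of the products in which every URL whose recorded status
     * is outside 200-399 is replaced by "". Products with no recorded status
     * keep their URL unchanged.
     */
    public static List<Product> blankDeadLinks(List<Product> products,
                                               Map<String, Integer> statusById) {
        List<Product> result = new ArrayList<>(products.size());
        for (Product p : products) {
            Integer status = statusById.get(p.id());
            boolean dead = status != null && (status < 200 || status >= 400);
            result.add(dead ? new Product(p.id(), "") : p);
        }
        return result;
    }
}
```

Leaving unknown products untouched is a design choice: a product missing from the file is treated as "not checked" rather than "dead".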

I'm going to write tests, then push to master.

enewe101 commented 10 years ago

Ok merging to master...

enewe101 commented 10 years ago

Ah, wait, no one else has the dead_links.json file. So this would break master for everyone else!

Ok, I'll post that file in the class forum, and I'll add a condition to the code: if it can't find the dead_links.json file, it will skip the dead-links check.
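The fallback described above can be sketched like this. If the file is missing, return an empty map so that no link is blanked. The file layout is an assumption: a flat JSON object mapping product IDs to integer status codes. The regex parser only handles that flat shape; a real loader would use a proper JSON library. `DeadLinksFile` and `loadOrEmpty` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class DeadLinksFile {

    // Matches "key": 123 pairs in a flat JSON object (assumed file layout).
    private static final Pattern ENTRY = Pattern.compile("\"([^\"]+)\"\\s*:\\s*(-?\\d+)");

    private DeadLinksFile() {}

    /**
     * Loads product-ID-to-status entries from the given file, or returns an
     * empty map when the file is absent so the dead-link check is skipped.
     */
    public static Map<String, Integer> loadOrEmpty(Path file) {
        if (!Files.isRegularFile(file)) {
            return Collections.emptyMap(); // no file: skip the check instead of failing
        }
        String json;
        try {
            json = Files.readString(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Integer> statuses = new HashMap<>();
        Matcher m = ENTRY.matcher(json);
        while (m.find()) {
            statuses.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return statuses;
    }
}
```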

enewe101 commented 10 years ago

Ok, all good. I am pushing to master.

This will not break master, however, you will have one failing test until you download the dead_links.json file, which I have posted under "Technical Stuff" in the class forum.

asutcl commented 10 years ago

Sorry, quick question. Can we just create an empty dead_links.json, run the application, and let the algorithm fill it up, or do we need the formatted file?

enewe101 commented 10 years ago

You need the file. Filling it up is a several-hour process, so it is completely separate from running the application. Nishanth ran it once and for all.

On Sat, Mar 29, 2014 at 6:01 PM, asutcl notifications@github.com wrote:

Sorry quick question. Can we just create an empty dead_links.json and then run it and the algorithm will fill it up or do we need the formatted file?


enewe101 commented 10 years ago

That being said, I guess you could run Nishanth's code to fill it yourself if you really wanted.

Come to think of it, where is that code @nishanth1991? I can't find it.

asutcl commented 10 years ago

Ok, thanks, I didn't realise it took so long. I will just get the file from the discussion board. Do we put it under src/main/resources or under the root?

nishanthtgwda commented 10 years ago

@enewe101: Oh, I just pushed it. You can find it in the ca.mcgill.cs.creco.data package; the file name is CRDeadlinks. When you run it, make sure you rename the method from main1 to main.

enewe101 commented 10 years ago

Please see my post on the class forum under Technical Stuff.

On Mar 29, 2014 6:28 PM, "nishanth1991" notifications@github.com wrote:

@enewe101 https://github.com/enewe101 : Oh I just pushed it now. You can find it in ca.mcgill.cs.creco.data. The file name is CRDeadlinks. When you run it make sure you rename the method from main1 to main.


prmr commented 10 years ago

Please clean the code and remove dead code such as main1 (this should be done before merging into master).

nishanthtgwda commented 10 years ago

Cleaned the code in CRDeadlinks; there is no more dead code in CRDeadlinks.java.

asutcl commented 10 years ago

The JUnit test for this fails for some reason: the dead links don't seem to be empty strings.

I guess this is not a detector error, but I didn't see any more relevant issue.

asutcl commented 10 years ago

NVM, I don't think the file was in the right place. Sorry.