Open gvcormac opened 9 years ago
OK, now I'm totally stuck. All the docs are "server error" and I have no way to refresh the list.
I've sent out a doc list email for illicit goods, and I just sent it again. I also checked manually: these docs are indeed not indexed on our side.
Please refer to http://infosense.cs.georgetown.edu/resource/bhw_list
You are missing 60,523 documents, some from bhw, some from hackforums. Here is a list of what you are missing:
http://plg.uwaterloo.ca/~gvcormac/missing_list.txt
$ wc missing_list.txt
60523 60523 4590749 missing_list.txt
As a workaround, I will temporarily remove these from the Waterloo end.
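For reference, a list like this can be produced by comparing the document IDs on the two sides. Below is a minimal Python sketch, assuming each side can export its IDs as a plain list; the function name and the sample IDs are hypothetical and are not the script actually used here.

```python
def missing_ids(rendered_ids, indexed_ids):
    """Return the IDs present in rendered_ids but absent from indexed_ids, sorted."""
    return sorted(set(rendered_ids) - set(indexed_ids))


# Hypothetical sample IDs, for illustration only.
rendered = ["doc_a", "doc_b", "doc_c"]
indexed = ["doc_b"]
print(missing_ids(rendered, indexed))
```

Sorting the result keeps the output stable between runs, which makes two versions of the list easy to diff.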
May I know how your side rendered the illicit goods dataset?
That is because we only indexed the docs containing a "features" field, which we consider to be threads with real content (see my email).
So for those "missing documents", even if we add them, there would be no content inside.
I have no knowledge of this. We rendered every document. I don't know what you mean by "no content." The document listed above contains the following.
[ . . . ]
Default Re: ALL-IN-ONE MANUAL SUBMISSION SERVICES - Cheap and Best Service
I am interested in this package of social bookmarking 350 PR9-PR0 -
$35. If you can please be online on YIM that will be great. Thank
you.
[58]Reply With Quote Reply With Quote
______________________________________________________________
2.
3. 07-17-2009, 12:02 PM [59]#137
[60]Xnode's Avatar
[61]Xnode
Xnode is offline Newbies
[62]Send a message via Skype(TM) to Xnode
Join Date
Jun 2009
Location
United States
Posts
21
Thanks
20
Thanked 6 Times in 6 Posts
Default Re: ALL-IN-ONE MANUAL SUBMISSION SERVICES - Cheap and Best Service
I recently had my order completed and have nothing but good things
to say. the work was done very quickly and professionally. I look
forward to having more work done in the future. I would highly
recommend the services to anyone. Thanks again crmarjunkarthik!
[63]Reply With Quote Reply With Quote
______________________________________________________________
4.
The Following User Says Thank You to Xnode For This Useful Post:
[64]crmarjunkarthik (07-19-2009)
I think you need a more robust response than "server error" for documents you don't expect (and similar "file not found" errors). When you get "server error" you lose all controls. Maybe this will be mitigated when the controls are separate and when there is a manual refresh, but I still think that some sharp edges need to be removed.
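As an illustration of the kind of behavior I mean, here is a minimal Python sketch, assuming the index can be treated as a dictionary keyed by document ID. The function name, the status codes chosen, and the payload layout are all hypothetical; this is not the actual server code.

```python
def lookup_document(index, doc_id):
    """Return a (status, payload) pair instead of raising for unknown IDs.

    `index` is assumed to be a dict mapping document ID to content.
    """
    doc = index.get(doc_id)
    if doc is None:
        # Unknown document: report it explicitly so the caller can keep
        # the page and its controls usable.
        return 404, {"error": "document not indexed", "doc_id": doc_id}
    return 200, {"doc_id": doc_id, "content": doc}


# Hypothetical index with a single entry.
index = {"doc_a": "some text"}
print(lookup_document(index, "doc_a"))
print(lookup_document(index, "doc_b"))
```

The point is that an unexpected document yields a structured "not indexed" result that the interface can display, rather than an unhandled failure.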
I'll go check the documents and also take care of your suggestion above. For now, just keep those "missing" docs removed.
I agree that there are more important priorities than including these documents. They are currently excluded.
Yes. One thing I want to bring up: in the CCA schema, there are 'key', 'url', 'request', 'response', etc. fields.
For other datasets, I think the content we should show the assessor is item['response']['body']. For example, in the ebola dataset from NYU, the crawled HTML page source is inside item['response']['body'].
But for illicit goods, they added a 'features' field alongside 'key', 'url', 'request', and 'response'. They merged all posts under a given thread, and the posts' contents are included in the "features" field.
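A minimal Python sketch of that selection rule, assuming each item is a dictionary following the CCA field names above. The function name is hypothetical, and since the internal layout of 'features' is not described here, the sketch returns it as-is rather than guessing at its structure.

```python
def display_content(item):
    """Pick the content to show the assessor for one CCA item.

    Prefers the 'features' field when it is present and non-empty,
    otherwise falls back to item['response']['body'].
    """
    features = item.get("features")
    if features:
        return features
    response = item.get("response") or {}
    return response.get("body", "")


# Hypothetical items, for illustration only.
thread_item = {"key": "k1", "features": {"posts": ["first post"]}, "response": {"body": "<html>raw</html>"}}
page_item = {"key": "k2", "response": {"body": "<html>page</html>"}}
print(display_content(thread_item))
print(display_content(page_item))
```

With a fallback like this, an item without 'features' would still have something to show instead of being dropped from the index.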
We are using response.body for illicit goods.
My "browse" list has the following document. When I click on it I get "server error."
com_blackhatworld_www_6dd9a64484474e6b3ed9d77e23f90ce96ae0b311_1422722123679